首页|PypiGuard: A novel meta-learning approach for enhanced malicious package detection in PyPI through static-dynamic feature fusion
PypiGuard: A novel meta-learning approach for enhanced malicious package detection in PyPI through static-dynamic feature fusion
扫码查看
点击上方二维码区域,可以放大扫码查看
原文链接
NETL
NSTL
Elsevier
The increasing reliance on open-source software repositories, especially the Python Package Index (PyPi), has introduced serious security vulnerabilities as malicious actors embed malware into widely adopted packages, threatening the integrity of the software supply chain. Traditional detection methods, often based on static analysis, struggle to capture the complex and obfuscated behaviors characteristic of modern malware. Addressing these limitations, we present PypiGuard, an advanced hybrid ensemble meta-model for malicious package detection that integrates both static metadata and dynamic Application Programming Interface (API) call behaviors, enhancing detection accuracy and reducing error rates. Leveraging the MalwareBench dataset, our approach utilizes an innovative preprocessing pipeline that fuses metadata features with categorized API behaviors. The PypiGuard model employs a hybrid ensemble structure composed of Random Forest (RF), Gradient Boosting (GB), Decision Tree (DT), K-Nearest Neighbors (KNN), LightGBM, and an Artificial Neural Network (ANN), assembled through dynamically optimized stacking-based meta-learning framework that adapts to model-specific prediction strengths. Compared to Deep Learning (DL) baselines like Long-Short Term Memory (LSTM) and Convolutional Neural Network (CNN), PypiGuard achieves significant improvements in accuracy and False Positive Rate (FPR), with a detection accuracy of 98.43% and a markedly low FPR, confirming its enhanced effectiveness in accurately identifying malicious packages.