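Current research on anomaly detection in Internet of Things (IoT) traffic tends to overlook the importance of feature selection, although a redundancy-free feature set helps both to train and to simplify anomaly detection models. To extract a redundancy-free feature set from IoT traffic datasets efficiently, this paper proposes a two-step feature selection algorithm based on differential evolution. The algorithm first performs filter-based feature selection on the dataset with a dual filter built on the linear correlation coefficient and the maximal information coefficient, yielding a preliminary feature set. On top of this preliminary set it then applies DEWFS (Wrapped Feature Selection based on Differential Evolution), a wrapper-based feature selection algorithm proposed in this paper, which uses an extreme learning machine as the model and runs for a predefined number of iterations to obtain a redundancy-free feature set that preserves the anomaly detection performance of the original feature set. DEWFS builds on differential evolution but optimizes its initialization and intermediate iteration steps to suit optimization tasks in traffic feature selection. Experimental results show that the two-step algorithm efficiently selects a redundancy-free feature set for IoT traffic and significantly reduces the computation time of subsequent traffic anomaly detection algorithms.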
A Selection Algorithm for Redundancy-free Feature Sets in Internet of Things Traffic
Currently, research on anomaly detection in IoT traffic often overlooks the importance of feature selection. Selecting a redundancy-free feature set is crucial for efficiently training and simplifying anomaly detection models. To efficiently extract a redundancy-free feature set from IoT traffic datasets, we propose a two-step feature selection algorithm based on the differential evolution algorithm. First, a dual filter using the linear correlation coefficient and the maximal information coefficient performs filter-based feature selection on the dataset to produce a preliminary feature set. Second, a novel wrapper-based feature selection algorithm, DEWFS (Differential Evolution Wrapped Feature Selection), is applied to this preliminary set with an extreme learning machine as the model; after a predefined number of iterations, it yields a redundancy-free feature set that preserves the anomaly detection capability of the original feature set. The DEWFS algorithm, while grounded in differential evolution, has been optimized in its initialization and intermediate iterative steps to adapt to the optimization tasks specific to traffic feature selection. Experimental results demonstrate that the proposed two-step algorithm efficiently selects a redundancy-free feature set for IoT traffic, significantly reducing the computational time of subsequent anomaly detection algorithms.
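
To make the wrapper stage concrete, the following is a minimal sketch of differential-evolution feature selection with an extreme learning machine as the evaluator. It is not the DEWFS algorithm itself: the abstract does not specify how DEWFS modifies initialization or the iteration steps, so the sketch uses the standard DE/rand/1/bin scheme and thresholds each real-valued individual at 0.5 to obtain a feature mask. The dual-filter stage is not shown. All function names, parameters (`pop_size`, `n_iter`, `F`, `CR`, `penalty`), the fitness definition, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def elm_accuracy(X_tr, y_tr, X_val, y_val, n_hidden=32, n_classes=2, seed=0):
    """Train a basic extreme learning machine and return validation accuracy."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X_tr.shape[1], n_hidden))  # random, untrained input weights
    b = rng.normal(size=n_hidden)                   # random, untrained biases
    H = np.tanh(X_tr @ W + b)                       # hidden-layer activations
    T = np.eye(n_classes)[y_tr]                     # one-hot targets
    beta = np.linalg.pinv(H) @ T                    # least-squares output weights
    pred = np.argmax(np.tanh(X_val @ W + b) @ beta, axis=1)
    return float(np.mean(pred == y_val))


def de_feature_selection(X_tr, y_tr, X_val, y_val, pop_size=12, n_iter=15,
                         F=0.5, CR=0.7, penalty=0.01, seed=0):
    """Standard DE/rand/1/bin over vectors in [0, 1]; value > 0.5 keeps a feature."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]

    def fitness(vec):
        mask = vec > 0.5
        if not mask.any():                          # an empty feature set is invalid
            return -1.0
        acc = elm_accuracy(X_tr[:, mask], y_tr, X_val[:, mask], y_val)
        return acc - penalty * mask.mean()          # assumed: reward smaller subsets

    pop = rng.random((pop_size, n_feat))
    fit = np.array([fitness(v) for v in pop])
    for _ in range(n_iter):                         # predefined number of iterations
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)      # mutation
            cross = rng.random(n_feat) < CR                  # binomial crossover
            cross[rng.integers(n_feat)] = True               # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial >= fit[i]:                            # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmax(fit))
    return pop[best] > 0.5, float(fit[best])


# Synthetic stand-in for a traffic dataset: only features 0 and 1 carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
selected, score = de_feature_selection(X[:150], y[:150], X[150:], y[150:])
print("selected feature indices:", np.flatnonzero(selected))
```

The penalty term is one simple way to push the search toward smaller subsets; a real pipeline would run this on the feature set left after filtering and would score candidates with the detection metric used in the experiments.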