
Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition

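Feature selection is important for machine-learning-based recognition of disturbance signals in phase-sensitive optical time-domain reflectometer (φ-OTDR) systems. An efficient and interpretable feature selection method is proposed. The method uses a machine learning interpretability technique, Shapley additive explanations (SHAP), to quantify each feature's contribution to the model, ranks the features by importance, and selects several important features to build a feature subset. Using an open dataset from Beijing Jiaotong University, 22 features are extracted for 6 types of disturbance events, and 4 commonly used classification models are built for signal recognition. With recognition accuracy preserved, a different number of features is selected for each model according to the differences in the models' feature importance rankings. After retraining, recognition time decreases and recognition performance improves to varying degrees for all models. Random forest performs best: its average recognition accuracy rises to 96.5%, and its average recognition time per sample falls by 19.3%. When the same number of features is selected, the proposed method achieves higher average recognition accuracy than the other methods. The experimental results confirm the superiority and reliability of the proposed interpretable feature selection method.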
Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition
Objective The distributed optical fiber sensing system based on a phase-sensitive optical time-domain reflectometer (φ-OTDR) has been widely used for disturbance signal recognition in perimeter security, pipeline monitoring, railway transportation monitoring, and other fields, owing to its high sensitivity, multi-point monitoring, and wide coverage. Currently, machine learning-based methods are the primary approach to improving the accuracy of disturbance signal recognition. Classical machine learning algorithms require the raw input signals to be preprocessed through manual feature extraction. Typically, as the number of disturbance event types grows, more features are extracted in pursuit of higher recognition accuracy. However, introducing irrelevant features can degrade both recognition accuracy and efficiency. Therefore, feature selection, which eliminates irrelevant features to strengthen recognition performance, plays a crucial role in the preprocessing stage. Feature selection methods fall into three categories: filter, wrapper, and embedded methods. Most feature selection methods used for optical fiber disturbance signal recognition are filter methods, which often overlook the relationship between features and models. In this study, we aim to develop a more efficient and interpretable feature selection method that identifies key features to further improve recognition performance.

Methods We propose a novel feature selection method based on Shapley additive explanations (SHAP), an explainable artificial intelligence (XAI) method. SHAP draws on game theory to calculate the Shapley value, which quantifies the contribution of each feature to the model's prediction (Equation 1). We use SHAP to obtain the mean SHAP value of each feature for a classification model. The higher the mean, the more important the feature.
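The abstract cites Equation 1 without reproducing it. As a point of reference only, the standard game-theoretic Shapley value that SHAP builds on is written out below. The notation is assumed here and may differ from the paper's: F is the full feature set, S a subset that excludes feature i, f_S the model evaluated using only the features in S, and N the number of samples. The second expression is one common way to aggregate per-sample SHAP values into a global importance, assuming absolute values are averaged.

```latex
\[
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[ f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S\bigl(x_S\bigr) \Bigr]
\]
\[
I_i \;=\; \frac{1}{N} \sum_{n=1}^{N} \bigl| \phi_i^{(n)} \bigr|
\]
```

The weight in the first expression is the probability that exactly the features in S precede feature i in a uniformly random ordering, so each feature's value is its average marginal contribution over all orderings.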
We rank the features by importance and select some of the most significant ones to form a feature subset while ensuring high recognition rates. This subset is used to retrain the model, thereby improving recognition efficiency.

Results and Discussions Experimental validation is conducted using an open dataset of optical-fiber disturbance events from Beijing Jiaotong University, divided into training and test sets at an 8:2 ratio (Table 1). The dataset includes six typical disturbance events: background noise, digging, knocking, watering, shaking, and walking. We extract sixteen time-domain features from the disturbance signals after differentiation and segmentation. Additionally, wavelet packet decomposition (WPD) is employed to extract six frequency-domain features (Tables 2 and 3). The feature set, comprising twenty-two features, is normalized and input into four common machine learning models as baselines: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF). KernelSHAP is applied to SVM and KNN, while TreeSHAP is used for DT and RF. The ranking of these twenty-two features is determined for each of the four models (Fig. 6). Importantly, each feature contributes differently to the classification of the six disturbance events depending on the model. To avoid compromising recognition accuracy, we retain a different number of key features for each model. Comparing the accuracy, precision, recall, and F1-score from the test confusion matrices (Tables 4 and 5), we observe that feature selection improves recognition performance to varying degrees. Among the four models, the RF model achieves the highest recognition accuracy of 96.5%. Furthermore, the average recognition time per sample for the RF model decreases from 81.82 ms without feature selection to 66.01 ms, a 19.3% reduction (Table 6).
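The ranking and selection step described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it computes exact Shapley values by enumerating feature subsets for a hypothetical four-feature toy model, whereas the paper applies KernelSHAP and TreeSHAP to real classifiers with twenty-two features. The names `shapley_values`, `rank_features`, `toy_model`, and the all-zero baseline are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i when added to this subset.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, samples, baseline):
    """Rank features by mean absolute Shapley value over the samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    importance = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


def toy_model(x):
    # Hypothetical model: feature 1 is irrelevant, features 0 and 3 interact.
    return 3.0 * x[0] + 0.0 * x[1] - 1.0 * x[2] + 0.5 * x[0] * x[3]


samples = [[1.0, 2.0, 1.0, 1.0], [2.0, -1.0, 0.5, 0.0], [0.5, 0.3, 2.0, 2.0]]
baseline = [0.0, 0.0, 0.0, 0.0]

order, importance = rank_features(toy_model, samples, baseline)
top_k = order[:2]  # keep the two most important features for retraining
print("ranking:", order)
print("selected:", top_k)
```

Subset enumeration costs on the order of 2^n model evaluations per feature, which is why approximations such as KernelSHAP or model-specific algorithms such as TreeSHAP are used in practice. The retraining step would then fit each classifier on only the columns listed in `top_k`.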
Common feature selection methods such as Fisher score and mutual information are also compared with the SHAP-based feature selection method. The SHAP-based method achieves higher recognition accuracy than these alternatives (Table 7).

Conclusions We propose a feature selection method characterized by interpretability and reliability. This method leverages explainable AI (XAI) techniques to quantify the importance of different features for the model and selects them based on their importance rankings. By retaining the most effective features for model classification and discarding redundant or detrimental ones, our approach enhances recognition accuracy while reducing computational cost and identification time. Twenty-two features are extracted from six types of disturbance events using an open dataset from Beijing Jiaotong University. We employ four common machine learning models for signal recognition. By taking into account the variations in feature importance rankings across models, we construct a different feature subset for each model. This results in significant decreases in single-sample testing time for all four models and varying degrees of improvement in average recognition accuracy. Compared with filter methods based on statistical metrics, our proposed method selects more valuable features, thereby achieving higher recognition rates. These conclusions are drawn solely from the dataset used, and further validation is needed to assess the method's applicability to more complex or real-world datasets. Future work could involve comparing feature importance rankings across more models and integrating other feature selection methods to develop a versatile approach for optical-fiber disturbance signal recognition.
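For contrast with the model-aware SHAP ranking, the Fisher score used as a comparison baseline is a filter criterion computed from the data alone. The sketch below uses one common definition (between-class scatter divided by within-class scatter, per feature). The abstract does not state which variant the paper uses, so the formula, the function name `fisher_scores`, and the toy data are assumptions.

```python
from statistics import fmean


def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        overall_mean = fmean(row[j] for row in X)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            mean_c = fmean(values)
            # Population variance of feature j within class c.
            var_c = fmean((v - mean_c) ** 2 for v in values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        # A feature that is constant within every class separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.0], [1.2, 2.0], [1.1, 3.0]]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("scores:", scores)
print("ranking:", ranking)
```

Because the score depends only on class-wise means and variances, it is identical for every classifier, which is the limitation of filter methods that the abstract points out.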

sensor; phase-sensitive optical time-domain reflectometer; signal recognition; feature selection; explainable machine learning

Sun Min, Fang Nian


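School of Communication and Information Engineering, Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai 200444, China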

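sensor; phase-sensitive optical time-domain reflectometer; signal recognition; feature selection; machine learning interpretability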

2024

Acta Optica Sinica
Chinese Optical Society; Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.931
ISSN: 0253-2239
Year, volume (issue): 2024, 44(21)