Research on a Monitoring Data Anomaly Recognition Algorithm Based on Time Series Compression and Segmentation
To identify anomalies in bridge health monitoring data effectively, reduce false and missed alarms, and safeguard the quality and validity of the monitoring data, an anomaly recognition algorithm based on time series compression and segmentation is proposed. It targets three types of anomalous data found in the long-term monitoring records of long-span cable-stayed bridges: missing data, outliers, and drift. The algorithm first splits the original monitoring time series into several time sub-series using a piecewise linear representation (PLR) algorithm based on series importance points (SIP), denoted PLR_SIP. It then measures the similarity between time sub-series with the Euclidean distance and computes the local outlier factor of each sub-series with an improved local outlier factor (LOF) algorithm. Finally, each factor is compared with a preset threshold to identify anomalies in the monitoring data. To verify its accuracy and engineering practicality, the algorithm was applied to the health monitoring data of a long-span highway cable-stayed bridge. The results show that the time sub-series obtained by compressing and segmenting the original series with PLR_SIP accurately reflect the trend and range of the original series. The improved LOF algorithm overcomes the limitation of the traditional LOF algorithm, which can identify only outliers, that is, anomalies with no duration; it rejects noise interference and recognizes all three anomaly types: outliers, missing data, and drift. The algorithm requires no training set, takes the raw monitoring data directly as input, and adjusts its threshold parameter adaptively. It offers good scalability, real-time performance, accuracy, and efficiency, and is suitable for processing large volumes of real-time bridge health monitoring data.
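The three-stage pipeline summarized above (segmentation at important points, Euclidean similarity between sub-series, LOF scoring against a threshold) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the exact SIP selection rule nor the modification made to LOF, so the sketch substitutes a common top-down chord-distance split and the textbook LOF definition. The function names, the per-segment features (mean level, slope, length), and the parameters `tol`, `k`, and `threshold` are all assumptions made for illustration, and the threshold here is fixed rather than adaptive.

```python
import numpy as np

def important_points(y, tol):
    """Top-down split (assumed stand-in for SIP selection): keep the point
    farthest, vertically, from the chord joining the span ends whenever
    that distance exceeds tol."""
    y = np.asarray(y, dtype=float)
    keep = {0, len(y) - 1}
    stack = [(0, len(y) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue
        x = np.arange(lo, hi + 1)
        chord = y[lo] + (y[hi] - y[lo]) * (x - lo) / (hi - lo)
        dist = np.abs(y[lo:hi + 1] - chord)
        j = int(np.argmax(dist))
        if dist[j] > tol:
            keep.add(lo + j)
            stack += [(lo, lo + j), (lo + j, hi)]
    return sorted(keep)

def segment_features(y, pts):
    """One row per sub-series: mean level, slope, length (hypothetical features)."""
    y = np.asarray(y, dtype=float)
    rows = [[y[a:b + 1].mean(), (y[b] - y[a]) / (b - a), b - a]
            for a, b in zip(pts[:-1], pts[1:])]
    return np.array(rows, dtype=float)

def lof_scores(X, k):
    """Textbook LOF on Euclidean distances (not the paper's improved variant).
    Simplification: exactly k neighbours are used, ties are not expanded.
    Requires more than k rows in X."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbours
    k_dist = d[np.arange(n), nn[:, -1]]          # k-distance of every point
    reach = np.maximum(d[np.arange(n)[:, None], nn], k_dist[nn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)     # local reachability density
    return lrd[nn].mean(axis=1) / lrd

def flag_segments(y, tol, k, threshold):
    """Return the (start, end) index pairs whose LOF exceeds a fixed threshold."""
    pts = important_points(y, tol)
    scores = lof_scores(segment_features(y, pts), k)
    spans = zip(pts[:-1], pts[1:])
    return [span for span, s in zip(spans, scores) if s > threshold]
```

A call such as `flag_segments(signal, tol=0.05, k=3, threshold=1.5)` returns the index ranges of the sub-series scored as anomalous. Because the features have different scales, they would normally be standardized before the distance computation; that step is omitted here to keep the sketch short.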