Streaming Cleaning System for Periodic Industrial Time Series Data
Wang Yao (王耀) 1, Zhao Jiong (赵炯) 1, Zhou Qicai (周奇才) 1, Xiong Xiaolei (熊肖磊) 2, Chen Chuanlin (陈传林) 3, Zhang Heng (张恒) 3
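Author information
- 1. School of Mechanical and Energy Engineering, Tongji University, Shanghai 201804
- 2. Zhejiang College, Tongji University, Jiaxing 314051, Zhejiang
- 3. Shanghai Metro Shield Equipment Engineering Co., Ltd., Shanghai 200233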
Abstract
To efficiently clean industrial data that are both time-ordered and periodic, a streaming cleaning system was first designed from distributed components. The system uses Mosquitto as the hub where collected data converge, Flume as the connector, and Kafka as the buffer feeding the data cleaning component, which serves as the core of the system; this arrangement gives the system high throughput and a large buffering capacity. Then, a periodic data cleaning algorithm was designed based on the speed constraint model. Taking into account the temporal order, periodicity, and physical meaning of industrial data, the algorithm adds periodicity detection and data slicing to the original speed constraint algorithm, so as to resolve the distortion that the original algorithm introduces when processing periodic data and to improve data usability. Finally, a shield tunneling (tunnel boring machine) data set was used as a case study to verify the effectiveness of the system and the algorithm, as well as the applicability of the improved algorithm.
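The idea of combining a speed constraint with periodicity detection and data slicing can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`estimate_period`, `speed_clamp`, `clean_periodic`), the autocorrelation-based period estimate, the greedy clamping repair, and the choice to slice at fixed period boundaries are all assumptions made for illustration only.

```python
from typing import List, Sequence


def estimate_period(values: Sequence[float], min_lag: int = 2) -> int:
    """Naive periodicity detection (assumed approach): return the lag with the
    highest positive autocorrelation, or 0 if no such lag is found."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0  # constant series: no period
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, n // 2 + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def speed_clamp(times: Sequence[float], values: Sequence[float],
                s_min: float, s_max: float) -> List[float]:
    """Simplified speed-constraint repair: each point is clamped so that its
    rate of change from the previous repaired point stays in [s_min, s_max]."""
    if not values:
        return []
    repaired = [values[0]]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        lo = repaired[-1] + s_min * dt
        hi = repaired[-1] + s_max * dt
        repaired.append(min(max(values[i], lo), hi))
    return repaired


def clean_periodic(times: Sequence[float], values: Sequence[float],
                   s_min: float, s_max: float, period: int) -> List[float]:
    """Hypothetical slicing step: repair each period-long slice independently,
    so a legitimate jump at a cycle boundary is not treated as a violation."""
    if period <= 0:
        return speed_clamp(times, values, s_min, s_max)
    out: List[float] = []
    for start in range(0, len(values), period):
        end = start + period
        out.extend(speed_clamp(times[start:end], values[start:end], s_min, s_max))
    return out
```

On a sawtooth-like signal that resets at the end of each cycle, calling `speed_clamp` on the whole series would pull the reset value toward the previous peak, which is the kind of distortion the abstract describes. Calling `clean_periodic` with the detected period leaves the resets intact while still clamping outliers inside a cycle.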
Key words
data cleaning / industrial big data / time series data / speed constraint / periodic
Funding
Research project of Shanghai Shentong Metro Group Co., Ltd. (JS-KY21R003-3)
Publication year
2024