A Feature String Mining Algorithm in Industrial Control Protocols Recognition

Identifying industrial control protocols is the first step in studying them, and strings that occur frequently during communication are important features for that identification. To extract such feature strings, we propose a top-down frequent string mining algorithm that directly yields a set of frequent strings without redundancy. Because the top-down approach starts from very large raw data and needs many iterations, we borrow from the N-gram model and propose a data partitioning strategy that keeps the data handled during top-down processing manageable. During mining, we combine the deletion of overlapping items with string splitting. Experimental results show that the algorithm identifies feature strings in multiple protocols, and that using the identified strings as features also gives good results in protocol identification. We conclude that the algorithm can effectively extract feature strings from industrial control protocols.
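The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of top-down frequent string mining over N-gram windows, not the authors' algorithm. The function names, the definition of support (number of messages containing a string), the redundancy rule (a shorter string is dropped when it is contained in a longer string already kept), and the toy payloads are all assumptions made for illustration. The paper's data partitioning strategy and its string splitting step are not modeled here.

```python
from collections import Counter


def ngram_support(messages, n):
    """Count, for each n-byte substring, how many messages contain it."""
    support = Counter()
    for msg in messages:
        # Use a set so a substring is counted at most once per message.
        seen = {msg[i:i + n] for i in range(len(msg) - n + 1)}
        support.update(seen)
    return support


def frequent_strings(messages, min_support, max_len):
    """Scan lengths from longest to shortest (top-down).

    Assumption: a frequent string is treated as redundant, and skipped,
    when it is a substring of a longer frequent string already kept.
    """
    kept = {}
    for n in range(max_len, 0, -1):
        for s, count in ngram_support(messages, n).items():
            if count < min_support:
                continue
            if any(s in longer for longer in kept):
                continue
            kept[s] = count
    return kept


# Toy payloads (hypothetical, not real protocol traffic): a shared
# 4-byte field surrounded by bytes that differ in every message.
messages = [
    b"\x11\x01READ\x05",
    b"\x22\x02READ\x06",
    b"\x33\x03READ\x07",
]
print(frequent_strings(messages, min_support=3, max_len=6))
# prints {b'READ': 3}
```

Starting from the longest windows means that `READ` is kept before its fragments (`REA`, `EAD`, single bytes) are considered, so those fragments never enter the result. A bottom-up search would instead produce them first and need a separate pruning pass.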

frequent strings; top-down; data segmentation; feature extraction; data processing

Hai Yang (海洋), Xu Kui (徐魁), Li Xiaohui (李晓辉), Zeng Tao (曾涛), Tao Jun (陶军)

Communication Division, Baoji Public Security Bureau, Baoji, Shaanxi 721014

Baoji Chuangtian Qinghang Technology Development Co., Ltd. (宝鸡创天清航科技发展有限责任公司), Baoji, Shaanxi 721000

School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu 210096

China University Industry-University-Research Innovation Fund, Alibaba Cloud University Digital Innovation Special Project

2021ALA03006

2024

Computer Technology and Development
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(1)