A Feature String Mining Algorithm in Industrial Control Protocols Recognition
Identifying industrial control protocols is the first step in studying them. Strings that occur frequently during communication are important features for this identification. We propose a top-down frequent string mining algorithm that directly obtains a non-redundant set of frequent strings for feature extraction in industrial control protocol identification. Because the top-down method must handle large original data and many iterations, we borrow from the N-gram model and propose a data partitioning strategy for processing large data. To mine the frequent strings, we combine the deletion of overlapping items with string splitting. Experimental results show that the proposed algorithm identifies feature strings in multiple protocols and that using these strings as features gives good protocol identification results. We conclude that the proposed algorithm can effectively extract feature strings from industrial control protocols.
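To make the notion of a non-redundant set of frequent strings concrete, the following minimal Python sketch may help. It is not the top-down algorithm proposed here: it is a brute-force N-gram enumeration that works from the longest length to the shortest and drops any frequent string already contained in a longer one. The function names, the parameters `min_len`, `max_len`, and `min_support`, the definition of support as the number of messages containing a string, and the sample byte strings are all assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(messages, n, min_support):
    """Return the n-grams that occur in at least min_support messages."""
    counts = Counter()
    for msg in messages:
        # A set ensures each message contributes at most once per n-gram.
        grams = {msg[i:i + n] for i in range(len(msg) - n + 1)}
        counts.update(grams)
    return {g for g, c in counts.items() if c >= min_support}


def maximal_frequent_strings(messages, min_len, max_len, min_support):
    """Collect frequent n-grams from longest to shortest, skipping any
    string that is already contained in a longer kept string.
    (Assumed definition of "non-redundant" for this sketch.)"""
    kept = []
    for n in range(max_len, min_len - 1, -1):
        for gram in sorted(frequent_ngrams(messages, n, min_support)):
            if not any(gram in longer for longer in kept):
                kept.append(gram)
    return kept


# Made-up payloads that share a five-byte field; not real capture data.
messages = [
    bytes.fromhex("000100000006010300000002"),
    bytes.fromhex("000200000006010300100001"),
    bytes.fromhex("000300000006010600010003"),
]

features = maximal_frequent_strings(messages, min_len=3, max_len=6, min_support=3)
print([f.hex() for f in features])  # prints ['0000000601']
```

The shorter frequent strings such as `000000` and `000601` are discarded because they are substrings of the five-byte string, which is the kind of redundancy a non-redundant result set avoids. Enumerating every length in this way grows expensive on large captures, which is the cost that the data partitioning strategy is meant to address.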