An Automatic Discovery Method for Heuristic Log Templates
Logs are an important data source in security analytics. However, unstructured raw logs cannot be used directly for security analysis, so parsing logs into structured templates is a critical first step. Most existing log parsing methods assume that log messages belonging to the same template have the same length; because log length varies with the variables a message contains, messages that belong to the same template are then incorrectly extracted into different templates. This paper therefore proposes KeyParse, an automatic log template discovery method. First, KeyParse computes the similarity between logs and templates with the longest common subsequence (LCS) algorithm, which ignores the differences introduced by variables and so allows logs to be matched to templates. Second, it groups log templates by their highest-frequency items, which prevents log messages of the same event but different lengths from being split into different template groups; this reduces template redundancy and improves template matching efficiency. Finally, it uses the HeavyGuardian algorithm to track the highest-frequency items of streaming log messages, addressing the difficulty traditional frequency-counting methods have in adapting to the dynamically changing word frequencies of log streams. Experimental results show that KeyParse achieves higher accuracy across various types of log sets, with an average parsing accuracy of 0.968, and higher performance when parsing large log sets.
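The first two steps (LCS-based log/template similarity and grouping by a log's highest-frequency token) can be illustrated with a minimal Python sketch. It is not KeyParse itself: whitespace tokenisation, the `similarity` normalisation (LCS length divided by the longer sequence), the `threshold` of 0.5, keeping a template's first log verbatim, and the hypothetical `parse` helper are all assumptions made for illustration. In particular, exact two-pass counts from `collections.Counter` stand in for the HeavyGuardian streaming counter used in the paper.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def similarity(log_tokens, template_tokens):
    """Assumed score: LCS length divided by the length of the longer sequence."""
    longest = max(len(log_tokens), len(template_tokens))
    return lcs_length(log_tokens, template_tokens) / longest if longest else 1.0


def parse(lines, threshold=0.5):
    """Hypothetical parser: returns (template id per line, templates, groups).

    Assumptions: whitespace tokens, exact Counter totals instead of
    HeavyGuardian, and each template is the first log that created it.
    """
    logs = [line.split() for line in lines]
    counts = Counter(tok for toks in logs for tok in toks)

    templates = []  # template id -> token list
    groups = {}     # group key (highest-frequency token) -> template ids
    assigned = []
    for toks in logs:
        # Group key: the log's globally most frequent token (first one on ties).
        # It does not depend on how many variable tokens the log contains.
        key = max(toks, key=lambda t: counts[t], default="")
        candidates = groups.setdefault(key, [])
        best_id, best_sim = None, threshold
        for tid in candidates:  # only templates in the same group are compared
            sim = similarity(toks, templates[tid])
            if sim >= best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:  # no match above the threshold: open a new template
            best_id = len(templates)
            templates.append(toks)
            candidates.append(best_id)
        assigned.append(best_id)
    return assigned, templates, groups


lines = [
    "Accepted password for alice from 10.0.0.1",
    "Accepted password for bob from 10.0.0.2",
    "Accepted password for svc account from 10.0.0.3",
    "Disk quota exceeded on /dev/sda1",
]
ids, templates, groups = parse(lines)
print(ids)     # [0, 0, 0, 1]
print(groups)  # {'Accepted': [0], 'Disk': [1]}
```

In this toy run the third log has seven tokens rather than six, yet it still shares a group key and an LCS score of 4/7 with the first template, so it is not split into a separate template. This is the length-variation problem the abstract describes.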