An automatical IOC extraction method for reducing dependency on threat intelligence labeling
To address increasingly challenging cyber threats, there is an urgent need to analyze them in order to gain an advantage in cyberspace operations. Indicators of Compromise (IOCs), an essential part of Cyber Threat Intelligence (CTI), span the entire cyber attack lifecycle and accurately describe key information (attack behaviors, entities, etc.) at each attack stage. Extracting IOCs from CTI can support cyber defence, attack tracing, and countermeasures. Existing IOC extraction methods based on machine learning or deep learning have made great progress, but they require a large investment to label enough CTI for training and are less effective when labeled CTI is limited. To tackle this challenge, Automatical IOC Extraction based on Less labeled data (L-AIE), a novel IOC extraction method, is proposed to reduce labeling cost while maintaining extraction accuracy. L-AIE enhances CTI text processing with fine-grained word tokenization so that enough information can be obtained from less CTI. The Context Layer and the Combination Layer are used to capture sufficient context for IOC entities that are split into subwords. Furthermore, during training, L-AIE uses an additional Relation Layer to enlarge the differences between IOC categories. Extensive experiments demonstrate that L-AIE not only depends less on the amount of labeled data but also outperforms other leading methods. With only about 10% of the training data used in previous experiments, L-AIE achieves a macro F1 score of 87.54%, more than 20% higher than other methods. When the amount of training data is reduced further, L-AIE's extraction results are affected less than half as much as those of the other models.
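The idea of fine-grained tokenization, with IOC entities split into subwords and then combined back into entity-level results, can be illustrated with a minimal Python sketch. It is not L-AIE's implementation: the regex splitting rule (letter runs, digit runs, single punctuation marks), the helper names `fine_grained_tokenize` and `combine_subword_labels`, the BIO tag names, and the first-subword rule used to merge labels are all hypothetical. The abstract does not specify how L-AIE's Combination Layer works, so a fixed rule stands in for it here purely for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical splitting rule (not L-AIE's): runs of letters, runs of digits,
# or any single character that is neither alphanumeric nor whitespace.
_SUBWORD_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def fine_grained_tokenize(words: List[str]) -> Tuple[List[str], List[int]]:
    """Split each word into subwords.

    Returns the subwords and, for each subword, the index of the word it
    came from, so predictions can later be mapped back to whole words.
    """
    subwords: List[str] = []
    word_ids: List[int] = []
    for i, word in enumerate(words):
        pieces = _SUBWORD_RE.findall(word) or [word]  # fallback keeps the word intact
        subwords.extend(pieces)
        word_ids.extend([i] * len(pieces))
    return subwords, word_ids


def combine_subword_labels(sub_labels: List[str], word_ids: List[int],
                           n_words: int) -> List[str]:
    """Map subword-level labels back to words.

    Hypothetical stand-in for a combination step: each word takes the
    label predicted for its first subword.
    """
    word_labels = ["O"] * n_words
    seen = set()
    for label, wid in zip(sub_labels, word_ids):
        if wid not in seen:
            word_labels[wid] = label
            seen.add(wid)
    return word_labels


if __name__ == "__main__":
    words = "connects to 192.168.1.10 via evil.com".split()
    subwords, word_ids = fine_grained_tokenize(words)
    print(subwords)
    # prints ['connects', 'to', '192', '.', '168', '.', '1', '.', '10', 'via', 'evil', '.', 'com']
```

Splitting an IP address or domain into pieces like `192`, `.` and `evil` lets a model see recurring substructure across IOCs even when few labeled examples of each full indicator exist, which is the motivation the abstract gives for fine-grained tokenization.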
Keywords: Cyber threat; Cyber threat intelligence; Indicator of compromise; Few-shot learning