[Objective]This paper proposes a model combining head and tail pointers with active learning,which addresses the sparse sample issues and helps us identify long terms on weapons.[Methods]Firstly,we used the BERT pre-trained language model to obtain the word vector representation.Then,we extracted the long terms by the head-tail pointer network.Third,we developed a new active learning sampling strategy to select high-quality unlabeled samples.Finally,we iteratively trained the model to reduce its dependence on the data scale.[Results]The Fl value for long term extraction was improved by 0.50%.With the help of active learning post-sampling,we used about 50%high-quality data to achieve the same Fl value with 100%high-quality training data.[Limitations]Due to the limitation of computing power,the data set in this paper was small,and the active learning sampling strategy requires more processing time.[Conclusions]Using head-tail pointer and active learning method can extract long terms effectively and reduce the cost of data annotation.
Term ExtractionActive LearningHead-to-Tail Pointer NetworkBERTWeaponry