Thesaurus Development and Application in the Field of Intangible Cultural Heritage Ceramics Incorporated with Learning Extension
Based on extended machine learning and deep learning, this paper proposes a method for term extraction and new word discovery in the Intangible Cultural Heritage (ICH) project corpus, builds a domain thesaurus, and explores its application in digital humanities. First, natural language processing methods are used to preprocess the ICH ceramics corpus and annotate it according to the domain terminology lexicon. Second, the Random-CRFs model is used to investigate how dictionary (DICT), part-of-speech (POS), radical (Radical), and pinyin (Pinyin) features influence term extraction, and four models (Random-CRFs, Random-BiLSTM, Random-BiLSTM-CRFs, and BERT-BiLSTM-CRFs) are compared on the term extraction task. Finally, a trained model is used to identify new words in the test corpus, and the extracted candidate words are evaluated manually. A terminology database of 1,173 terms in the field of ICH ceramics is developed and applied to ICH project portraits, ICH ceramics knowledge graphs, and ICH ceramics term retrieval.
intangible cultural heritage; domain terminology; new word discovery; digital humanities