Research on the Construction Method and Distribution Law of Object-Image Database for Ancient Poetry
From the perspective of digital humanities, ancient poetry resources are of great value but difficult to analyze at scale. Research on automatically constructing a knowledge base of ancient poetry supports the macro-level analysis of ancient poetry and the mining of its value. First, based on the concept of the "object image", the key information in ancient poems is extracted, which reduces the complexity of analysis and makes an automated process possible. Second, a RoBERTa-BiLSTM-CRF model is constructed using deep learning methods and used to extract object images from an ancient poetry corpus. Then, The Whole Tang Dynasty Poems and some Song Dynasty poetry resources are used to verify the feasibility and universality of the model. Finally, the object image database of The Whole Tang Dynasty Poems is constructed, and the distribution law of the object images is preliminarily analyzed. After the model was trained on the automatically tagged corpus, the F1 scores for common nouns, time nouns, and place names reached 89.6%, 93.3%, and 93.6%, respectively. When the model was transferred to a Song Dynasty poetry corpus that was not used for training, the extraction density was 4.5 objects per poem, which showed the ability to discover unknown words and indicates that the model has good universality and extensibility.
Digital humanities; Ancient poetry; Object image; Deep learning
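The two quantities reported in the abstract, the per-category F1 score and the extraction density (objects per poem), can be illustrated with a minimal sketch. It assumes, without confirmation from the source, that the tagger emits BIO labels, that F1 is computed at the level of exactly matching spans, and that each tag sequence corresponds to one poem. The category names `n`, `t`, and `ns` and all function names are hypothetical and are used here only for illustration.

```python
from collections import defaultdict


def bio_to_spans(tags):
    """Decode a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, kind = set(), None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def f1_by_type(gold_seqs, pred_seqs):
    """Span-level F1 per category (assumed exact-match criterion)."""
    counts = defaultdict(lambda: [0, 0, 0])  # [true positives, predicted, gold]
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
        for span in pred_spans:
            counts[span[2]][1] += 1
            if span in gold_spans:
                counts[span[2]][0] += 1
        for span in gold_spans:
            counts[span[2]][2] += 1
    scores = {}
    for kind, (tp, n_pred, n_gold) in counts.items():
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        total = precision + recall
        scores[kind] = 2 * precision * recall / total if total else 0.0
    return scores


def extraction_density(pred_seqs):
    """Average number of extracted spans per sequence (one sequence per poem)."""
    if not pred_seqs:
        return 0.0
    return sum(len(bio_to_spans(seq)) for seq in pred_seqs) / len(pred_seqs)
```

Extraction density needs no gold labels, which is why it can be reported on a corpus such as the Song Dynasty poems that has no annotated reference, whereas F1 requires a tagged corpus.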