Construction of Scholar Profile Based on Generative Pre-Trained Language Model
In the era of big data, scholar information on the Internet exists in multi-source, heterogeneous, and unstructured forms, and extracting entities from it is complicated by problems such as attribute confusion and long entities, which seriously affect the accuracy of scholar profile construction. Meanwhile, the scholar attribute entity extraction model, a key component in constructing scholar profiles, still presents significant technical barriers in practice, which hinder the widespread application of scholar profiles. Therefore, using open resources, we construct an attribute entity extraction method based on generative pre-trained language models that combines guided sentence modelling, autoregressive generation, and fine-tuning on a training corpus. We validate the method from four aspects: overall model performance, extraction performance by entity category, instance analysis of the main influencing factors, and analysis of the impact of sample fine-tuning. Compared with the comparison models, the proposed method achieves the best performance across 12 categories of scholar attribute entities, with an overall F1 score of 99.34%. It not only effectively identifies and differentiates easily confused attribute entities, but also improves the extraction precision of typical long attribute entities such as "research interests" by 6.11%. The method provides more convenient and effective methodological support for the engineering application of scholar profiles.
Generative Pre-Trained Language Model; Sample Fine-Tuning; Scholar Profile; GPT-3
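
The abstract names guided sentence modelling and an F1-based evaluation without giving details. The following is a minimal illustrative sketch, not the authors' implementation: the prompt template, the function names `build_guided_prompt` and `entity_f1`, and the exact-match scoring rule are all assumptions introduced here. It shows one plausible way to wrap raw text in a guiding sentence for an autoregressive model and to score extracted (attribute, value) pairs.

```python
from collections import Counter


def build_guided_prompt(text: str, attribute: str) -> str:
    """Wrap raw text in a guiding sentence naming the attribute to extract.

    The template is hypothetical; the paper's actual guiding sentences
    are not given in the abstract.
    """
    return f"Text: {text}\nThe scholar's {attribute} is:"


def entity_f1(predicted, gold):
    """Return (precision, recall, F1) over (attribute, value) pairs.

    Assumes exact-match scoring, which may differ from the paper's protocol.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection counts pairs present in both prediction and gold.
    true_pos = sum((pred_counts & gold_counts).values())
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In this sketch a model's completion of the prompt would be taken as the value of the named attribute, so confusable attributes are disambiguated by the guiding sentence rather than by a separate tagging scheme.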