The suddenness, variability, and unpredictability of public health emergencies have increased the difficulty of processing and monitoring public health information. Constructing a domain-specific pretrained model can enhance the performance of downstream tasks. Several enhanced pre-training models already exist in the public health field, focusing on social media and medical domains. However, these models are trained on small corpora with limited data sources and short texts, and they lack long-text unsupervised corpora with rich semantic information. To address this issue, we adopt the BERT model and perform adaptive pre-training on a large-scale corpus of public health news to construct a domain-specific pretrained model, called PHD-News-BERT, which supports deep semantic learning and facilitates learning tasks in this field. Experiments on eight datasets spanning five downstream tasks, compared against five baseline models, show that PHD-News-BERT achieves superior performance on most tasks, indicating strong generalization and robustness. It is expected to provide a new benchmark for future work in the field of public health.
Key words
public health; domain-specific pretrained language model; natural language processing; BERT; adaptive pre-training
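The adaptive pre-training described in the abstract amounts to continued masked-language-model (MLM) training of BERT on an in-domain news corpus. The sketch below illustrates this procedure with the Hugging Face transformers and datasets libraries; the checkpoint name, corpus file, output path, and hyperparameters are placeholders chosen for illustration, not the settings used to build PHD-News-BERT.

```python
# Illustrative sketch: domain-adaptive pre-training (continued MLM training) of BERT
# on a plain-text corpus of public health news. Paths and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from a general-domain BERT checkpoint (placeholder name).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# One news document per line in a plain-text file (hypothetical path).
corpus = load_dataset("text", data_files={"train": "public_health_news.txt"})

def tokenize(batch):
    # Long news articles are truncated to BERT's maximum input length.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens; the model learns to recover them (MLM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="phd-news-bert",       # placeholder output directory
    per_device_train_batch_size=16,   # illustrative hyperparameters only
    num_train_epochs=3,
    learning_rate=5e-5,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("phd-news-bert")
```

The resulting checkpoint can then be fine-tuned on downstream public health tasks in the usual way, e.g. by loading it with a task-specific head such as `BertForSequenceClassification`.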