Multi-label Text Adaptive Classification Method Fusing Semantic Similarity and the BERT Model

Zhang Hong¹

Author information

1. Beijing Taiji Information System Technology Co., Ltd., Beijing 100000, China
Abstract

To address the difficulty of judging text search requirements and of classifying text, this paper studies a multi-label text adaptive classification method that fuses semantic similarity with the Bidirectional Encoder Representations from Transformers (BERT) model. The text is first preprocessed and its representation is determined. Text features are then extracted and reduced in dimensionality based on information gain theory, and the similarity between texts is calculated based on semantic similarity theory. Next, the BERT model is introduced to build a multi-label text adaptive classification framework, and the optimal model parameters are obtained through adversarial training. Feeding the text to be classified into the trained BERT text classification model then yields adaptive classification of multi-label text. Experimental data show that the F1 value obtained with the proposed method is greater than the given minimum limit and the Hamming loss (HL) is less than the given maximum limit, confirming that the proposed method achieves good text classification performance.
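The abstract evaluates the method with F1 and Hamming loss but does not define either metric or say which F1 averaging is used. As a rough illustration only, the sketch below computes both on binary label-indicator matrices using the standard definitions and assumes micro-averaged F1. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List

Labels = List[List[int]]  # one row per document, one 0/1 entry per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (document, label) slots predicted incorrectly; lower is better."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed for illustration): 3 documents, 3 candidate labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Micro F1:     {micro_f1(y_true, y_pred):.3f}")
```

Under these definitions, a result "passes" in the sense of the abstract when the F1 value exceeds a chosen minimum and the Hamming loss falls below a chosen maximum; the thresholds themselves are not stated in the abstract.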

Key words

BERT model/multi-label/semantic similarity/text classification

Funding

Key Research and Development Project of the Sichuan Provincial Public Security Research Center (2019YFS0067)

Publication year

2024

Microcomputer Applications (微型电脑应用)

Shanghai Microcomputer Applications Society

CSTPCD

Impact factor: 0.359

ISSN: 1007-757X

Number of references: 7
