A Hierarchical Multi-Label Bidding Section Classification Method Based on K-BERT-LDA

Manual division of tenders into bid sections is inefficient and error-prone. For material procurement bidding texts, which have sparse semantic features and labels with a clear hierarchical structure, a hierarchical multi-label text classification method based on K-BERT-LDA is proposed. First, text features are extracted with a hybrid model: the K-BERT model extracts knowledge-injected text features to compensate for missing semantic information, the LDA topic model extracts topic distribution features, and feature fusion further enriches the text representation. Second, category labels are jointly embedded so that the prediction results for upper-level labels guide lower-level classification, and the tree-structured relationships among labels are fully exploited to improve multi-label classification accuracy. Finally, an intelligent processing strategy based on a text similarity algorithm is proposed, which merges bid sections whose pre-investment amounts are insufficient in order to safeguard the bidding success rate and produce the final section division. Experiments show that the proposed method outperforms other classification methods and single models, reaching an accuracy, precision, and F1 score of 95.45%, 92.57%, and 91.88%, respectively, and achieves intelligent bid section division efficiently and accurately.
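The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The K-BERT and LDA outputs are assumed to be plain numeric vectors and fusion is assumed to be concatenation. Scaling each child-label score by its parent's score stands in for the joint label embedding. Cosine similarity over character bigrams stands in for the unspecified text similarity algorithm. All function names and the `min_amount` threshold are hypothetical.

```python
from collections import Counter
from math import sqrt


def fuse_features(semantic_vec, topic_vec):
    """Fuse two feature vectors by concatenation (assumed fusion operator)."""
    return list(semantic_vec) + list(topic_vec)


def guided_child_scores(parent_scores, child_scores, parent_of):
    """Scale each child-label score by the score of its parent label, so that
    upper-level predictions guide lower-level ones along the label tree.
    This is a simple stand-in for joint label embedding."""
    return {child: score * parent_scores[parent_of[child]]
            for child, score in child_scores.items()}


def cosine_similarity(text_a, text_b):
    """Cosine similarity over character-bigram counts (stand-in similarity)."""
    def bigrams(text):
        return Counter(text[i:i + 2] for i in range(len(text) - 1))

    a, b = bigrams(text_a), bigrams(text_b)
    dot = sum(a[g] * b[g] for g in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_small_sections(sections, min_amount):
    """Greedily merge each section whose amount is below min_amount into the
    most similar remaining section. `sections` is a list of
    (description, amount) pairs; a new list of pairs is returned."""
    merged = [list(s) for s in sections]
    while len(merged) > 1:
        # Index of the first under-funded section, or None if all are large enough.
        small = next((i for i, (_, amt) in enumerate(merged) if amt < min_amount), None)
        if small is None:
            break
        text, amount = merged.pop(small)
        best = max(range(len(merged)),
                   key=lambda j: cosine_similarity(text, merged[j][0]))
        merged[best][0] += "; " + text
        merged[best][1] += amount
    return [tuple(s) for s in merged]


if __name__ == "__main__":
    # Hypothetical sections: (description, planned amount).
    demo = [("steel cable 10kV", 50), ("steel cable 35kV", 500), ("office chairs", 300)]
    print(merge_small_sections(demo, min_amount=100))
```

The greedy merge always terminates because each pass removes one section. If a single section remains it is returned as-is, even when its amount is still below the threshold.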

Keywords: bidding division; hierarchical multi-label text classification; knowledge injection; topic distribution; feature fusion; text similarity

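Authors: Hou Jihui (侯继辉), Wu Xiaozhong (吴小忠), Liu Hui (刘晖), Xia Zhuoqun (夏卓群), Liang Diqing (梁涤青), Qiu Han (邱涵), Xu Jiahui (徐嘉慧)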

Hunan Xiangneng Chuangye Project Management Co., Ltd. (湖南湘能创业项目管理有限公司), Changsha, Hunan 410221

School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 410000

2024

Software Guide (软件导刊)
Hubei Information Society (湖北省信息学会)

Impact factor: 0.524
ISSN: 1672-7800
Year, volume (issue): 2024, 23(12)