
Text Clustering Method for Fragmented Reply Based on Dissimilarity Matrix

To address the problem of effectively extracting the required information from fragmented reply texts in Q&A communities, this paper proposes a clustering method for fragmented reply texts based on a dissimilarity matrix. First, cluster centers are designed according to the dissimilarity between texts, and the fragmented reply texts in the community are grouped by clustering. Then, an RNN+CNN-based feature extraction method is used to extract the text features of user questions. Finally, combining the extracted question features, a TF-IDF-based extractive automatic text generation algorithm is used to extract reply text information quickly and automatically. Experimental results show that the proposed method can automatically extract the required text information with high and stable accuracy, and can be applied to the extraction of fragmented reply texts in Q&A communities.
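The abstract gives no implementation details, so the following is only a minimal sketch of the pipeline's overall shape, under several stated assumptions. Texts are tokenised on whitespace (Chinese text would need a word segmenter). Dissimilarity is taken to be one minus the cosine similarity of TF-IDF vectors. Cluster centers are chosen by a farthest-first heuristic, which is not necessarily how the paper designs them. The RNN+CNN question feature extractor is replaced here by a plain TF-IDF vector of the question. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per whitespace-tokenised document."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms in every document.
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def dissimilarity(u, v):
    """Cosine dissimilarity (assumed measure): about 0 for identical texts, 1 for no shared terms."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return 1.0 - dot / norm if norm else 1.0

def dissimilarity_matrix(vectors):
    """Pairwise dissimilarity between all texts."""
    return [[dissimilarity(u, v) for v in vectors] for u in vectors]

def cluster(matrix, k):
    """Pick k centers by farthest-first traversal (an assumption), then assign each text."""
    n = len(matrix)
    # Start from the medoid: the text with the smallest total dissimilarity.
    centres = [min(range(n), key=lambda i: sum(matrix[i]))]
    while len(centres) < k:
        # Add the text that is farthest from its nearest existing center.
        centres.append(max(range(n), key=lambda i: min(matrix[i][c] for c in centres)))
    labels = [min(range(len(centres)), key=lambda j: matrix[i][centres[j]])
              for i in range(n)]
    return centres, labels

def extract(question, replies, k=2, top=1):
    """Return the `top` replies closest to the question within the best-matching cluster."""
    # The question is vectorised together with the replies so both share one IDF table.
    vectors = tfidf_vectors(replies + [question])
    reply_vecs, q_vec = vectors[:-1], vectors[-1]
    centres, labels = cluster(dissimilarity_matrix(reply_vecs), k)
    best = min(range(len(centres)),
               key=lambda j: dissimilarity(q_vec, reply_vecs[centres[j]]))
    members = [i for i, lab in enumerate(labels) if lab == best]
    members.sort(key=lambda i: dissimilarity(q_vec, reply_vecs[i]))
    return [replies[i] for i in members[:top]]

# Illustrative data only; not taken from the paper.
replies = [
    "reset the router and wait two minutes",
    "restart the router then wait",
    "pay the bill online with a card",
    "the bill can be paid online",
]
question = "how do i pay my bill online"
print(extract(question, replies, k=2, top=1))
```

Working from a precomputed dissimilarity matrix means the clustering step never needs the raw vectors, so a different dissimilarity measure could be swapped in without touching `cluster`.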

Keywords: question-answer community; fragmented reply text; automatic extraction; clustering; dissimilarity

Authors: 刘文亮 (Liu Wenliang), 吴飞 (Wu Fei), 何德明 (He Deming), 赵维伟 (Zhao Weiwei), 潘建宏 (Pan Jianhong)


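State Grid Fujian Electric Power Company, Fuzhou 350000, Fujian, China

Fujian Yirong Information Technology Co., Ltd., Fuzhou 350003, Fujian, China

State Grid Corporation of China, Beijing 100000, China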


Funding: Fujian Province Science and Technology Project (SGFJ0000KXJS1700225)

2024

Computer and Modernization
Jiangxi Computer Society; Jiangxi Institute of Computing Technology


CSTPCD
Impact factor: 0.472
ISSN: 1006-2475
Year, volume (issue): 2024, (9)
  • 26