Expert Recommendation in Q&A Community Based on Topic Interest and Domain Authority


Abstract: [Objective] To identify topics in users' historical Q&A texts while accounting for contextual semantic information, and thereby improve the accuracy of expert recommendation in Q&A communities. [Methods] We build a BERT-LLDA model that combines BERT with the Labeled-LDA topic model and makes full use of tag (label) information to vectorize users' historical Q&A texts. Dimensionality reduction and topic clustering then yield context-aware topics and each user's topic-interest probability distribution. From these topic interests we construct a Topic-Sensitive PageRank algorithm (TSPR) and add user quality weights to compute each user's domain authority iteratively. Combining the two gives TIDARank, an expert recommendation algorithm that considers both topic interest and domain authority and recommends potential expert answerers for new questions. [Results] On the public Stack Exchange dataset, BERT-LLDA with topic clustering achieved a higher silhouette coefficient (0.5756) and topic coherence (0.4766) than the TF-IDF, BERT, and BERT-LDA baselines. TIDARank reached a best-answerer hit rate ACC@20 of 0.5807 and a mean reciprocal rank MRR@20 of 0.2430, improvements of 0.145 and 0.081, respectively, over the best-performing baseline, Bi-LSTM+TSPR. [Limitations] User activity is not considered in the link analysis. [Conclusions] The BERT-LLDA model improves topic clustering of Q&A texts and helps improve expert recommendation performance in Q&A communities.
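The abstract names the components of TIDARank but not their exact formulas. As a rough illustration only, the Python sketch below shows one common form of topic-sensitive PageRank (a topic-biased teleport vector, here additionally scaled by a user-quality weight), a hypothetical way to combine topic interest and per-topic authority into a score for a new question, and ACC@k / MRR@k as they are usually defined. The edge direction (asker to answerer), the `quality` weights, the product-form score, the helper names and all toy numbers are assumptions for illustration, not the authors' settings.

```python
import numpy as np


def topic_sensitive_pagerank(adj, theta, quality, damping=0.85, tol=1e-10, max_iter=200):
    """Per-topic PageRank authority with a topic- and quality-biased teleport.

    adj[u, v]   : weight of edge u -> v (assumed: asker u -> answerer v)
    theta[u, t] : user u's topic-interest probability for topic t
    quality[u]  : hypothetical positive user-quality weight
    Returns authority[u, t]; each column sums to 1.
    """
    adj = np.asarray(adj, dtype=float)
    n, k = theta.shape
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalise edge weights; users with no out-edges are "dangling".
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    dangling = out_deg.ravel() == 0
    authority = np.zeros((n, k))
    for t in range(k):
        # Teleport vector biased towards users interested in topic t,
        # scaled by their quality weight (an assumption, see lead-in).
        tele = theta[:, t] * quality
        total = tele.sum()
        tele = tele / total if total > 0 else np.full(n, 1.0 / n)
        r = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            # Mass held by dangling users is redistributed via the teleport vector.
            leaked = r[dangling].sum()
            r_new = damping * (trans.T @ r + leaked * tele) + (1 - damping) * tele
            converged = np.abs(r_new - r).sum() < tol
            r = r_new
            if converged:
                break
        authority[:, t] = r
    return authority


def tidarank_scores(question_topics, theta, authority):
    """Hypothetical TIDARank-style score: sum_t phi_q[t] * theta[u, t] * authority[u, t]."""
    return (theta * authority) @ question_topics


def acc_mrr_at_k(score_matrix, best_answerers, k=20):
    """ACC@k: share of questions whose best answerer is in the top k.
    MRR@k: mean reciprocal rank of the best answerer, counted as 0 beyond rank k."""
    hits, rr_sum = 0, 0.0
    for scores, best in zip(score_matrix, best_answerers):
        top_k = np.argsort(-scores)[:k]
        pos = np.flatnonzero(top_k == best)
        if pos.size:
            hits += 1
            rr_sum += 1.0 / (pos[0] + 1)
    n = len(best_answerers)
    return hits / n, rr_sum / n


# Toy data: 4 users, 2 topics; every number here is invented for illustration.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0],
                [1, 0, 1, 0]])
theta = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.7, 0.3]])
quality = np.array([1.0, 2.0, 3.0, 1.0])

auth = topic_sensitive_pagerank(adj, theta, quality)
q = np.array([0.1, 0.9])  # a new question mostly about topic 1
scores = tidarank_scores(q, theta, auth)
print("ranking:", np.argsort(-scores))
print("ACC@2, MRR@2:", acc_mrr_at_k(np.array([scores]), [1], k=2))
```

The product form is only one plausible reading of "considering topic interest and domain authority"; a weighted sum of normalised interest and authority would be another.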

Keywords:

Community Question Answering; Expert Recommendation; BERT; Labeled-LDA; PageRank

Authors:

Li Mingzhu (李明珠), Mi Chuanmin (米传民), Gou Xiaoyi (苟小义), Xiao Lin (肖琳)


Affiliation:

College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016

Funding:

Humanities and Social Sciences Research Project of the Ministry of Education

Project No.:

20YJC630163

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0433

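Data Analysis and Knowledge Discovery (数据分析与知识发现)

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals, EI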

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(5)

References: 30