Library and Information Service (图书情报工作), 2024, Vol. 68, Issue 5: 121-131. DOI: 10.13266/j.issn.0252-3116.2024.05.012

Research on Answer Ranking in Online Academic Communities Driven by Multi-Label Classification from the Perspective of Text Topics

Lin Litao (林立涛)¹, Wu Mengcheng (吴梦成)², Liu Chang (刘畅)², Hu Die (胡蝶)², Wang Dongbo (王东波)², Huang Shuiqing (黄水清)²

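Author information

  • 1. School of Information Management, Nanjing University, Nanjing 210023
  • 2. College of Information Management, Nanjing Agricultural University, Nanjing 210095; Research Center for Humanities and Social Computing, Key Research Base of Philosophy and Social Sciences of Jiangsu Universities, Nanjing 210095; Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095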

Abstract

[Purpose/Significance] User-generated answers in online academic communities vary widely in quality, which makes it hard for them to give users efficient decision support. Selecting highly usable answers promotes the efficient use of question-and-answer knowledge in these communities. [Method/Process] From the perspective of text topic semantics, this paper proposes a question-answer relevance calculation method based on a deep pre-trained language model and multi-label classification, and uses it to rank user-generated answers by usefulness. The method first extracts semantic vectors for the question text and the answer text, then maps them into a domain-specific topic vector space, where the topic similarity between question and answer is computed. [Result/Conclusion] The experimental data consist of all questions, and all answers to each question, in the "Help Completed" (求助完结) column of the paper submission board of the "Xiaomuchong" (小木虫) academic community. With NDCG and Q-Measure as evaluation metrics, the proposed method is compared with two conventional semantics-based ranking methods, Cross-Encoder and Bi-Encoder. The results show that the proposed method performs on par with the conventional methods while requiring less annotated data.
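The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes each question and answer has already been mapped to a topic vector (for example, per-topic probabilities from a multi-label classifier), and it uses cosine similarity as the topic-similarity measure; the abstract does not name the measure, so that choice is an assumption. The function names, the toy vectors, and the usefulness labels are all hypothetical, and the NDCG shown is the common linear-gain variant, which may differ from the one used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_answers(question_topics, answer_topics):
    """Return answer indices sorted by descending topic similarity to the question."""
    scores = [cosine(question_topics, a) for a in answer_topics]
    return sorted(range(len(answer_topics)), key=lambda i: scores[i], reverse=True)


def ndcg(ranked_relevances):
    """NDCG with linear gain and log2 discount over the full ranked list."""
    def dcg(rels):
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(rels))

    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical topic vectors over three topics (assumed classifier outputs).
question = [0.9, 0.1, 0.0]
answers = [
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
]

# Hypothetical human usefulness labels, one per answer (higher is more useful).
gold = [0, 2, 1]

order = rank_answers(question, answers)
score = ndcg([gold[i] for i in order])
print(order, round(score, 4))
```

In this toy case the second answer shares the question's dominant topic and is ranked first; the NDCG falls slightly below 1 because the remaining two answers are ordered against their labels.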

Keywords

online academic community/user-generated content/topic mining/answer ranking/question-answer relevance/multi-label classification

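Publication year

2024
Library and Information Service (图书情报工作)
National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core
Impact factor: 2.203
ISSN: 0252-3116
Number of references: 34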