
Generation of Contributions of Scientific Paper Based on Multi-step Sentence Selecting-and-Rewriting Model (基于多步句子选择-重写模型生成科技文献创新点)

The number of scientific papers published in recent years has surged, making it challenging for researchers to keep up with the latest advances in their fields. To stay current, researchers often rely on reading the contributions section of a paper, which concisely summarizes its key research findings. However, many authors do not adequately present the innovative content of their articles, which makes it difficult for readers to quickly grasp the essence of the research. To address this issue, we propose a novel task, contribution summarization, which automatically generates contribution summaries of scientific papers. One challenge of this task is the lack of relevant datasets, so we construct a scientific contribution summarization corpus (SCSC). Another is that existing abstractive and extractive models suffer, respectively, from excessive redundancy and a lack of coherence between sentences when generating contributions. To generate concise, high-quality contribution sentences, we present MSSRsum, a multi-step sentence selecting-and-rewriting model. Experiments show that the proposed model outperforms the baseline models on the SCSC and arXiv datasets.

Keywords: Summarization; Scientific papers; Multi-step sentence selecting-and-rewriting; Generation of contributions

Xu Xianzhe (许贤哲), Chen Jingqiang (陈景强)


School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

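Jiangsu Key Laboratory of Big Data Security and Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023, China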


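Funding: National Natural Science Foundation of China (61806101); Natural Science Research Project of Jiangsu Higher Education Institutions (21KIB520017)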

2024

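Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)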


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(10)