Long-Text Summarization: A Study of Segmentation Strategies for STM32 Paper Abstract Generation Based on the Pegasus Model
Long Chuan (龙川) 1, Zhang Qin (张芹) 1, Xie Liangsheng (谢亮生) 1, Pan Chen (潘琛) 1, Wen Yu (文瑜) 1, Yang Junfeng (杨俊锋) 1
Author information
- 1. School of Measuring and Optical Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330000, China
Abstract
This study examines how different text segmentation methods affect summary quality when a pre-trained Pegasus model is used for long-text summarization. A total of 200 academic papers on STM32 microcontrollers were collected from CNKI as experimental texts, and four segmentation methods were compared: sliding window, sentence segmentation, paragraph segmentation, and sliding window combined with sentence segmentation. The generated summaries were evaluated with ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics, and the results were analyzed in detail. In terms of summary quality, paragraph segmentation performed well, with ROUGE-1, ROUGE-2, and ROUGE-L scores of 30.85, 7.60, and 20.15, respectively. These scores are slightly higher than those of sentence segmentation and significantly higher than those of sliding window combined with sentence segmentation. The study aims to provide researchers and developers with practical experience and insights into long-text summarization.
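The four segmentation methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the character-based length budget `max_chars`, the window parameters, and the greedy packing step are all assumptions made for illustration, and the abstract does not state how chunks are passed to Pegasus or how chunk summaries are merged.

```python
import re


def sliding_window(items, size, stride):
    """Return fixed-size windows over a list (tokens, characters, or sentences).

    Consecutive windows overlap by size - stride items when stride < size.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(items), stride):
        windows.append(items[start:start + size])
        if start + size >= len(items):
            break  # the last window already reaches the end of the input
    return windows


def split_sentences(text):
    """Split on Chinese full stops and on ! / ? in both widths.

    Assumption: the Western period is ignored so decimals such as 30.85 stay whole.
    """
    parts = re.findall(r"[^。!?!?]+[。!?!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def split_paragraphs(text):
    """Split on blank lines (assumes paragraphs are blank-line delimited)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def pack(units, max_chars):
    """Greedily merge whole units into chunks of at most max_chars characters.

    A single unit longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for unit in units:
        if current and len(current) + len(unit) > max_chars:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks


def segment(text, method, max_chars=512, size=3, stride=2):
    """Dispatch to one of the four strategies (hypothetical parameter defaults)."""
    if method == "sliding_window":
        return ["".join(w) for w in sliding_window(list(text), max_chars, max_chars // 2)]
    if method == "sentence":
        return pack(split_sentences(text), max_chars)
    if method == "paragraph":
        return pack(split_paragraphs(text), max_chars)
    if method == "sliding_window_sentence":
        return ["".join(w) for w in sliding_window(split_sentences(text), size, stride)]
    raise ValueError(f"unknown method: {method}")
```

In this sketch the sentence and paragraph strategies never cut a unit in half, whereas the plain sliding window cuts at arbitrary character positions. That difference in boundary quality is one plausible reason the strategies could yield different ROUGE scores.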
Keywords
long-text summarization / segmentation strategy / Pegasus model / STM32 academic paper abstracts
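Funding
Jiangxi Province Innovation Leading Talent Long-Term Program (S2020LQCQ0889)
Natural Science Foundation of Jiangxi Province (20212BAB201022)
Ministry of Education Industry-University-Research Collaborative Education Program (202002032008)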
Publication year
2024