
Knowledge Point Title Generation for Classroom Teaching Content

In the Internet era, the volume of information is enormous, and concise titles improve reading efficiency. In classroom scenarios, generating knowledge point titles helps users organize and memorize classroom content, improving learning efficiency. This paper applies title generation to the classroom teaching domain and builds a classroom knowledge point text-title dataset. It proposes an improved TextRank algorithm, text ranking considering keywords and sentence positions (TKSP), which jointly accounts for the influence of keywords, sentence position, and other factors on sentence weight, allowing it to extract the key information of a text more accurately. Evaluated with the recall-oriented understudy for gisting evaluation (ROUGE) method, the TKSP algorithm scores 51.20%, 33.42%, and 50.48% on the ROUGE-1, ROUGE-2, and ROUGE-L metrics, respectively. Combining the extractive TKSP algorithm with the unified language model (UniLM) and fusing text topic information, the paper further proposes the UniLM-TK model (unified language model combined with text ranking considering keywords and sentence positions), which scores 73.29%, 58.12%, and 72.87% on the same metrics, improvements of 0.74%, 2.26%, and 0.87% over the UniLM model, demonstrating that the titles generated by UniLM-TK are more accurate and effective.
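The ROUGE-1 scores above measure unigram overlap between a generated title and its reference. A minimal sketch of that computation, assuming whitespace tokenization (the paper's Chinese-text evaluation would tokenize differently):

```python
from collections import Counter

def rouge_1(candidate: str, reference: str):
    """ROUGE-1 between a candidate title and a reference title.

    Counts overlapping unigrams (clipped by per-word frequency) and
    returns (recall, precision, F1). Illustrative only: real ROUGE
    toolkits handle stemming and non-whitespace tokenization.
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # min count per shared word
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1

# Hypothetical title pair, not from the paper's dataset
r, p, f = rouge_1("knowledge point title generation",
                  "title generation for knowledge points")
```

ROUGE-2 follows the same pattern over bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.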
Generation of knowledge point titles for classroom teaching
[Objective] In the digital age, brief titles are critical for efficient reading. However, headline generation technology is mostly applied to news rather than to other domains. Generating knowledge point titles in classroom scenarios can enhance comprehension and improve learning efficiency. Traditional extractive algorithms such as Lead-3 and the original TextRank fail to capture the critical information of an article effectively: they rank sentences only by factors such as position or text similarity, overlooking keywords. To address this issue, an improved TextRank algorithm, text ranking considering keywords and sentence positions (TKSP), is proposed. Extractive models extract information without expanding on the original text, while generative models produce brief and coherent headlines; however, generative models sometimes misunderstand the source text, resulting in inaccurate and repetitive headings. To address this, TKSP is combined with the UniLM generative model to form the UniLM-TK model, which also incorporates text topic information. [Methods] Courses are collected from a MOOC platform, and audio is extracted from the teaching videos. Speech-to-text conversion is performed with an audio transcription tool. The resulting classroom teaching text is cleaned, segmented by knowledge point, and manually titled to build the dataset. The proposed TKSP algorithm is then used to generate knowledge point titles automatically. First, the algorithm applies the Word2Vec word-vector model to TextRank. TKSP considers four factors that influence sentence weight: (1) Sentence position: the opening sentences serve as a general introduction to the knowledge point and therefore receive higher weights; subsequent sentences receive decreasing weights according to their position. (2) Keyword count: sentences containing keywords carry valuable information, and their importance increases with the number of keywords present. The TextRank algorithm generates a keyword list from the knowledge point content, and sentence weights are adjusted so that sentences with more keywords receive higher weights. (3) Keyword importance: keyword weights, arranged in descending order, reflect keyword importance. Sentence weights are adjusted accordingly: the sentence containing the top-ranked keyword receives the highest weight, while sentences containing the second- and third-ranked keywords receive lower weights. (4) Sentence importance: the first sentence in which a keyword appears serves as a general introduction and is more relevant to the knowledge point; this sentence receives the highest weight, which decreases with subsequent occurrences of the keyword. These four factors are integrated into a sentence-weight formula, and the top-ranked sentences are chosen to form the text title. This paper further combines the TKSP algorithm with the UniLM model into the UniLM-TK model. TKSP extracts the critical sentences, and TextRank extracts a topic word from the knowledge point text. These are separately embedded into the model input sequence, which is processed by Transformer blocks: the critical sentences capture textual context through self-attention, while the topic word injects topic information through cross-attention. The final attention output is obtained by weighting and summing these two representations and is then passed through a feed-forward network to extract high-level features. The focused sentences extracted by TKSP reduce the model's computational load and the difficulty of data processing, allowing the model to concentrate on extracting and generating the key information. [Results] The TKSP algorithm outperformed classical extractive algorithms (maximal marginal relevance, latent Dirichlet allocation, Lead-3, and TextRank) on the ROUGE-1, ROUGE-2, and ROUGE-L metrics, achieving 51.20%, 33.42%, and 50.48%, respectively. In the ablation experiments of the UniLM-TK model, the best performance was achieved when seven key sentences were extracted, with scores of 73.29%, 58.12%, and 72.87% on the respective metrics. Compared with headings generated by the GPT-3.5 API, the headings generated by UniLM-TK were brief, clear, accurate, and more readable in summarizing the text topic. Experiments on a large-scale Chinese scientific literature dataset comparing UniLM-TK with the ALBERT model showed that UniLM-TK improved ROUGE-1, ROUGE-2, and ROUGE-L by 6.45%, 3.96%, and 9.34%, respectively. [Conclusions] Comparisons with other extractive methods demonstrate the effectiveness of the TKSP algorithm, and the headings generated by UniLM-TK exhibit better accuracy and readability.
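The four TKSP weighting factors described above can be sketched as follows. This is a hypothetical illustration of the idea only: the factor forms, coefficients, and keyword matching are placeholders, not the paper's actual formula, and a real implementation would start from TextRank scores over Word2Vec sentence similarities.

```python
def tksp_weights(sentences: list[str], keywords: list[str]) -> list[float]:
    """Illustrative sentence weighting in the spirit of TKSP.

    Combines: (1) sentence position, (2) number of keywords in the
    sentence, (3) rank of those keywords in the keyword list, and
    (4) a bonus for the first sentence mentioning each keyword.
    All coefficients are invented for demonstration.
    """
    kw_rank = {k: i for i, k in enumerate(keywords)}  # 0 = most important
    first_mention = {}  # keyword -> index of first sentence containing it
    for i, s in enumerate(sentences):
        for k in keywords:
            if k in s and k not in first_mention:
                first_mention[k] = i

    weights = []
    for i, s in enumerate(sentences):
        position = 1.0 / (1 + i)                  # earlier sentences weigh more
        hits = [k for k in keywords if k in s]
        count = 1 + 0.1 * len(hits)               # more keywords, more weight
        importance = sum(1.0 / (1 + kw_rank[k]) for k in hits)
        first = sum(0.5 for k in hits if first_mention[k] == i)
        weights.append(position * count + importance + first)
    return weights

# Toy example: the introductory sentence should rank highest
ws = tksp_weights(
    ["textrank ranks sentences by similarity",
     "keywords guide the weight",
     "a closing remark"],
    ["textrank", "keywords", "weight"],
)
```

The highest-weighted sentences would then be passed to the generative model as the focused input, as the [Methods] section describes.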

classroom teaching; title generation; topic information; TextRank; UniLM

肖思羽、赵晖


School of Software, Xinjiang University, Urumqi 830017

School of Information Science and Engineering, Xinjiang University, Urumqi 830017


Supported by the National Natural Science Foundation of China

62166041

2024

Journal of Tsinghua University (Science and Technology)
Tsinghua University

Indexed in: CSTPCD; PKU Core Journals
Impact factor: 0.586
ISSN: 1000-0054
Year, Volume (Issue): 2024, 64(5)