Research on Difficulty Classification of Big Data and Computer Popular Science Books Based on Pretrained Models and Transformer Architecture

Original title (Chinese): 基于预训练模型和Transformer架构的大数据与计算机类科普书籍难度分类研究

Abstract: To address the shortcomings of current research on readability assessment for book-level long texts, this paper proposes a novel PTDE-CAC model. The model splits each book into fixed segments, uses unsupervised clustering to obtain difficulty-aware segments, and retrains a pre-trained model so that it learns difficulty knowledge, representing a long text as multiple vectors at different difficulty levels. The paper also constructs a graded dataset of big data and computer science popular science textbooks. Experiments show that PTDE-CAC performs well in readability assessment, outperforming traditional methods and existing pre-trained models. The work offers a new approach to book-level readability assessment and a reference for compiling and selecting related textbooks.
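
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the "split into fixed segments, then cluster" idea, not the paper's implementation. Everything in it is an assumption made for illustration: the function names, the segment size, the two surface features (mean token length and type-token ratio) standing in for the vectors a retrained pre-trained model would produce, and the plain k-means with a deterministic initialisation standing in for whatever unsupervised clustering the author used.

```python
from statistics import mean


def split_into_segments(tokens, size):
    """Split a token list into consecutive fixed-size segments (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def surface_features(segment):
    """Hypothetical difficulty proxies: mean token length and type-token ratio."""
    avg_len = mean(len(t) for t in segment)
    ttr = len(set(segment)) / len(segment)
    return (avg_len, ttr)


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are spread evenly over the sorted points."""
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[i * (n - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return assign(points, centroids), centroids


def book_representation(text, segment_size=10, k=2):
    """Represent a book as k vectors, one centroid per cluster of segments."""
    segments = split_into_segments(text.split(), segment_size)
    feats = [surface_features(s) for s in segments]
    labels, centroids = kmeans(feats, k)
    return segments, labels, centroids


if __name__ == "__main__":
    easy = "the cat sat on the mat " * 10
    hard = "distributed transformer architecture representation clustering " * 10
    _, labels, centroids = book_representation(easy + hard)
    print(labels)
    print(centroids)
```

Each book ends up represented by `k` vectors rather than one, which mirrors the abstract's "multiple vectors at different difficulty levels". In a setting closer to the paper, the feature tuples would presumably be replaced by segment embeddings from the retrained model.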

Keywords:

book-level long texts; readability assessment; PTDE-CAC model; difficulty-aware pre-training; multi-view representation; big data and computer science popular textbooks grading dataset

Author:

Huang Qizhou (黄启洲)

Affiliation:

Unicom Digital Technology Co., Ltd. (联通数字科技有限公司), Beijing 100032

Keywords (original Chinese):

书籍级长文本; 可读性评估; PTDE-CAC模型; 难度感知预训练; 多视角表示; 大数据计算机科普教材分级数据集

Publication year:

2024

DOI:

10.3969/j.issn.1003-6970.2024.07.046

Journal:

软件 (Software)

Chinese Institute of Electronics (中国电子学会); Tianjin Institute of Electronics (天津电子学会)

Impact factor: 1.51

ISSN: 1003-6970

Year, volume (issue): 2024, 45(7)