Research on Difficulty Classification of Big Data and Computer Popular Science Books Based on Pretrained Models and Transformer Architecture
To address the inadequacy of existing methods for assessing the readability of book-level long texts, we propose a novel model, PTDE-CAC. It divides each book into fixed-length segments, obtains difficulty-aware segments through unsupervised clustering, and further trains a pre-trained model on these segments so that it learns difficulty knowledge. A long text is then represented as multiple vectors corresponding to different difficulty levels. We also construct a graded dataset of popular science textbooks on big data and computer science. Experiments show that PTDE-CAC outperforms both traditional methods and existing pre-trained models in readability assessment. This work offers a new approach to book-level readability assessment and a reference for the compilation and selection of related textbooks.
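The pipeline summarised above (fixed-length segmentation, unsupervised clustering of segments by difficulty, and a multi-vector book representation) can be illustrated with a minimal Python sketch. It is not the PTDE-CAC implementation. It assumes k-means as the clustering algorithm, replaces the retrained pre-trained encoder with two hypothetical surface features (mean word length and type-token ratio), omits the retraining step entirely, and ranks clusters from easy to hard by centroid word length. All function names (`split_into_segments`, `segment_features`, `kmeans`, `multi_view_representation`) are illustrative.

```python
# Hypothetical sketch of the segment -> cluster -> multi-vector idea; not the PTDE-CAC code.
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def split_into_segments(tokens: Sequence[str], seg_len: int) -> List[List[str]]:
    """Cut a book's token stream into consecutive fixed-length segments (the last may be shorter)."""
    return [list(tokens[i:i + seg_len]) for i in range(0, len(tokens), seg_len)]


def segment_features(segment: Sequence[str]) -> Vector:
    """Hypothetical stand-in for a pre-trained encoder: two surface cues of difficulty."""
    n = len(segment)
    mean_word_len = sum(len(t) for t in segment) / n
    type_token_ratio = len(set(segment)) / n
    return [mean_word_len, type_token_ratio]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means with a deterministic init: seeds spread over points sorted by feature 0."""
    ranked = sorted(points, key=lambda p: p[0])
    step = max(k - 1, 1)
    centroids = [list(ranked[i * (len(ranked) - 1) // step]) for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def multi_view_representation(
    books: Dict[str, Sequence[str]], seg_len: int, k: int
) -> Dict[str, List[Vector]]:
    """Represent each book as k vectors, one per difficulty cluster, ordered easy -> hard."""
    owners: List[str] = []
    feats: List[Vector] = []
    for name, tokens in books.items():
        for seg in split_into_segments(tokens, seg_len):
            owners.append(name)
            feats.append(segment_features(seg))
    # Clusters are learned over the segments of all books together.
    labels, centroids = kmeans(feats, k)
    # Assumed ordering rule: a longer mean word length in the centroid means a harder cluster.
    order = sorted(range(k), key=lambda c: centroids[c][0])
    rank = {c: r for r, c in enumerate(order)}
    dim = len(feats[0])
    views: Dict[str, List[Vector]] = {}
    for name in books:
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for owner, f, lab in zip(owners, feats, labels):
            if owner == name:
                r = rank[lab]
                counts[r] += 1
                sums[r] = [a + b for a, b in zip(sums[r], f)]
        # Mean feature vector per difficulty level; zero vector if the book has no such segments.
        views[name] = [
            [s / counts[r] for s in sums[r]] if counts[r] else [0.0] * dim for r in range(k)
        ]
    return views


books = {
    "easy_book": "the cat sat on the mat and the dog ran to the cat".split() * 4,
    "hard_book": ("distributed heterogeneous computation "
                  "necessitates sophisticated synchronization").split() * 4,
}
views = multi_view_representation(books, seg_len=8, k=2)
for name, vecs in views.items():
    print(name, [[round(x, 2) for x in v] for v in vecs])
```

In this toy corpus the short-word segments and the long-word segments fall into separate clusters, so each book gets a non-zero vector only at its own difficulty level. Real books would mix levels, which is what makes a multi-vector view more informative than a single pooled vector.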
Keywords: book-level long texts; readability assessment; PTDE-CAC model; difficulty-aware pre-training; multi-view representation; big data; computer science popular textbooks grading dataset