Automatic Generation and Application of Personalized Experiment Report Comments Based on Large Language Models
When computer experiment reports are reviewed, assessment criteria vary widely across teachers and courses. The rigid templates used for evaluation lack personalized content, and the results often fail to provide an interpretable basis. To address these issues, this study proposes a framework for automatically generating personalized experiment report comments based on large language models. The study employs a Theme-Evaluation-Decision-Integrated (T-ED-I) prompting strategy to extract each teacher's distinct evaluation system from their requirements on experiment and code quality, ultimately building a shared library of assessment decision trees for computer software courses. It then introduces a method for grading experiment and code-quality themes based on large language models and decision trees. By retrieving from the library an evaluation decision tree that matches a student's experiment report and integrating the report and code text, the proposed method automatically generates quantitative or qualitative grades for experiment and code quality, along with corresponding interpretative justifications. Finally, personalized evaluation comments are generated by filling an experiment report comment template with the student's completed experimental tasks, the theme grading results, and the evaluation bases. Experimental results show that decision trees generated with the T-ED-I prompting strategy significantly outperform those generated without it, and ablation studies confirm the effectiveness and rationality of each component of the strategy. Moreover, when the automatically generated grades are compared with the teachers' original evaluations, the match rate exceeds 90% for software engineering, programming, and interdisciplinary courses, and teachers rate the automatically generated comments highly on fluency, relevance, and rationality.
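To make the grading stage concrete, the sketch below outlines one way it could be realized, assuming each decision tree is stored as yes/no criterion nodes and the comment template is a fixed string. The names DecisionNode, call_llm, retrieve_tree, and generate_comment are hypothetical illustrations, not the authors' implementation, and the prompts and template shown do not reproduce the paper's actual ones.

    # A minimal sketch of tree-guided grading with an LLM, under the
    # assumptions stated above; all names here are hypothetical.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class DecisionNode:
        """One node of an evaluation decision tree: a criterion for the LLM
        to judge, with children keyed by its answer; leaves carry grades."""
        criterion: str = ""
        grade: Optional[str] = None            # set only on leaf nodes
        children: dict = field(default_factory=dict)  # {"yes": node, "no": node}

    def call_llm(prompt: str) -> str:
        """Placeholder for a large language model call; replace with a real API."""
        raise NotImplementedError

    def retrieve_tree(library: dict, course: str, theme: str) -> DecisionNode:
        """Fetch the evaluation decision tree matching the report's course and theme."""
        return library[(course, theme)]

    def grade_with_tree(node: DecisionNode, report: str, code: str):
        """Walk the matched tree, asking the LLM one criterion at a time, and
        collect the judgments as the interpretable basis for the final grade."""
        basis = []
        while node.grade is None:
            answer = call_llm(
                f"Report:\n{report}\n\nCode:\n{code}\n\n"
                f"Criterion: {node.criterion}\nAnswer strictly 'yes' or 'no'."
            ).strip().lower()
            basis.append((node.criterion, answer))
            node = node.children[answer]
        return node.grade, basis

    def generate_comment(task: str, grade: str, basis: list) -> str:
        """Fill a fixed comment template with the task, grade, and justifications."""
        reasons = "; ".join(f"{c} -> {a}" for c, a in basis)
        return f"Task: {task}\nGrade: {grade}\nBasis: {reasons}"

Because each grade is reached through explicit criterion judgments, the collected (criterion, answer) pairs double as the interpretative justification attached to the generated comment.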
Keywords: large language models; decision trees for experiment evaluation; personalization; automatic comment generation; code quality assessment