Computer Applications and Software, 2024, Vol. 41, Issue 6: 101-107, 133. DOI: 10.3969/j.issn.1000-386x.2024.06.015

TEXT CLASSIFICATION METHOD FOR COURT ELECTRONIC FILE

Wang Xiao (王霄)¹, Wan Yuqing (万玉晴)¹

Author information

  • 1. Taiji Computer Corporation Limited, Beijing 100102, China

Abstract

This paper addresses the main problems in classifying the text of court electronic case files and proposes a solution for each. A multi-dimensional semantic representation of case-file documents is proposed to obtain more accurate and comprehensive text feature information. A text classifier is trained with a Gaussian-kernel kernel extreme learning machine (KELM), which obtains the globally optimal solution while greatly improving training efficiency. A sequential optimization model based on recursive least squares (RLS), KOS-ELM, iteratively updates the model parameters with new samples, giving the classification model the ability to learn online and reducing its dependence on the initial samples. Comparative experiments show that the Gaussian-kernel KELM classifier is 2.66 and 4.43 percentage points more accurate than the BP network model and LSSVM, respectively, while its training time is only 1/6 and 1/10 of theirs. Using the multi-dimensional semantic representation as model input improves accuracy by 8.84 and 2.33 percentage points over the text-vector and word-vector representations, respectively. When KOS-ELM is used to iteratively optimize a weak classifier, classification accuracy improves significantly after 20 iterations at each of 4 different step sizes.
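The abstract names a Gaussian-kernel KELM but does not give its equations. As a rough illustration only, the sketch below assumes the commonly used KELM closed form, in which the output weights solve (Ω + I/C) α = T with Ω the training kernel matrix. The class name `KELM`, the parameters `C` and `gamma`, and the toy two-cluster data are all assumptions made for this example; they are not taken from the paper, and the paper's multi-dimensional semantic features and KOS-ELM update are not reproduced here.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Pairwise Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off

class KELM:
    """Hypothetical minimal kernel extreme learning machine (illustrative only)."""

    def __init__(self, C=10.0, gamma=1.0):
        self.C = C          # regularization strength (assumed name)
        self.gamma = gamma  # Gaussian kernel width (assumed name)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        omega = gaussian_kernel(self.X_train, self.X_train, self.gamma)
        n = omega.shape[0]
        # Closed-form solve of (omega + I/C) alpha = T; no iterative training.
        self.alpha = np.linalg.solve(omega + np.eye(n) / self.C, T)
        return self

    def predict(self, X):
        K = gaussian_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return self.classes_[np.argmax(K @ self.alpha, axis=1)]

# Toy usage on two well-separated synthetic clusters (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = KELM(C=100.0, gamma=0.5).fit(X, y)
pred = model.predict(np.array([[-2.0, -2.0], [2.0, 2.0]]))
```

Because training reduces to a single linear solve, there is no gradient-based iteration, which is consistent with the abstract's claim of faster training than a BP network. The cost of that solve grows with the number of training samples, which is one reason an incremental update such as the RLS-based scheme mentioned above can be attractive.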

Key words

Court electronic file/Text classification/Semantic representation/Kernel extreme learning machine/Recursive least squares

Funding

National Key Research and Development Program of China (2018YFC0807700)

Publication year

2024
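Computer Applications and Software
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology

CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.615
ISSN: 1000-386X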