Few-shot Sensitive Information Recognition Based on Prototype Network Fine-tuning

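Abstract (translated from the Chinese): Sensitive information recognition mainly refers to identifying sensitive content on the Internet involving pornography, drugs, cults, violence, and similar categories. Existing approaches usually treat it as a text classification task, but classification performance is poor because large-scale annotated sensitive-information data is lacking. This paper proposes a few-shot sensitive information recognition method based on a fine-tuned prototype network. Within a few-shot learning framework, it uses a fast-adapting fine-tuned prototype network to mitigate the large gap between the general news domain used in meta-training and the sensitive-information data used in meta-testing. First, in the meta-training stage, the model is trained on general news classification data to learn general knowledge, and a two-stage gradient update during training yields a set of fast-adaptation initial parameters that are sensitive to new tasks. Then, in the meta-testing stage, on new tasks from the sensitive text dataset, part of the model's parameters are frozen and the model is further fine-tuned on the support set so that it generalizes better to the sensitive-information domain. Experimental results show that, compared with the current best few-shot classification model, the prototype network with the proposed fast-adaptation fine-tuning strategy significantly improves sensitive information recognition.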
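The prototype-network step that the abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `class_prototypes` and `classify`, the Euclidean distance, the 2-D embeddings, and the toy labels are all assumptions made for the example. Each class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype.

```python
import math
from collections import defaultdict

def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype vector."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def classify(query_embedding, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(
        prototypes,
        key=lambda label: math.dist(query_embedding, prototypes[label]),
    )

# Toy 2-way 2-shot episode with made-up 2-D embeddings (hypothetical data).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["drugs", "drugs", "violence", "violence"]

prototypes = class_prototypes(support, labels)
print(classify([0.9, 1.0], prototypes))  # prints "violence"
```

In a real system the embeddings would come from a trained text encoder; here they are hard-coded so that the nearest-prototype rule can be checked by hand.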

English abstract: Sensitive information recognition refers to the identification of sensitive messages on the Internet related to pornography, drugs, cults, violence, and other sensitive categories. This paper proposes a few-shot sensitive information recognition method based on prototype network fine-tuning. The method employs fast adaptation within a few-shot learning framework to bridge the domain gap between the dataset used in the meta-training stage and the one used in the meta-testing stage. Specifically, the model is trained on the general news domain in the meta-training stage with a two-stage gradient update mechanism to obtain a set of initial parameters. In the meta-testing stage, the model freezes part of its parameters and is quickly fine-tuned on the sensitive text dataset. Experimental results show that the proposed model significantly improves performance on the sensitive information recognition task compared with a strong few-shot baseline model.
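The meta-testing step, freezing part of the parameters and fine-tuning the rest on the support set, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two parameter groups (`encoder`, `scale`), the loss (a prototypical cross-entropy evaluated on the support set itself), and the finite-difference gradients, which stand in for the automatic differentiation a real implementation would use. The abstract does not specify which parameters are frozen or which loss is used.

```python
import math

def embed(x, params):
    """Toy encoder: a frozen per-dimension weight followed by a trainable scale."""
    return [s * w * xi for s, w, xi in zip(params["scale"], params["encoder"], x)]

def support_loss(params, xs, ys):
    """Prototypical cross-entropy computed on the support set itself."""
    zs = [embed(x, params) for x in xs]
    classes = sorted(set(ys))
    protos = {}
    for c in classes:
        members = [z for z, y in zip(zs, ys) if y == c]
        protos[c] = [sum(d) / len(members) for d in zip(*members)]
    total = 0.0
    for z, y in zip(zs, ys):
        # Logit of a class is the negative squared distance to its prototype.
        logits = {c: -math.dist(z, protos[c]) ** 2 for c in classes}
        log_norm = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_norm - logits[y]
    return total / len(xs)

def finetune(params, xs, ys, frozen=("encoder",), lr=0.1, steps=20, eps=1e-5):
    """Gradient descent on the support set that skips every frozen group."""
    params = {name: list(values) for name, values in params.items()}  # copy
    for _ in range(steps):
        grads = {}
        for name, values in params.items():
            if name in frozen:
                continue  # frozen parameters receive no update
            grads[name] = []
            for i in range(len(values)):
                original = values[i]
                values[i] = original + eps
                up = support_loss(params, xs, ys)
                values[i] = original - eps
                down = support_loss(params, xs, ys)
                values[i] = original
                grads[name].append((up - down) / (2 * eps))  # central difference
        for name, grad in grads.items():
            params[name] = [v - lr * g for v, g in zip(params[name], grad)]
    return params

# Hypothetical 2-way 2-shot support set and initial parameters.
xs = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
ys = [0, 0, 1, 1]
init = {"encoder": [1.0, 1.0], "scale": [1.0, 1.0]}

tuned = finetune(init, xs, ys)
print(tuned["encoder"], tuned["scale"])
```

Only the `scale` group changes during fine-tuning; the `encoder` group keeps the values it had after (hypothetical) meta-training, which is the point of freezing when the support set is very small.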

Keywords (English):

sensitive information recognition; few-shot learning; fine-tuning strategy; prototype network

Authors:

余正涛 (Yu Zhengtao), 关昕 (Guan Xin), 黄于欣 (Huang Yuxin), 张思琦 (Zhang Siqi), 赵庆珏 (Zhao Qingjue)

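Author affiliations:

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500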

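Keywords (translated from the Chinese):

sensitive information recognition; few-shot learning; fine-tuning strategy; prototype network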

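Funding:

National Natural Science Foundation of China (three grants); Yunnan Provincial Major Science and Technology Special Program (two projects); Yunnan Provincial High-Tech Industry Special Project; Yunnan Provincial Fundamental Research Program, General Project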

Project numbers:

U21B2027; 61972186861732005; 202202AD080003; 202002AD080001; 201606; 202001AT070046

Publication year:

2024

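Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences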

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(1)
