
Contrastive meta-learning framework for few-shot cross-lingual text classification

Many security risk control issues, such as public opinion analysis in international scenarios, are text classification problems, and they are challenging because multiple languages are involved. Previous studies have demonstrated that the performance of few-shot text classification tasks can be significantly improved through cross-lingual semantic knowledge transfer. However, the advancement of cross-lingual text classification still faces several challenges. First, it is difficult to obtain language-agnostic representations that transfer well across languages: differences in grammatical structure and syntactic rules between languages cause variations in text representation, making it hard to extract general semantic information. In addition, labeled data for cross-lingual text classification is scarce. In many real-world scenarios, only a small amount of labeled data is available, which severely degrades the performance of many methods. Effective methods are therefore needed to transfer knowledge accurately in few-shot settings and to improve the generalization ability of classification models. To tackle these challenges, a framework integrating contrastive learning and meta-learning was proposed: contrastive learning is used to extract general, language-agnostic semantic information, while the rapid-generalization advantage of meta-learning is leveraged to improve knowledge transfer in few-shot settings. Furthermore, a task-based data augmentation method was proposed to further improve the performance of the framework on few-shot cross-lingual text classification. Extensive experiments on two widely used multilingual text classification datasets show that the proposed method outperforms several strong baselines, indicating that it can be effectively applied in the field of risk control and security.
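The abstract names two ingredients: a contrastive objective that pulls cross-lingual representations of the same content together, and a meta-learning (episodic few-shot) classification step. The paper's exact formulation is not reproduced on this page, so the following is only a minimal NumPy sketch under common assumptions — an InfoNCE-style loss over translation pairs and a prototypical-network prediction step; all function names and the pairing scheme are illustrative, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(src, tgt, temperature=0.1):
    """InfoNCE-style loss aligning source-language sentence embeddings with
    their target-language translations (positives on the diagonal)."""
    src, tgt = l2_normalize(src), l2_normalize(tgt)
    logits = src @ tgt.T / temperature           # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # pull translation pairs together

def prototype_predict(support, support_labels, query):
    """Prototypical-network episode step: classify each query embedding by
    its nearest class centroid computed from the few-shot support set."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

In an episodic training loop, the two terms would typically be combined, e.g. minimizing the prototypical classification loss plus a weighted contrastive term, so the encoder learns language-agnostic features while adapting from only a few labeled examples per task.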

cross-lingual text classification; meta-learning; contrastive learning; few-shot

GUO Jianming, ZHAO Yuran, LIU Gongshen


School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China


National Natural Science Foundation of China (U21B2020); Science and Technology Plan Project of Shanghai (22511104400)

2024

Chinese Journal of Network and Information Security
Posts and Telecommunications Press


CSTPCD
ISSN:2096-109X
Year, Volume (Issue): 2024, 10(3)