Multilingual event discovery based on augmentation contrast learning
(Original Chinese title: 基于增强对比学习的多语言事件发现方法)

Multilingual event discovery clusters texts in different languages that describe the same event into the same cluster, and it is the foundation of multilingual event analysis. Current deep-learning-based clustering methods mainly cluster by optimizing the distances between text representations, so their performance depends heavily on the model's representation ability. In a multilingual setting, text representations are poorly aligned across languages, which makes multilingual event clustering difficult. This paper proposes a multilingual event discovery method based on augmentation contrastive learning. The method optimizes the distance between event texts and cluster centroids as well as the distances between multilingual positive and negative samples, which brings multilingual texts describing the same event closer together in the representation space and improves the model's ability to represent multilingual text. For the event clustering task, the method also uses representations of event elements as the cluster centers, which further improves multilingual event clustering. Experiments on the Reuters dataset show that the proposed method improves performance on top of several pre-trained models, with the best accuracy and normalized mutual information (NMI) reaching 76.14% and 91.09%, respectively.
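The abstract does not give the loss functions, so the following is only a minimal sketch of the two distances it mentions, under stated assumptions: an InfoNCE-style contrastive term over cross-lingual positive pairs with in-batch negatives, plus a squared-distance term that pulls each text embedding toward its assigned cluster center. The function names, the temperature, the weighting, and the use of plain NumPy arrays in place of pre-trained encoder outputs are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of `positives` is the
    positive for row i of `anchors` (e.g. the same event in another
    language); all other rows in the batch act as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def centroid_loss(embeddings, centroids, assignments):
    """Mean squared distance from each embedding to its assigned center.
    `centroids` stands in for the event-element representations that the
    abstract says are used as cluster centers (hypothetical input)."""
    diffs = embeddings - centroids[assignments]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def total_loss(anchors, positives, centroids, assignments,
               weight=1.0, temperature=0.1):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    return (info_nce(anchors, positives, temperature)
            + weight * centroid_loss(anchors, centroids, assignments))
```

In a sketch like this, lowering the contrastive term moves texts about the same event closer across languages, while the centroid term tightens each cluster around its center.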

multilingual event discovery; deep clustering; contrastive learning; data augmentation; event elements

Pan Tong (潘通), Yu Zhengtao (余正涛), Huang Yuxin (黄于欣), Guan Xin (关昕), Yan Haining (严海宁), Yang Xi (杨溪)

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, Yunnan, China

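National Natural Science Foundation of China (U21B2027)
National Natural Science Foundation of China (61972186)
Yunnan Provincial Basic Research Program, Key Project (202201AS070179)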

2024

Journal of Yunnan University (Natural Sciences Edition)
Yunnan University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.663
ISSN: 0258-7971
Year, volume (issue): 2024, 46(4)