Study on Training and Evaluation Methods for Multi-granularity Word Embeddings in the Bond Field

Hua Jiaojiao¹, Tang Huayun², Wang Yanzhao², Shang Lili¹

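Author information

1. Postdoctoral Research Workstation, China Central Depository & Clearing Co., Ltd., Beijing 100033
2. Blockchain Laboratory, ChinaBond Fintech Information Technology Co., Ltd., Beijing 100004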

Abstract

The bond market is awash in massive, complex information. Building a digital dictionary (that is, pre-trained word embeddings) capable of expressing the complex semantics of the bond market is key to making full use of this information and to enabling business through fintech. At present, pre-trained word embeddings specific to the bond field are lacking, and evaluating word embeddings is itself a major challenge. This study proposes BondJWE, a multi-granularity word embedding training framework for the bond field that jointly uses character-component, character, and word information. To evaluate the embeddings scientifically, the study also designs a downstream text classification task suited to the characteristics of the available data. The work fills a gap in research on pre-trained bond-specific word embeddings, and the experimental results show that BondJWE outperforms the baseline models, indicating that the multi-granularity word embeddings express semantics better and are more robust.

Key words

Word embeddings/Text classification/Bond

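Funding

Project of the Beijing Key Laboratory of Big Data Decision-making for Green Development (dm202103)

China Postdoctoral Science Foundation (2022M723692)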

Publication year

2024

Computer Simulation (计算机仿真)

Institute 17, China Aerospace Science and Industry Corporation

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Number of references: 28
