
Research on Training and Evaluation Methods for Multi-Granularity Word Embeddings in the Bond Domain

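The bond market is awash in vast and complex information, and building a digital dictionary (pre-trained word embeddings) that can express the market's complex semantics is key to making full use of that information and enabling fintech to empower business. At present, pre-trained word embeddings specific to the bond domain are lacking, and evaluating word embeddings is itself a major challenge. This study proposes BondJWE, a multi-granularity word embedding training framework for the bond domain that jointly uses character-component, character, and word information. To evaluate the resulting embeddings rigorously, the study also designs a downstream text classification task tailored to the characteristics of the available data. The work fills a gap in research on pre-trained word embeddings for the bond domain, and the experimental results show that BondJWE outperforms the other baseline models, indicating that the multi-granularity embeddings offer better semantic expressiveness and robustness.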
Study on Multi-granularity Embeddings of Training and Evaluation in the Bond Field
The bond market is flooded with massive and complex information, and the key to fully utilizing this information and enabling fintech to empower business is to construct a digital dictionary (namely, pre-trained word embeddings) that can describe the complex semantics of the bond market. So far, pre-trained bond-specific embeddings have been lacking, and their evaluation has also been a major challenge. Drawing on the joint information of components, characters, and words, this study proposes a multi-granularity word embedding training framework for the bond field, named BondJWE. Moreover, to evaluate these embeddings scientifically, this study designs a downstream text classification task according to the intrinsic features of the data. This study fills a gap in research on pre-trained bond-specific embeddings, and the results show that BondJWE performs better than the other baseline models, which indicates that these multi-granularity word embeddings express semantics better and are more robust.

Keywords: Word embeddings; Text classification; Bond

Hua Jiaojiao (华娇娇), Tang Huayun (唐华云), Wang Yanzhao (王延昭), Shang Lili (商丽丽)


Postdoctoral Research Workstation, China Central Depository & Clearing Co., Ltd., Beijing 100033

Blockchain Laboratory, ChinaBond Finance and Information Technology Co., Ltd. (中债金科信息技术有限公司), Beijing 100004

Keywords (translated from the Chinese): word embeddings; text classification; bonds

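Funding: Beijing Key Laboratory of Big Data Decision Making for Green Development project (dm202103); China Postdoctoral Science Foundation (2022M723692)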

2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation


CSTPCD
Impact factor: 0.518
ISSN:1006-9348
Year, volume (issue): 2024, 41(3)