Adaptive label information learning for intention detection
马坤1, 刘筱云1, 李乐平1, 纪科1, 陈贞翔1, 杨波1
Author information
1. School of Information Science and Engineering, University of Jinan, Jinan 250022, Shandong, China
Abstract
Multi-label text classification methods often ignore label co-occurrence when capturing relationships between labels. To address this, an adaptive label feature learning (ALFL) method based on statistical features is proposed for detecting content marketing articles. First, a labeled latent Dirichlet allocation model with adaptive topic priors (LDATP) is built: for each text, all marketing topics corresponding to its label set constrain the model as it generates the topic-word probability distribution. Second, a label information integration network (LIIN) uses the topic-word probability distribution and the graph structure of the labels to learn label-related information and obtain label embeddings. Finally, information is exchanged between the text space and the label space to capture semantic features for identifying marketing articles. Experimental results show that ALFL achieves a recall of 80.92% and an accuracy of 88.14%, outperforming the other baseline models.
Key words
multi-label text classification / label co-occurrence / topic model / graph structure / label embedding
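
The abstract refers to label co-occurrence and a label graph structure without saying how the graph is built. As a rough illustration only, the sketch below derives a directed label graph from the label sets of a small corpus, weighting each edge a -> b by the share of documents labeled a that are also labeled b. The function name `cooccurrence_graph`, the edge weighting, and the sample labels are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Build a directed label graph from per-document label sets.

    graph[a][b] is the fraction of documents labeled `a` that are
    also labeled `b` (an assumed weighting, not the paper's).
    """
    label_count = Counter()
    pair_count = Counter()
    for labels in label_sets:
        unique = sorted(set(labels))  # ignore duplicate labels in a document
        label_count.update(unique)
        for a, b in combinations(unique, 2):
            # Count the pair in both directions; normalization differs per source.
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1

    graph = {label: {} for label in label_count}
    for (a, b), n in pair_count.items():
        graph[a][b] = n / label_count[a]
    return graph


# Hypothetical label sets for four documents.
docs = [
    ["promotion", "brand"],
    ["promotion", "brand", "discount"],
    ["promotion"],
    ["discount"],
]
graph = cooccurrence_graph(docs)
for source, edges in sorted(graph.items()):
    print(source, {target: round(w, 2) for target, w in sorted(edges.items())})
```

The weights are asymmetric by design: in the sample data every document labeled "brand" is also labeled "promotion", but the reverse does not hold. A graph of this kind could serve as an adjacency input to a label-embedding model, though how LIIN consumes its graph is not specified in the abstract.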