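Given the growing number of digital works online and people's increasing awareness of copyright, confirming the copyright ownership of digital works is very important, and text matching technology is well suited to the problem of detecting the originality of digital works. Text matching technology uses algorithms to judge whether the semantics of sentences are similar. In recent years, deep learning has developed rapidly, and methods for text matching tasks have advanced considerably as well. Building on the existing kernel-based neural model for document ranking (KNRM), a text matching model that fuses KNRM with the light gradient boosting machine (LightGBM) algorithm is proposed. Kernel pooling is applied to the histogram derived from the interaction matrix to extract relevant local feature information; K kernel functions of different sizes are introduced to capture matching signals at different granularities and obtain Gaussian kernel features; and the LightGBM algorithm is used as the classifier to predict the final matching result. The model is validated on multiple data sets, and experiments show that the fused model KNRM-LightGBM outperforms the original KNRM in accuracy and achieves better text matching.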
Research on a text matching model based on copyright authentication
With the growing number of digital works on the Internet and rising public awareness of copyright, confirming the copyright ownership of digital works has become very important. Text matching technology, which uses algorithms to determine whether sentences are semantically similar, can address the problem of detecting the originality of digital works. In recent years, deep learning has developed rapidly, and methods for solving text matching tasks have advanced with it. Building on the existing kernel-based neural model for document ranking (KNRM), a text matching model combining KNRM and the LightGBM algorithm is proposed. Kernel pooling is applied to the histogram derived from the interaction matrix to extract relevant local feature information. K kernel functions of different sizes are introduced to capture matching signals at different levels of granularity and to obtain Gaussian kernel features. The LightGBM algorithm is then used as the classifier to predict the final matching result. The model is validated on multiple data sets. Experiments show that the fused model KNRM-LightGBM outperforms the original KNRM in accuracy and achieves better text matching.
text matching; a kernel based neural model for document ranking; light gradient boosting machine; digital copyright
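The kernel-pooling step summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the random embeddings, and the kernel means and widths (`mus`, `sigmas`) are assumptions chosen for illustration, loosely following the commonly described KNRM setup in which each of the K Gaussian kernels softly counts word-pair similarities near its mean.

```python
import numpy as np

def cosine_interaction_matrix(q_emb, d_emb):
    """Cosine similarity between every word of text A and every word of text B."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    return q @ d.T  # shape (n_q, n_d)

def kernel_pooling(M, mus, sigmas):
    """Turn an interaction matrix into K Gaussian kernel features."""
    # Broadcast to shape (n_q, n_d, K): one Gaussian response per kernel.
    diff = M[:, :, None] - mus[None, None, :]
    k = np.exp(-diff ** 2 / (2.0 * sigmas[None, None, :] ** 2))
    # Soft count over the words of text B, for each word of text A.
    per_word = k.sum(axis=1)  # shape (n_q, K)
    # Log-sum over the words of text A; clip to avoid log(0).
    return np.log(np.clip(per_word, 1e-10, None)).sum(axis=0)  # shape (K,)

# Hypothetical inputs: 4 and 6 words with 8-dimensional embeddings.
rng = np.random.default_rng(0)
q_emb = rng.normal(size=(4, 8))
d_emb = rng.normal(size=(6, 8))

# Assumed kernel parameters: one near-exact-match kernel plus three soft ones.
mus = np.array([1.0, 0.5, 0.0, -0.5])
sigmas = np.array([1e-3, 0.1, 0.1, 0.1])

M = cosine_interaction_matrix(q_emb, d_emb)
features = kernel_pooling(M, mus, sigmas)
print(features.shape)
```

In a fused model of the kind described, a K-dimensional vector like `features` would be computed for each sentence pair and passed to a gradient-boosted classifier such as LightGBM in place of KNRM's final ranking layer; that classifier stage is left out here to keep the sketch dependency-free.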