Early Identification Method of Interdisciplinary Topics Based on Text Clustering and Multi-Label Classification

[Research purpose] Using patents as research data, this paper proposes an early identification method for interdisciplinary topics that combines text clustering and multi-label classification. [Research method] Taking "quantum computing" as the research field, the method filters a large number of non-interdisciplinary patents out of the patent collection in two ways: screening based on clustering results and screening based on multi-label classification. A topic identification method is then applied to the resulting small dataset, which has a high proportion of interdisciplinary patents, to identify interdisciplinary topics early. An empirical study on the Derwent patent dataset verifies the effectiveness of the proposed method. [Research conclusion] The study finds interdisciplinary topics such as "quantum encryption technology" and "quantum computing technology and quantum computers". Compared with existing methods, the proposed method can discover interdisciplinary topics in a document collection while the interdisciplinary field is still in its embryonic or growth stage and the relevant literature is scarce.
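The two-stage screening described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual screening criteria, so everything below is an assumption made for illustration: the toy records, the function names, the 0.5 score threshold, and both rules (a cluster is kept when at least half of its patents carry two or more predicted disciplines; a patent is kept when it carries two or more predicted disciplines). Cluster assignments and classifier scores are taken as given inputs rather than computed.

```python
from collections import defaultdict

# Hypothetical toy records: (patent_id, cluster_id, {discipline: classifier score}).
# Cluster ids and scores stand in for the outputs of a text clustering step
# and a multi-label classifier; neither is computed here.
PATENTS = [
    ("P1", 0, {"physics": 0.9, "computer science": 0.8}),
    ("P2", 0, {"physics": 0.7, "cryptography": 0.6}),
    ("P3", 1, {"physics": 0.95, "computer science": 0.1}),
    ("P4", 1, {"physics": 0.9, "cryptography": 0.2}),
    ("P5", 0, {"computer science": 0.9, "physics": 0.3}),
]


def predicted_labels(scores, threshold=0.5):
    """Return the disciplines whose score reaches the assumed threshold."""
    return {d for d, s in scores.items() if s >= threshold}


def filter_by_cluster(patents, min_share=0.5, threshold=0.5):
    """Stage 1 (assumed rule): keep clusters in which at least `min_share`
    of the patents carry two or more predicted disciplines."""
    by_cluster = defaultdict(list)
    for patent in patents:
        by_cluster[patent[1]].append(patent)
    kept = []
    for members in by_cluster.values():
        multi = sum(len(predicted_labels(p[2], threshold)) >= 2 for p in members)
        if multi / len(members) >= min_share:
            kept.extend(members)
    return kept


def filter_by_labels(patents, threshold=0.5):
    """Stage 2 (assumed rule): keep patents with two or more predicted disciplines."""
    return [p for p in patents if len(predicted_labels(p[2], threshold)) >= 2]


def candidate_interdisciplinary_patents(patents):
    """Apply both screening stages and return the surviving patent ids."""
    return [p[0] for p in filter_by_labels(filter_by_cluster(patents))]


print(candidate_interdisciplinary_patents(PATENTS))
```

In this toy data, cluster 1 is dropped as a whole in the first stage and P5 is dropped in the second, leaving a smaller set on which a topic identification method could then be run.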

Keywords: patent data; interdisciplinary topic; early identification; multi-label classification; interdisciplinary patent; text clustering; quantum computing

Feng Ling (冯岭)

School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046

Center for Scientometrics and Evaluation, Institute of Scientific and Technical Information of China, Beijing 100038

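Funding: Henan Province Soft Science Research Program; Open Fund of the Hubei Key Laboratory of Big Data in Science and Technology

Grant No.: 222400410445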

2024

Journal of Intelligence (情报杂志)
Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core (北大核心)
Impact factor: 1.502
ISSN: 1002-1965
Year, volume (issue): 2024, 43(8)