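Academic full texts contain many kinds of knowledge elements, and mining and organizing them can effectively improve how efficiently academic resources are used. Building an academic knowledge graph links the various implicit "knowledge elements" in papers. This saves researchers time in locating knowledge points and also lets them expand those knowledge points through the network communities within the graph. Based on a systematic and comprehensive literature review, this paper identifies 18 types of key knowledge elements in academic papers from three dimensions (macro, meso, and micro), treats the descriptive text in academic full texts as knowledge element objects, and designs a conceptual framework for an academic knowledge graph. It then selects 515 full-text articles from the Journal of the Association for Information Science and Technology (JASIST) and applies both manual annotation and deep-learning-based extraction to the key knowledge elements in each paper. The study examines whether each type of knowledge element causes problems during manual annotation and whether its automatic extraction reaches the expected performance, and uses these findings to screen the knowledge elements that take part in graph construction. Nine types of knowledge elements are ultimately retained: mathematical formulas, software tools, data sources, specific models, tables, figures, research prospects, research questions, and research results. Together with the knowledge elements in the bibliographic data, they are used to generate triples of the form (head knowledge element, relation, tail knowledge element), which are stored in a graph database. Finally, a visual evaluation of the graph and a knowledge element retrieval study demonstrate its feasibility and scalability. The results show that some knowledge elements in academic full texts are suitable for large-scale automatic annotation, that the various knowledge elements can be linked to one another to form dense knowledge communities, and that the graph supports functions such as knowledge element search.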
Research on Constructing an Academic Knowledge Graph of Multi-dimensional Knowledge Elements in Academic Full Texts
Academic full texts contain a large amount of knowledge element information. Mining and organizing these knowledge elements can effectively improve the utilization efficiency of academic resources. Constructing an academic knowledge graph that connects the various kinds of tacit "knowledge elements" in an article not only saves scholars time in obtaining knowledge points but also helps them expand those knowledge points through the network communities in the graph. Through a systematic literature review and other methods, this paper starts from three dimensions (macro, meso, and micro), identifies 18 types of key knowledge elements in academic papers, takes the textual descriptions of knowledge elements as entity objects, and outlines the conceptual framework of an academic knowledge graph. Then, 515 full-text articles published in JASIST are selected, and the key knowledge elements in each paper are manually annotated and automatically extracted using deep learning. The study examines whether each type of knowledge element causes problems during manual annotation and whether its automatic extraction reaches the expected performance, in order to screen the knowledge elements used to construct the graph. Nine types of knowledge elements are ultimately selected: mathematical formulas, software tools, data sources, specific models, tables, figures, research prospects, research problems, and research results. Together with the bibliographic data, they are used to generate triples composed of head entities, relations, and tail entities, which are stored in a graph database. Finally, a visual evaluation of the graph and a knowledge element retrieval study demonstrate its feasibility and scalability. The research shows that some knowledge elements in full texts are suitable for large-scale automatic annotation and that all kinds of knowledge elements can form dense knowledge communities through mutual links.
Keywords: knowledge element; knowledge graph; academic full text; deep learning
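The abstract describes turning extracted knowledge elements and bibliographic data into (head, relation, tail) triples stored in a graph database, then searching by knowledge element. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical in-memory `TripleStore` in place of a real graph database, hypothetical relation names derived from the element type (e.g. `has_software_tool`), and made-up example papers and values that are not data from the JASIST corpus.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical extracted full-text knowledge elements (illustrative only).
extracted = {
    "paper:A": {"software_tool": ["ToolA"], "data_source": ["CorpusX"]},
    "paper:B": {"software_tool": ["ToolA"], "specific_model": ["ModelM"]},
}

# Hypothetical bibliographic elements for the same papers.
bibliographic = {
    "paper:A": {"author": ["Author X"]},
    "paper:B": {"author": ["Author Y"]},
}


def build_triples(*sources):
    """Turn {paper: {element_type: [values]}} maps into (head, relation, tail) triples."""
    triples = []
    for source in sources:
        for paper, elements in source.items():
            for element_type, values in elements.items():
                for value in values:
                    # Assumption: the relation name is derived from the element type.
                    triples.append(
                        Triple(paper, f"has_{element_type}", f"{element_type}:{value}")
                    )
    return triples


class TripleStore:
    """Hypothetical in-memory stand-in for a graph database, indexing triples both ways."""

    def __init__(self, triples):
        self.out_edges = defaultdict(set)  # head -> {(relation, tail)}
        self.in_edges = defaultdict(set)   # tail -> {(relation, head)}
        for t in triples:
            self.out_edges[t.head].add((t.relation, t.tail))
            self.in_edges[t.tail].add((t.relation, t.head))

    def papers_with(self, element):
        """Knowledge element search: papers that link to the given element."""
        return sorted(head for _, head in self.in_edges.get(element, ()))

    def neighbours(self, paper):
        """Papers that share at least one knowledge element with `paper`."""
        linked = set()
        for _, tail in self.out_edges.get(paper, ()):
            linked.update(head for _, head in self.in_edges[tail])
        linked.discard(paper)
        return sorted(linked)


store = TripleStore(build_triples(extracted, bibliographic))
print(store.papers_with("software_tool:ToolA"))  # → ['paper:A', 'paper:B']
print(store.neighbours("paper:A"))               # → ['paper:B']
```

Indexing edges in both directions is what lets a shared tail element (here the hypothetical `ToolA`) connect two papers, which is a small-scale version of the dense knowledge communities the abstract reports.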