Research on High-Performance Distributed Computing Framework for Structured Article-Level Document Data
[Research purpose] To address the long R&D cycles and high technical barriers of mainstream distributed computing frameworks such as MapReduce and Spark, this paper proposes ArticleCF, a high-performance computing framework with high flexibility and a low barrier to entry. [Research method] ArticleCF draws on the strengths of mainstream distributed technologies and closely integrates the characteristics of data governance for scientific literature. It adopts a Master/Slave software architecture and, based on the characteristics of scientific and technical literature data, is designed along multiple functional dimensions, with emphasis on the distributed task distribution strategy, the parallel computing strategy, and the failover mechanism. [Research conclusion] A comparison of ArticleCF with MapReduce, Spark, and Storm across 21 indicators verifies the feasibility and validity of the proposed method. ArticleCF meets the diverse processing needs of large volumes of structured scientific and technical literature data.
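
The abstract does not specify how ArticleCF implements its Master/Slave task distribution or failover, so the following is only a minimal illustrative sketch of the general pattern, not the framework's actual design. All names here (`Worker`, `Master`, `max_attempts`, the round-robin policy) are hypothetical assumptions, and tasks run sequentially for clarity, whereas a real framework would dispatch them in parallel across nodes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical slave node; healthy=False simulates a crashed node."""
    name: str
    healthy: bool = True

    def run(self, func, article):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        return func(article)


class Master:
    """Hypothetical master: round-robin distribution with simple failover."""

    def __init__(self, workers, max_attempts=3):
        self.workers = list(workers)
        self.max_attempts = max_attempts
        self.alive = {w.name for w in self.workers}  # master's view of live nodes
        self.assignments = {}  # task id -> name of the worker that completed it

    def run(self, func, articles):
        articles = list(articles)
        pending = deque((i, a, 0) for i, a in enumerate(articles))
        results = {}
        turn = 0
        while pending:
            task_id, article, attempts = pending.popleft()
            live = [w for w in self.workers if w.name in self.alive]
            if not live:
                raise RuntimeError("no live workers remain")
            if attempts >= self.max_attempts:
                raise RuntimeError(f"task {task_id} exceeded retry limit")
            worker = live[turn % len(live)]  # assumed policy: round-robin
            turn += 1
            try:
                results[task_id] = worker.run(func, article)
                self.assignments[task_id] = worker.name
            except ConnectionError:
                # Failover: drop the failed node and re-queue its task.
                self.alive.discard(worker.name)
                pending.append((task_id, article, attempts + 1))
        return [results[i] for i in range(len(articles))]


# Usage: one of three workers is down; its task is reassigned to a live one.
master = Master([Worker("w1"), Worker("w2", healthy=False), Worker("w3")])
titles = ["  distributed computing ", "failover  ", " task scheduling"]
results = master.run(str.strip, titles)
```

In this sketch the master learns about a failure only when a dispatch raises an error; a production system would more likely combine this with heartbeats so that dead nodes are detected before work is sent to them.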