Research on Repeatability Detection Methods for Scientific Research Projects in Survey and Design Enterprises Based on BM25
Redundant research investment is an increasingly prominent problem in China's survey and design enterprises. It wastes funds and human resources, damages reputation, and even erodes the spirit of scientific research, all of which hinders the incubation and development of cutting-edge technologies. It is therefore imperative to identify redundant research topics automatically and to maximize the reuse of research outcomes through intelligent means. This paper proposes a method for detecting redundant scientific research projects within an enterprise. The method builds on the basic theory of the BM25 algorithm and combines the data attributes of survey and design enterprises with characteristic values such as domain, specialty, and project leader. It involves four steps: text pre-processing; establishing a matching library; calculating the similarity between the input topic and the topics in the matching library with the TF-IDF algorithm and the BM25 algorithm respectively; and analyzing the calculation results. Compared with the TF-IDF algorithm, the BM25 algorithm controls term weights through term-frequency saturation and field-length normalization, and it differentiates topics markedly better in research on new energy, engineering digitalization, and informatization. It is better at surfacing highly similar texts across different fields, and so avoids missing potential duplicate topics as far as possible. With a computation time of less than 0.1 seconds, it also meets commercial needs and supports both redundancy verification at research topic initiation and the determination of overlap in outcomes. The accuracy of the calculation results has been verified by technical research and development personnel; the method meets the needs of business management and is worth promoting in the survey and design industry.
scientific research project; project redundancy verification; survey and design enterprises; BM25; TF-IDF; text similarity
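The similarity step named in the abstract can be illustrated with a minimal BM25 scorer. This is a sketch under stated assumptions, not the paper's implementation: the function names, the whitespace tokenizer (real project titles would likely need proper word segmentation), the parameter values k1 = 1.5 and b = 0.75, and the smoothed IDF variant are all illustrative choices, and the enterprise attributes mentioned above (domain, specialty, project leader) are not modeled.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real pre-processing.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with BM25.

    k1 controls term-frequency saturation; b controls length normalization.
    Both defaults are common choices, not values taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        length_norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant) that is never negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # The tf term saturates: it approaches (k1 + 1) as tf grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical matching library of existing topic titles.
library = [
    "wind power foundation design",
    "wind wind wind wind wind wind",
    "bridge survey data platform",
]
scores = bm25_scores("wind power", library)
ranked = sorted(zip(scores, library), reverse=True)
```

In this toy library the second entry repeats a single query term six times, yet it ranks below the first entry, which matches both query terms once. That is the saturation and length-normalization behavior the abstract credits for BM25's better differentiation; a raw term-frequency weighting would instead let the repeated term's contribution grow linearly with its count.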