Deep-level Mining Methods for Multi-source Heterogeneous Data in Software Development Information Repository
Because software development involves collaboration among multiple teams and individuals, project documents often contain inconsistencies, errors, or omissions. If these issues are not discovered and addressed promptly, they can significantly impair the efficiency and quality of software development. To address this, a deep-level mining method for multi-source heterogeneous data in a software development information repository is proposed, with the aim of accurately retrieving the necessary data and improving developers' work efficiency and software development speed. The method first handles missing values and noisy data in the multi-source heterogeneous data based on time series analysis. Features are then extracted from the processed data and used as inputs for clustering with a self-organizing feature mapping neural network (SOM neural network). Additionally, the ATPRK method is used to predict the requirements for the multi-source heterogeneous data in the software information repository. Based on this prediction, the SOM network clusters are recalculated to obtain the final clustering results, achieving deep-level mining of the multi-source heterogeneous data. Experimental results indicate that the method can mine 99% of the multi-source heterogeneous data in the software development information repository and effectively remove unnecessary data. It achieves its best clustering accuracy when the number of clusters is 16, with a minimum clustering entropy of only 0.31, demonstrating good performance in deep-level data mining.
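The cleaning, SOM clustering, and entropy evaluation steps named above can be illustrated with a minimal sketch. It is not the authors' implementation, and the ATPRK prediction step is not sketched. The following are assumptions made only for illustration: the function names (fill_missing, train_som, assign_clusters, clustering_entropy), linear interpolation as the missing-value treatment, a 4 x 4 map to give 16 clusters, the learning-rate and neighborhood schedules, and the size-weighted Shannon entropy used as the clustering entropy. The abstract does not specify any of these.

```python
import numpy as np

def fill_missing(series):
    """Linearly interpolate NaN gaps in a 1-D time series (assumed cleaning step)."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols self-organizing map; returns weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood radius
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_clusters(data, weights):
    """Label each sample with the index of its best-matching unit."""
    d2 = np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)

def clustering_entropy(cluster_ids, true_labels):
    """Size-weighted mean Shannon entropy (bits) of true labels inside each cluster."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    total = len(cluster_ids)
    entropy = 0.0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        entropy += (len(members) / total) * -np.sum(p * np.log2(p))
    return entropy

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    features = rng.random((200, 3))        # placeholder feature vectors, not real data
    som = train_som(features)
    labels = assign_clusters(features, som)
    print(len(np.unique(labels)), "of 16 map units are in use")
```

Under this definition, a lower entropy means each cluster is dominated by a single true class, which is the sense in which a small value indicates a good clustering.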