Large-capacity semi-structured data extraction algorithm combining machine learning and deep learning
Semi-structured data are highly heterogeneous and voluminous, and data from different sources follow inconsistent structures. As a result, data extraction suffers from low accuracy and completeness. To address this, machine learning and deep learning are deeply integrated, and an extraction algorithm for large-capacity semi-structured data is proposed. First, principal component analysis (PCA), a machine learning method, is used to reduce the dimensionality of the large-volume semi-structured data. Then, based on the deep learning Transformer network architecture, the embedding layer, the encoder-decoder layers, and the encoding layer are improved respectively, yielding two extraction algorithms: one that recognizes named entities in the data and one that extracts relations between data entities. Together, they realize the extraction of large-capacity semi-structured data. Experimental results show that the proposed algorithm extracts data correctly with a significant effect: the number of invalid data items extracted is as low as 4, the extraction complexity is low, and the timeliness value is high. Ablation experiments on F-value and extraction time confirm that fusing the two techniques contributes substantially to data extraction: the F-value remains above 92 throughout, and the extraction time is reduced to 125 ms. The algorithm is highly feasible and provides an important means of improving operational efficiency and optimizing resource allocation.
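The PCA dimensionality-reduction step can be illustrated with a minimal sketch. It assumes the semi-structured records have already been flattened into fixed-length numeric feature vectors (the abstract does not say how this encoding is done) and finds the leading principal components by power iteration with deflation, in pure Python. The function names are hypothetical; this is not the authors' implementation, and on genuinely large data an optimized library routine such as scikit-learn's `PCA` would be the practical choice.

```python
import math
import random


def _center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def _covariance(centered):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(centered), len(centered[0])
    cov = [[0.0] * d for _ in range(d)]
    for r in centered:
        for i in range(d):
            for j in range(d):
                cov[i][j] += r[i] * r[j]
    denom = max(n - 1, 1)
    return [[v / denom for v in row] for row in cov]


def _matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def _dominant_eigenpair(m, iters=1000, tol=1e-12, seed=0):
    """Power iteration: largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < tol:  # remaining variance is (numerically) zero
            return v, 0.0
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    eigval = sum(a * b for a, b in zip(v, _matvec(m, v)))
    return v, eigval


def pca(rows, k):
    """Project rows onto their top-k principal components."""
    centered, _means = _center(rows)
    cov = _covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        vec, lam = _dominant_eigenpair(cov)
        components.append(vec)
        variances.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x * c for x, c in zip(r, comp)) for comp in components]
                 for r in centered]
    return projected, components, variances
```

For example, `pca([[1, 2], [2, 4], [3, 6], [4, 8]], k=1)` keeps one component along the direction (1, 2)/√5 with variance 25/3, since all points lie on a line. The sign of each component is arbitrary, as in any PCA.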
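The Transformer layers that the abstract modifies are built on scaled dot-product self-attention, softmax(QKᵀ/√d_k)V. The sketch below shows only that generic mechanism in pure Python; the projection matrices `w_q`, `w_k`, `w_v` and the function names are hypothetical, and it does not reproduce the improved embedding, encoder-decoder, or encoding layers described in the abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x m) list-of-lists matrix by an (m x p) one."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V; returns outputs and attention weights."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def self_attention(x, w_q, w_k, w_v):
    """Hypothetical single-head self-attention over token embeddings x."""
    return scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
```

With identical keys, every value receives equal weight: `scaled_dot_product_attention([[1, 2]], [[1, 1], [1, 1]], [[2, 0], [0, 4]])` returns the mean value `[[1.0, 2.0]]`. A full Transformer layer adds multi-head projections, residual connections, layer normalization, and a feed-forward sublayer, all omitted here.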
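The F-value reported in the ablation experiments is presumably the F1 score, the harmonic mean of precision and recall. A minimal sketch of entity-level scoring follows; it assumes extracted items are compared with gold items as exact (text, type) pairs, and both that matching rule and reading "above 92" as a 0-100 scale are assumptions rather than details given in the abstract. The function name `f_value` is hypothetical.

```python
def f_value(gold, predicted):
    """Entity-level precision, recall and F1, assuming exact-match scoring."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # items extracted and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("Alice", "PER"), ("Paris", "LOC"), ("ACME", "ORG")}
pred = {("Alice", "PER"), ("Paris", "ORG"), ("ACME", "ORG")}
p, r, f1 = f_value(gold, pred)
print(round(100 * f1, 2))  # F1 on a 0-100 scale
```

Here two of three predictions are correct and two of three gold entities are found, so precision, recall, and F1 are all 2/3, printed as 66.67.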