This article aims to optimize web crawler algorithms by using Big Data and Deep Learning technology to better meet the needs of information collection and processing. First, Big Data technology is used for data collection. Then, the Term Frequency-Inverse Document Frequency (TF-IDF) weight is introduced as the initial weight of each input feature, and the Propagation Activation algorithm is used to optimize the crawler algorithm. Finally, multimodal information is integrated. To test how well the Big Data-based Deep Learning web crawler algorithm performs in information collection and processing, the article compares it with traditional methods. In the experiments, the coverage of the proposed method reaches 92.9% when the number of Uniform Resource Locators (URLs) is 10,000, while the coverage of traditional methods is only 73.7%. These results indicate that the proposed Big Data-based Deep Learning web crawler algorithm achieves higher coverage and better accuracy in information collection.
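As an illustration of the TF-IDF weighting mentioned above, the following is a minimal sketch of one common formulation (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency). The abstract does not state which TF-IDF variant, tokenizer, or smoothing the authors use, so the function name `tf_idf` and the toy documents are assumptions made only for illustration, not the article's implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not confirmed by the article):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: two already-tokenized pages.
pages = [
    ["web", "crawler", "data"],
    ["deep", "learning", "data"],
]
initial_weights = tf_idf(pages)
```

In this variant, a term that appears in every document (here `data`) receives a weight of zero, while terms unique to one page receive positive weights; such values could then serve as initial feature weights in the sense the abstract describes.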
Key words
Web crawler algorithm/Deep Learning/Information collection and processing/Big Data