Application of Deep Learning Web Crawler Algorithms Based on Big Data in Information Collection and Processing

Yu Ping¹

Author information

1. Guangzhou Huanan Business College (广州华南商贸职业学院), Guangzhou, Guangdong 510650

Abstract

This article aims to optimize web crawler algorithms using big data and deep learning technology to better meet the needs of information collection and processing. First, big data technology is used for data collection. Second, Term Frequency-Inverse Document Frequency (TF-IDF) weights are introduced as the initial weights of the input features, and a propagation activation algorithm is used to optimize the crawler algorithm. Finally, multimodal information is integrated. To test the proposed big-data-based deep learning web crawler algorithm in information collection and processing, it was compared with traditional methods. In experiments with 10,000 Uniform Resource Locators (URLs), the coverage of the proposed method reached 92.9%, whereas that of the traditional method was only 73.7%. The results indicate that the proposed algorithm achieves higher coverage and better accuracy in information collection.
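
The abstract names two building blocks, TF-IDF weights as initial feature weights and a propagation activation step, without giving their details. The sketch below is one possible reading and not the article's implementation: it computes plain TF-IDF, scores each crawled page against a topic, and then spreads that score along outgoing links to rank uncrawled URLs. It assumes that "propagation activation" refers to the classic spreading activation technique. The function names, the `decay` and `rounds` parameters, and the toy pages and links are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain TF-IDF, natural log)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def relevance(weights, topic_terms):
    """Sum the TF-IDF weights of the topic terms found in one page."""
    return sum(weights.get(t, 0.0) for t in topic_terms)


def spread_activation(links, initial, decay=0.5, rounds=2):
    """Spread activation from scored pages along their outgoing links.

    Each round, a page passes `decay` times its incoming energy, split
    evenly, to the pages it links to. `decay` and `rounds` are assumed
    tuning knobs, not values taken from the article.
    """
    activation = dict(initial)
    frontier = dict(initial)
    for _ in range(rounds):
        received = {}
        for page, energy in frontier.items():
            targets = links.get(page, [])
            if not targets or energy <= 0:
                continue
            share = decay * energy / len(targets)
            for target in targets:
                received[target] = received.get(target, 0.0) + share
        for page, energy in received.items():
            activation[page] = activation.get(page, 0.0) + energy
        frontier = received
    return activation


# Toy data (hypothetical): three crawled pages and the URLs they link to.
pages = {
    "a": "deep learning crawler crawler",
    "b": "cooking recipes pasta",
    "c": "crawler news",
}
links = {"a": ["d", "e"], "b": ["e"], "c": ["d"]}
topic = ["crawler", "learning"]

names = list(pages)
page_weights = tf_idf([pages[name] for name in names])
initial = {name: relevance(w, topic) for name, w in zip(names, page_weights)}
activation = spread_activation(links, initial)

# Uncrawled URLs, most promising first.
queue = sorted((u for u in activation if u not in pages),
               key=activation.get, reverse=True)
print(queue)
```

In this toy graph, the URL linked from both topic-relevant pages ends up ahead of the one that is also linked from the off-topic page, which is the kind of prioritization a focused crawler would want from such a step.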

Key words

Web crawler algorithm / Deep learning / Information collection and processing / Big data

Publication year

2024

Science & Technology Information (科技资讯)

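Beijing International Science and Technology Service Center; Beijing Cooperation and Innovation International Science and Technology Service Center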

Impact factor: 0.51

ISSN: 1672-3791
