Design and implementation of a distributed crawler system supporting dynamic pages

林永意¹, 卜言彬¹

Author information

1. 南京传媒学院, Nanjing, Jiangsu 211172, China

Abstract

To address the difficulty of obtaining useful information quickly and accurately amid the explosive growth of information in the Internet big data era, this article designs and implements a distributed crawler system that supports dynamic pages. The system is built on the Scrapy-Redis distributed crawler framework and combines it with Selenium and a PostgreSQL database. It retrieves the required information from large numbers of dynamic or static web pages in a distributed manner and stores it in the database for users.

Keywords

distributed/crawler/dynamic page

Funding

Special Research Project Support Program of the Jiangsu Higher Education Association (2022JDKT128)

Publication year

2024

Wireless Internet Technology (无线互联科技)

Jiangsu Institute of Scientific and Technical Information (江苏省科学技术情报研究所)

Impact factor: 0.263

ISSN: 1672-6944

Number of references: 6
