Distributed Web Monitoring Crawler
Fang Yuhao (方宇浩)¹, Ni Shengqiao (倪胜巧)¹
Affiliation
- 1. College of Computer Science, Sichuan University, Chengdu 610065
Abstract
The rapid development of the Internet has changed the way people obtain information, and the Internet is gradually replacing traditional media. A vast amount of information is updated online every day, and the world has entered the data-centric era of big data. This paper proposes a distributed crawler for monitoring this data, together with an algorithm that extracts the updated content of web pages based on their structure.
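The abstract does not spell out the extraction algorithm. As an illustration only, and not the authors' method, the following Python sketch shows one simple structure-based approach: every text node is keyed by its tag path (for example `html/body/ul/li`), and the blocks of a newly fetched page are compared with those of the previous snapshot, so that only blocks absent from the old snapshot are reported as updates. The names `BlockCollector`, `extract_blocks`, and `extract_updates` are hypothetical. The sketch uses only the standard-library `html.parser`. Fetching pages and distributing URLs to crawler nodes through a message queue are deliberately left out.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is not page content and is ignored.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Hypothetical helper: collect (tag_path, text) pairs for every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.blocks = []  # list of (path, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP_TAGS.intersection(self.stack):
            self.blocks.append(("/".join(self.stack), text))


def extract_blocks(html):
    """Parse an HTML string and return its (path, text) blocks."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def extract_updates(old_html, new_html):
    """Return blocks of new_html that have no counterpart in old_html."""
    remaining = Counter(extract_blocks(old_html))
    updates = []
    for block in extract_blocks(new_html):
        if remaining[block] > 0:
            remaining[block] -= 1  # block already existed; consume one occurrence
        else:
            updates.append(block)  # new or changed block
    return updates


if __name__ == "__main__":
    old = "<html><body><h1>News</h1><ul><li>Item A</li><li>Item B</li></ul></body></html>"
    new = "<html><body><h1>News</h1><ul><li>Item C</li><li>Item A</li><li>Item B</li></ul></body></html>"
    print(extract_updates(old, new))  # [('html/body/ul/li', 'Item C')]
```

Keying blocks on the tag path as well as the text lets unchanged headers, navigation, and footers cancel out between snapshots. A block that moves under a different parent is reported as new, which a production monitor would probably refine, for example by matching on element ids or classes.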
Key words
Crawler/Distributed System/Message Queue
Publication year
2015