Implementation of a Web Crawler Algorithm Based on Java

Li Hui (李晖)¹


Author information

1. Jiyuan Vocational and Technical College, Jiyuan, Henan 459000

Abstract

This design uses a non-recursive crawling algorithm, together with Java multithreading and a memory-based job queue, to add, allocate, and process URLs while the crawler runs. A thread pool manages multiple fetching threads so that web pages are fetched concurrently. Finally, a simple search engine client is built with JSP (JavaServer Pages) technology.
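The abstract does not include code. The following is a minimal sketch, under our own assumptions, of what a non-recursive crawl driven by an in-memory job queue and a thread pool can look like in Java. The class `QueueCrawler`, its methods `fetchLinks`, `enqueue`, and `worker`, and the in-memory link graph that stands in for real HTTP fetching are all hypothetical and are not the author's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of a non-recursive crawler: an in-memory job queue (frontier) feeds a
 * fixed-size thread pool. Fetching is simulated by a hypothetical in-memory
 * link graph so the example runs without network access.
 */
public class QueueCrawler {
    private final Map<String, List<String>> linkGraph;                            // hypothetical stand-in for fetch + link extraction
    private final BlockingQueue<String> frontier = new LinkedBlockingQueue<>();   // in-memory job queue of URLs
    private final Set<String> seen = ConcurrentHashMap.newKeySet();               // URLs already queued (deduplication)
    private final AtomicInteger pending = new AtomicInteger();                    // URLs queued or still being processed

    public QueueCrawler(Map<String, List<String>> linkGraph) {
        this.linkGraph = linkGraph;
    }

    /** Hypothetical fetch: returns the outgoing links of a page. */
    private List<String> fetchLinks(String url) {
        return linkGraph.getOrDefault(url, List.of());
    }

    /** Adds a URL to the job queue at most once. */
    private void enqueue(String url) {
        if (seen.add(url)) {
            pending.incrementAndGet();   // count it before any worker can take it
            frontier.add(url);
        }
    }

    /** Worker loop: take a URL, "fetch" it, queue its links. No recursion. */
    private void worker() {
        try {
            while (true) {
                String url = frontier.poll(50, TimeUnit.MILLISECONDS);
                if (url == null) {
                    if (pending.get() == 0) {
                        return;          // queue empty and nothing in flight: crawl is done
                    }
                    continue;            // another worker may still add links
                }
                try {
                    for (String link : fetchLinks(url)) {
                        enqueue(link);
                    }
                } finally {
                    pending.decrementAndGet(); // this URL is finished; its links are already counted
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Crawls from a seed URL using the given number of worker threads. */
    public Set<String> crawl(String seed, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        enqueue(seed);
        for (int i = 0; i < threads; i++) {
            pool.execute(this::worker);
        }
        pool.shutdown();                 // accept no new tasks; running workers continue
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new TreeSet<>(seen);      // sorted copy for stable output
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c", "d"),
                "c", List.of("a"),
                "d", List.of(),
                "e", List.of("a"));      // "e" is not reachable from "a"
        System.out.println(new QueueCrawler(graph).crawl("a", 3)); // prints [a, b, c, d]
    }
}
```

Because each worker repeatedly takes URLs from the shared queue instead of calling itself for every link it finds, crawl depth does not grow the call stack, and the `pending` counter lets idle threads tell "queue temporarily empty" apart from "crawl finished". A real crawler would replace `fetchLinks` with an HTTP request plus HTML link extraction and would add a page limit, politeness delays, and URL normalization.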

Keywords

web crawler/search engine/JSP


Publication year

2024

Computer and Information Technology (电脑与信息技术)

Chinese Institute of Electronics; Hunan Provincial Institute of Electronics Research (湖南省电子研究所)


Impact factor: 0.256

ISSN: 1005-1228
