Research on a Dynamic Web Page Crawling Method Based on SVM

Liu Junliang¹, Luan Yongming¹, Zhao Jiannan¹, Ren Chuan¹

Author information

1. Liaoning Provincial Meteorological Information Center, Shenyang, Liaoning 110000

Abstract

This paper proposes a method for recognizing dynamic web pages based on a Support Vector Machine (SVM) and combines it with the open-source Scrapy web crawling framework to build a crawler for dynamic web pages, achieving efficient recognition of such pages and capture of their content. Using httpbin.org as the test website, an SVM model classifies pages as static or dynamic, and the Scrapy framework then adjusts the crawling strategy dynamically, which verifies the feasibility and effectiveness of the method.
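The classify-then-choose-a-strategy idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the features, kernel, training data, or how the classifier is wired into Scrapy, so everything below is an assumption: the three features (script tag count, AJAX call count, sparsity of visible text), the four toy training pages, and the names `extract_features`, `train_linear_svm`, and `choose_strategy` are hypothetical. To stay dependency-free, the sketch swaps in a linear SVM trained by subgradient descent on the hinge loss rather than whatever SVM implementation the authors used.

```python
import re

# Hypothetical toy training pages (not from the paper): +1 = dynamic, -1 = static.
TRAIN = [
    ('<html><body><h1>News</h1><p>Static text served directly in the HTML body.</p></body></html>', -1),
    ('<html><body><div id="app"></div><script>fetch("/data")</script></body></html>', 1),
    ('<html><body><ul><li>Item one</li><li>Item two</li></ul></body></html>', -1),
    ('<html><body><script src="a.js"></script><script>new XMLHttpRequest()</script></body></html>', 1),
]


def extract_features(html):
    """Assumed features: script tag count, AJAX call count, sparsity of visible text."""
    n_scripts = len(re.findall(r"<script\b", html, flags=re.I))
    n_ajax = len(re.findall(r"fetch\(|XMLHttpRequest|\$\.ajax", html))
    # Drop script bodies first, then the remaining tags, and count the words left over.
    without_scripts = re.sub(r"<script\b.*?</script>", " ", html, flags=re.S | re.I)
    visible_words = len(re.sub(r"<[^>]+>", " ", without_scripts).split())
    return [float(n_scripts), float(n_ajax), 1.0 / (1.0 + visible_words)]


def decision_value(w, b, x):
    """Signed score of the linear classifier; positive means 'dynamic'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(samples, lr=0.1, lam=0.01, epochs=100):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    data = [(extract_features(html), label) for html, label in samples]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            violated = y * decision_value(w, b, x) < 1  # inside the margin or misclassified
            w = [wj * (1.0 - lr * lam) for wj in w]      # step on the regulariser
            if violated:                                 # step on the hinge term
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


W, B = train_linear_svm(TRAIN)


def choose_strategy(html):
    """Return 'render' for pages classified as dynamic, 'plain' otherwise."""
    return "render" if decision_value(W, B, extract_features(html)) > 0 else "plain"
```

In a Scrapy spider, a callback could call `choose_strategy(response.text)` and, when the result is `"render"`, re-issue the request through a JavaScript-rendering backend; which backend to use, and whether the authors did it this way, is not stated in the abstract.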

Key words

Support Vector Machine (SVM) / dynamic web page recognition / Scrapy framework / web crawler

Publication year

2024

信息与电脑 (Information and Computer)

Beijing Electronics Holding Co., Ltd.

Impact factor: 1.143

ISSN: 1003-9767

Number of references: 11
