Research on a Dynamic Web Page Crawling Method Based on SVM
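Liu Junliang (刘君良) 1, Luan Yongming (栾永明) 1, Zhao Jiannan (赵建楠) 1, Ren Chuan (任川) 1
Author information
- 1. Liaoning Meteorological Information Center, Shenyang, Liaoning 110000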
Abstract
This paper proposes a dynamic web page recognition method based on the Support Vector Machine (SVM) and combines it with the open-source Scrapy web crawling framework to build a crawler for dynamic web pages, achieving efficient recognition and content extraction for such pages. Using httpbin.org as the test site, an SVM model classifies pages as static or dynamic, and the Scrapy framework is then used to adjust the crawling strategy dynamically. The results verify the feasibility and effectiveness of the method.
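The classify-then-dispatch idea in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state which features or SVM variant were used, so the three features (script tag count, AJAX hints, visible-text ratio), the toy training pages, the strategy labels, and the small pure-Python linear SVM trained by subgradient descent on the hinge loss are all assumptions. In a real crawler the returned label would decide, for example, whether a Scrapy request is passed through a rendering step.

```python
import re

def extract_features(html):
    """Hypothetical features: <script> count, AJAX hints, visible-text ratio."""
    lower = html.lower()
    script_tags = len(re.findall(r"<script\b", lower))
    ajax_hints = len(re.findall(r"xmlhttprequest|fetch\(|\$\.ajax", lower))
    no_scripts = re.sub(r"<script\b.*?</script>", "", lower, flags=re.S)
    visible = re.sub(r"<[^>]+>", "", no_scripts).strip()
    text_ratio = len(visible) / max(len(html), 1)
    return [float(script_tags), float(ajax_hints), text_ratio]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    Labels must be -1 (static) or +1 (dynamic).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active: step on both regularizer and loss.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes a gradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def choose_strategy(html, w, b):
    """Map the SVM decision to a (hypothetical) crawl strategy label."""
    x = extract_features(html)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "render_with_browser" if score > 0 else "plain_http_fetch"

# Toy training pages (assumed for illustration, not data from the paper).
train_pages = [
    ("<html><body><h1>Title</h1><p>Plain text content for a static page.</p></body></html>", -1),
    ("<html><body><p>Another static document with only text and links.</p><a href='/x'>x</a></body></html>", -1),
    ("<html><body><div id='app'></div><script>fetch('/api').then(r => r.json())</script><script src='app.js'></script></body></html>", 1),
    ("<html><body><script>var x = new XMLHttpRequest(); x.open('GET','/data');</script><div id='root'></div></body></html>", 1),
]
X = [extract_features(page) for page, _ in train_pages]
y = [label for _, label in train_pages]
w, b = train_linear_svm(X, y)

static_page = "<html><body><h2>Report</h2><p>Text only, no client-side code at all here.</p></body></html>"
dynamic_page = "<html><body><div id='main'></div><script>fetch('/items')</script><script>fetch('/more')</script></body></html>"
print(choose_strategy(static_page, w, b))
print(choose_strategy(dynamic_page, w, b))
```

A library SVM with a nonlinear kernel and features measured on real pages would normally replace the toy trainer here; the point of the sketch is only the split between a page classifier and a strategy switch driven by its output.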
Keywords
Support Vector Machine (SVM) / dynamic web page recognition / Scrapy framework / web crawler
Publication year
2024