Cyberspace Security (网络空间安全), 2024, Vol. 15, Issue 1: 70-75.

Website Sensitive Word Detection System Based on Scrapy and Elasticsearch

Guo Xiangmin, Yuan Xulong, Zhu Luoling

Author information

Guo Xiangmin¹, Yuan Xulong², Zhu Luoling¹

  • 1. Jiangsu Police Institute, Nanjing, Jiangsu 210031
  • 2. Wuxi Public Security Bureau, Wuxi, Jiangsu 214000

Abstract

[Purpose/Significance] With the explosive growth of information on the internet, sensitive words appearing on web pages can easily trigger social disputes and conflicts. In cyberspace governance, prompt handling and feedback are crucial, yet traditional manual review of website content can no longer keep up with demand. An automated sensitive word detection system has therefore become an important tool for reducing the spread of sensitive information and maintaining the stability and security of cyberspace. [Method/Process] This article designs and implements a sensitive word detection system based on Scrapy and Elasticsearch. A Scrapy crawler fetches the content of specified websites, and Elasticsearch stores the web page content. Using the Chinese word segmentation, inverted index, and full-text search capabilities that Elasticsearch provides, the system detects sensitive words in the page content. The whole system is built with the popular Vue + Flask front-end and back-end frameworks. [Results/Conclusion] The system supports user-defined sensitive word lists, crawls and checks specified websites on a schedule, and notifies the website administrator by email when sensitive words are detected, thereby strengthening website management.
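The detection flow outlined in the abstract (store page text, look words up through an inverted index, notify an administrator) can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions, not the authors' implementation: an in-memory dictionary stands in for the Elasticsearch index, a character-level tokenizer stands in for its Chinese analyzer (the abstract does not name one), and the function names `build_index`, `detect`, and `build_alert`, as well as the example URLs and addresses, are hypothetical.

```python
from collections import defaultdict
from email.message import EmailMessage


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each character to the set of page URLs that contain it.

    A toy inverted index; a real analyzer would emit words, not characters.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for ch in set(text):
            index[ch].add(url)
    return index


def detect(pages: dict[str, str],
           index: dict[str, set[str]],
           words: list[str]) -> dict[str, list[str]]:
    """Return {sensitive word: sorted URLs of pages that contain it}."""
    hits: dict[str, list[str]] = {}
    for word in words:
        if not word:
            continue
        # Intersect posting lists to get candidate pages cheaply.
        postings = [index.get(ch, set()) for ch in set(word)]
        candidates = set.intersection(*postings)
        # Candidates only share the characters; confirm the exact phrase.
        matched = sorted(url for url in candidates if word in pages[url])
        if matched:
            hits[word] = matched
    return hits


def build_alert(hits: dict[str, list[str]],
                sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a notification email summarising the hits."""
    page_count = len({url for urls in hits.values() for url in urls})
    msg = EmailMessage()
    msg["Subject"] = f"Sensitive words detected on {page_count} page(s)"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{word}: {', '.join(urls)}" for word, urls in sorted(hits.items())]
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical crawled pages and a user-defined word list.
pages = {
    "https://example.com/a": "this page mentions forbidden term",
    "https://example.com/b": "clean page",
}
hits = detect(pages, build_index(pages), ["forbidden", "absent"])
alert = build_alert(hits, "monitor@example.com", "admin@example.com")
```

The word "absent" shows why the confirmation step matters: page `a` contains every one of its characters, so it survives the posting-list intersection, but the exact substring check rejects it. In a real deployment the message would be handed to an SMTP client and the two functions would be run on a schedule, as the abstract describes.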

Keywords

Scrapy / Elasticsearch / sensitive word detection / cyberspace governance / network security

Publication year

2024
Cyberspace Security
China Center for Information Industry Development

Impact factor: 0.505
ISSN: 1674-9456
Citations: 1
References: 11