Identification of Industrial Land Usage Based on Natural Language Processing and Machine Learning
史晟恺
Abstract
When compiling industrial land information, the first step is to establish the baseline inventory of land parcels, for which the land usage of the parcel containing each pattern spot is key information. Because some early paper records do not clearly mark the keywords for this information, a great deal of labor and time must be spent reading the paper documents or their scanned copies before the land usage can be judged and summarized manually. This work, based on natural language processing and machine learning, builds an improved naive Bayes model by introducing important-word weights, uses it to identify the required land information, and compares the output with the verified correct information. The results show that, once the dictionary has been built through machine learning, applying natural language processing substantially improves both the accuracy and the efficiency of identifying key industrial land information.
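The abstract names the technique (a naive Bayes model improved with important-word weights) but does not give its formula. The following is a minimal sketch of one plausible reading, not the author's implementation: it assumes a multinomial naive Bayes classifier with Laplace smoothing in which each word's log-likelihood is multiplied by a per-word weight. The class name `WeightedNaiveBayes`, the weight values, the English tokens, and the two land-usage labels are all hypothetical and chosen only for illustration. Real records would first need word segmentation, which is omitted here.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Multinomial naive Bayes whose per-word log-likelihoods are scaled by weights.

    Assumption: "important word weights" is modeled as a multiplier on log P(word | class).
    Words without an explicit weight use 1.0, which gives ordinary naive Bayes.
    """

    def __init__(self, word_weights=None, alpha=1.0):
        self.word_weights = word_weights or {}
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {
            label: sum(counts.values()) for label, counts in self.word_counts.items()
        }
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                p = (self.word_counts[label][w] + self.alpha) / (
                    self.total_words[label] + self.alpha * vocab_size
                )
                score += self.word_weights.get(w, 1.0) * math.log(p)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy, pre-tokenized records (hypothetical data, not from the paper).
train_docs = [
    ["parcel", "factory", "workshop", "lease"],
    ["parcel", "warehouse", "factory", "transfer"],
    ["parcel", "apartment", "housing", "lease"],
    ["parcel", "housing", "garden", "transfer"],
]
train_labels = ["industrial", "industrial", "residential", "residential"]

# Hypothetical important-word weights: these words count three times as much.
weights = {"factory": 3.0, "housing": 3.0}

model = WeightedNaiveBayes(word_weights=weights).fit(train_docs, train_labels)
print(model.predict(["parcel", "lease", "factory"]))  # industrial
```

Because the weight multiplies a log-probability, a weight above 1 widens the score gap between classes for that word, so a word such as "factory" outweighs neutral words such as "parcel" or "lease" that appear equally often in both classes.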
Key words
Pattern spot/Land usage/Natural language processing/Machine learning
Publication year
2024