摘要
文章分析讨论了半结构化信息管理技术的发展状况和应用情况,在梳理和总结半结构化文本信息抽取载体类型、内容和技术方法的基础上,设计了科创项目信息提取系统.该系统数据源以科研院所/创业团队提供的商业策划书为主,采用B/S架构,以基础设置、数据层、应用层和用户层四层逻辑构架为基础,通过业务逻辑后台、文件解析模块、项目关键信息抽取服务三大功能模块,实现对科创项目策划书文本数据采集、关键信息提取、数据存储以及数据服务的高效管理.实践结果表明,该系统功能达到了预期设计目标,运行稳定、高效.
Abstract
This paper analyzes the development and application of semi-structured information management technology. Building on a review of the carrier types, content, and technical methods of semi-structured text information extraction, it designs an information extraction system for science and technology innovation projects. The system's data source consists mainly of business proposals provided by research institutes and entrepreneurial teams. The system adopts a B/S (browser/server) architecture built on a four-layer logical framework: basic settings, data layer, application layer, and user layer. Three functional modules (the business logic backend, the file parsing module, and the project key information extraction service) provide efficient management of text data collection, key information extraction, data storage, and data services for science and technology innovation project proposals. Practical results show that the system meets its expected design goals and runs stably and efficiently.