基于条件随机场挖掘文本史料中事件信息的方法与实证研究——以《拉贝日记》数字人文研究为例

A Methodological and Empirical Study of Extracting Event Information from Textual Historical Materials Based on Conditional Random Fields: Taking the Digital Humanities Study of Rabe's Diary as an Example

With textual historical materials now widely digitized, extracting geographic named entities and related information from text, and mining geographic information effectively, has become an important research topic. Addressing the characteristics of historical archival documents, this paper proposes an event-extraction approach that takes geographic named entities as its core: semantic information is linked to geographic locations, and the event information described in the text is converted into attribute data of each geographic named entity. The approach extracts the event elements associated with geographic named entities, including time, place, persons, things, events, and phenomena. Using the document "Japanese Soldiers' Atrocities in the Nanking Safety Zone", included in Rabe's Diary (《拉贝日记》), as an empirical case, the study applies the conditional random field method to extract event information and, combined with historical maps and other related materials, maps the resulting geographic information onto a map. The method helps broaden the ways textual materials can be developed and used in the digital information era and opens up new approaches to text mining analysis and knowledge discovery.
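The idea of turning event information into attribute data of geographic named entities can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the tag scheme, so the BIO labels (`TIME`, `PERSON`, `EVENT`, `PLACE`), the function names `decode_bio` and `to_place_attributes`, and the sample sentence are all assumptions made up for illustration. The labels stand in for what a trained conditional random field tagger might output.

```python
from collections import defaultdict


def decode_bio(tokens, labels):
    """Group BIO-labelled tokens into (element_type, text) spans.

    The label names are hypothetical; tokens are joined with a space,
    which suits English (Chinese characters would be joined with "").
    """
    spans, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new span starts: close the previous one, if any.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" label closes the open span.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


def to_place_attributes(spans):
    """Attach the non-place elements of one event to each place it mentions."""
    places = [text for kind, text in spans if kind == "PLACE"]
    event = defaultdict(list)
    for kind, text in spans:
        if kind != "PLACE":
            event[kind].append(text)
    return {place: dict(event) for place in places}


# Invented sentence and labels, not taken from the archive.
tokens = ["On", "December", "14", "soldiers", "entered", "Example", "Road"]
labels = ["O", "B-TIME", "I-TIME", "B-PERSON", "B-EVENT", "B-PLACE", "I-PLACE"]

spans = decode_bio(tokens, labels)
records = to_place_attributes(spans)
print(records)
```

Keying each record by place name is what would let it be joined to coordinates from a historical map or gazetteer, which is the mapping step the abstract describes; that join is left out here because the abstract gives no details on it.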

Conditional random field; Feature templates; Digital humanities; Information extraction; Geographically named entities

Zhao Xiaoxuan (赵小萱), Chen Gang (陈刚), Huang Zijing (黄紫荆)

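School of Geography and Ocean Science, Nanjing University

Jiangsu Provincial Key Laboratory of Geographic Information Technology

Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources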

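National Natural Science Foundation of China; Nanjing University "Shuangchuang" (Innovation and Entrepreneurship) Project (2021)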

42071172

2024

Library Journal (图书馆杂志)
Shanghai Society for Library Science; Shanghai Library

Indexed in: CSSCI; CHSSCD; PKU Core (北大核心)
Impact factor: 1.475
ISSN: 1000-4254
Year, volume (issue): 2024, 43(3)
  • 33