A Methodological and Empirical Study of Extracting Event Information in Textual Historical Materials Based on Conditional Random Fields: Taking the Digital Humanities Study of the Rabe's Diary as an Example
Historical texts are now widely digitized. How to extract geographically named entities and their related information from such texts, and how to mine geographic information from them effectively, have become important research topics. This paper proposes an approach that takes geographically named entities as the core: it extracts the event elements associated with each entity (time, place, persons, things, events, and phenomena), attaches this semantic information to the corresponding geographical location, and converts the event information described in the text into attribute data of each geographically named entity. The study uses the document Japanese Soldiers' Atrocities in the Nanking Safety Zone, included in Rabe's Diary, as an empirical case and applies the conditional random field method to extract events. Combined with historical maps and other related data, the extracted geographical information is finally plotted on a map. The methodology broadens the ways in which textual information can be exploited in the digital era and opens up new approaches to text mining and knowledge discovery.
Conditional random field; Feature templates; Digital humanities; Information extraction; Geographically named entities
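To make the idea in the abstract concrete, the following is a minimal Python sketch of the two steps it names: applying window-based feature templates to each token (the input a conditional random field would be trained on) and converting a labelled sequence into attribute data keyed by geographically named entity. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the BIO label set (`TIME`, `PERSON`, `PLACE`), the window size, and the example sentence are all hypothetical, and the labels are written by hand rather than predicted by a trained model.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Apply simple feature templates around position i.

    Hypothetical templates: one unigram feature per offset in the window,
    plus one bigram feature joining the previous and current token.
    """
    feats = {"bias": "1"}
    for offset in range(-window, window + 1):
        j = i + offset
        if j < 0:
            value = "<BOS>"  # padding before the sentence start
        elif j >= len(tokens):
            value = "<EOS>"  # padding after the sentence end
        else:
            value = tokens[j]
        feats[f"w[{offset}]"] = value
    prev = tokens[i - 1] if i > 0 else "<BOS>"
    feats["w[-1]|w[0]"] = prev + "|" + tokens[i]
    return feats


def spans_from_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labelled tokens into (element_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = lab[2:], [tok]
        elif lab.startswith("I-") and current_type == lab[2:]:
            current_tokens.append(tok)
        else:
            # "O", or an "I-" tag that does not continue the open span
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


def events_by_place(spans: List[Tuple[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Attach the non-place elements of one sentence to each place entity."""
    attrs: Dict[str, List[str]] = {}
    for typ, text in spans:
        if typ != "PLACE":
            attrs.setdefault(typ.lower(), []).append(text)
    places = [text for typ, text in spans if typ == "PLACE"]
    return {place: {k: list(v) for k, v in attrs.items()} for place in places}


# Invented example sentence with hand-written labels (not taken from the source text).
tokens = ["On", "December", "15", "soldiers", "entered", "Example", "Road"]
labels = ["O", "B-TIME", "I-TIME", "B-PERSON", "O", "B-PLACE", "I-PLACE"]

features = [token_features(tokens, i) for i in range(len(tokens))]
record = events_by_place(spans_from_bio(tokens, labels))
print(record)
```

In a full pipeline, the per-token feature dictionaries would be passed to a CRF toolkit for training and prediction, and each place key in the resulting record would be joined to coordinates from a gazetteer or historical map before plotting.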