中国民族语言大规模标注文本的检索技术实现及其价值

The Implementation of Retrieval Technology for Large-scale Annotated Texts in Ethnic Languages in China and its Significance

Jiang Di (江荻) 1, Long Congjun (龙从军) 2

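Author information

  • 1. Research Center for Chinese and Sino-Tibetan Languages, Jiangsu Normal University, Xuzhou, Jiangsu 221116; Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081
  • 2. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081; School of Literature, University of Chinese Academy of Social Sciences, Beijing 100081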

Abstract

The book series Grammatically Labeled Texts of Ethnic Languages in China (《中国民族语言语法标注文本》) is the first large-scale collection of authentic text resources in China. It covers more than ten low-resource ethnic languages of China, and its in-depth grammatical annotation gives it considerable academic value, so it has attracted wide interest and attention in the academic community. Given the importance of implementing retrieval technology for the series' large-scale annotated texts, this paper introduces the content of the project, the process of technical implementation, and the expected retrieval functions. In particular, it explains in detail the techniques used to produce interlinear aligned texts in the internationally accepted format, so that readers can form an objective and clear understanding of the digitization of the series and the implementation of its retrieval technology before the project goes online.

Key words

ethnic languages/annotated texts/corpus data/retrieval technology

Funding

Major Project of the National Social Science Fund of China (21&ZD304)

Year of publication

2023
云南师范大学学报(哲学社会科学版) (Journal of Yunnan Normal University, Philosophy and Social Sciences Edition)
云南师范大学 (Yunnan Normal University)

Indexed in: CSSCI, CHSSCD, 北大核心 (Peking University Core Journals)
Impact factor: 1.025
ISSN: 1000-5110
Number of references: 3