A Study on an Extractive Machine Reading Comprehension Dataset for Tibetan Medicine
The field of Tibetan machine reading comprehension is still in its infancy, and building a high-quality corpus has become an urgent task for advancing it. This study used a crowdsourcing approach to finely annotate the compiled Tibetan medical content and terminology explanations in the classic Tibetan medical text The Four Medical Tantras. A Tibetan masked data enrichment strategy was then applied to expand the dataset, yielding a final total of 13,000 valid question-answer pairs. Using this dataset, we propose an efficient Tibetan machine reading comprehension model that optimizes the traditional attention mechanism. This work advances Tibetan information processing technology and improves machines' ability to understand Tibetan texts. It thereby supports the preservation and transmission of Tibetan culture.
Keywords: Tibetan machine reading comprehension; The Four Medical Tantras; Tibetan corpus; Attention mechanism
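The abstract does not spell out how the masked data enrichment works. One plausible reading for an extractive dataset is sketched below: random syllables outside the answer span are replaced by a mask token, and the answer offset is recomputed so the question-answer pair stays valid. This is an illustrative assumption, not the authors' method. The syllable split on the tsheg mark (U+0F0B), the `[MASK]` token, the `mask_prob` default, and the function name `mask_augment` are all hypothetical choices.

```python
import random
import re

TSHEG = "\u0f0b"   # Tibetan intersyllabic mark (tsheg), U+0F0B
MASK = "[MASK]"    # hypothetical mask token; the real one depends on the tokenizer

# Assumed segmentation: one piece = a run of non-tsheg characters plus its
# trailing tsheg, if any; a stray tsheg (e.g. a doubled one) is its own piece.
_PIECE_RE = re.compile(r"[^\u0f0b]+\u0f0b?|\u0f0b")


def mask_augment(context, answer_start, answer_text, mask_prob=0.15, seed=None):
    """Mask random syllables outside the answer span of an extractive QA sample.

    Hypothetical helper. Returns (new_context, new_answer_start) so that the
    answer text can still be recovered by slicing the new context.
    """
    rng = random.Random(seed)
    answer_end = answer_start + len(answer_text)
    out = []
    shift = 0   # length change caused by masks placed before the answer
    offset = 0  # character offset of the current piece in the original context
    for piece in _PIECE_RE.findall(context):
        start, end = offset, offset + len(piece)
        offset = end
        overlaps_answer = start < answer_end and end > answer_start
        # Never mask a bare tsheg or any piece touching the answer span.
        if piece != TSHEG and not overlaps_answer and rng.random() < mask_prob:
            replacement = MASK + (TSHEG if piece.endswith(TSHEG) else "")
            if end <= answer_start:
                shift += len(replacement) - len(piece)
            out.append(replacement)
        else:
            out.append(piece)
    new_context = "".join(out)
    new_start = answer_start + shift
    # Sanity check: the answer must still sit at the recomputed offset.
    assert new_context[new_start:new_start + len(answer_text)] == answer_text
    return new_context, new_start
```

For example, `mask_augment("ཀ་ཁ་ག་ང་ཅ་", 4, "ག", mask_prob=1.0)` masks every syllable except the one holding the answer and returns `("[MASK]་[MASK]་ག་[MASK]་[MASK]་", 14)`. Protecting the answer span keeps each augmented sample usable for span extraction. How much context to mask is a tunable trade-off under this assumption.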