Construction of Question Taxonomy and An Annotated Chinese Corpus for Diabetes Question Classification
As a typical chronic disease, diabetes has become one of the major global public health challenges. Automated diabetes Question Answering (QA) services play a vital role in providing daily health services for patients and high-risk people. This paper designs a new diabetes question classification taxonomy that represents user intent, comprising 6 coarse-grained categories and 23 fine-grained categories. It also constructs DaCorp, a new Chinese diabetes QA corpus containing 122,732 question-answer pairs collected from two professional medical QA websites. In addition, 8,000 diabetes questions in DaCorp are annotated to form a fine-grained diabetes dataset. To evaluate the quality of the proposed taxonomy and the annotated dataset, 8 mainstream baseline classifiers are implemented for diabetes question classification. Results show that the best-performing model achieves an accuracy of 88.7%, demonstrating the validity of the annotated diabetes dataset and the efficacy of the proposed taxonomy.
diabetes; question classification; classification taxonomy; corpus construction