Study on the Construction of a Question-Answer Corpus Dataset for Chinese Medical Knowledge Large Language Models
Purpose/Significance To construct a Chinese medical knowledge question-answering (Q&A) corpus that serves as a standardized evaluation benchmark for large language models (LLMs) in the medical domain, and thereby to improve the accuracy and efficiency of LLMs in answering Chinese medical questions. Method/Process The Q&A dataset was built from knowledge drawn from Chinese medical papers, explanations of medical terminology, and supplementary questions taken from the Chinese medical licensing examination. Existing open-source Chinese medical Q&A datasets were also incorporated. Result/Conclusion The resulting Chinese medical knowledge Q&A corpus broadens the range of sources beyond existing datasets and supports objective, comprehensive quantitative evaluation of large models in the medical field. Future work will add further data, such as electronic medical records and data from online health communities, to strengthen the support that artificial intelligence provides to the Healthy China strategy.
Keywords: large language models; corpus dataset; model evaluation; medicine