Four Diagnostic Description Extraction in Clinical Records of Traditional Chinese Medicine with Batch Data Oversampling
Extracting four diagnostic descriptions from clinical records has clinical applications in improving the practice of traditional Chinese medicine. As the first exploration of this extraction task, we first construct a clinical four diagnostic description extraction corpus and then fine-tune a general-domain pre-trained language model on unlabeled clinical records of traditional Chinese medicine. We then train the proposed four diagnostic description extraction model on a small labeled dataset using a purpose-designed batch data oversampling algorithm. The experimental results show that the proposed method outperforms the compared methods, with an average improvement of 2.13% in F1 score on the rare classes.
clinical records of traditional Chinese medicine; four diagnostic description extraction; imbalanced class distribution; batch data oversampling
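
The abstract names batch data oversampling but does not spell out the algorithm, so the following is only a minimal generic sketch of the idea of oversampling at the batch level, not the authors' method. It assumes examples are (text, label) pairs and that each batch is topped up with randomly drawn rare-class examples until it holds a minimum number of them. The function name `oversampled_batches` and the parameters `rare_labels` and `min_rare_per_batch` are hypothetical and chosen for illustration.

```python
import random


def oversampled_batches(examples, batch_size, rare_labels, min_rare_per_batch, seed=0):
    """Yield shuffled batches, topping each up with rare-class examples.

    Assumption (not from the paper): `examples` is a list of (text, label)
    pairs, and a batch is "balanced enough" once it contains at least
    `min_rare_per_batch` examples whose label is in `rare_labels`.
    """
    rng = random.Random(seed)
    # Pool of rare-class examples to draw extra copies from.
    rare_pool = [ex for ex in examples if ex[1] in rare_labels]

    order = list(examples)
    rng.shuffle(order)

    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        n_rare = sum(1 for _, label in batch if label in rare_labels)
        missing = max(0, min_rare_per_batch - n_rare)
        if rare_pool and missing:
            # Oversample: append duplicates of rare examples (with replacement).
            batch = batch + [rng.choice(rare_pool) for _ in range(missing)]
        yield batch
```

In this sketch every original example is still seen once per epoch, while rare-class examples may be repeated across batches; batches that needed topping up are slightly larger than `batch_size`.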