Four diagnostic description extraction from clinical records has clinical applications in improving the practice of traditional Chinese medicine. As the first exploration of this extraction task, we first construct a clinical four diagnostic description extraction corpus and then fine-tune a general-domain pre-trained language model on unlabeled clinical records of traditional Chinese medicine. We then train the proposed four diagnostic description extraction model on a small labeled dataset using a carefully designed batch data oversampling algorithm. Experimental results show that the proposed method outperforms the compared methods, with an average F1 score improvement of 2.13% on the rare classes.
Keywords: clinical records of traditional Chinese medicine; four diagnostic description extraction; imbalanced class distribution; batch data oversampling