3DRes-ViT knee osteoarthritis classification model based on multimodal fusion
To address the low accuracy of multi-class classification of knee osteoarthritis (KOA) and the insufficient feature extraction from knee joint images, this paper proposes a 3DRes-ViT network model based on multimodal fusion. First, a 3D convolutional neural network (3D CNN) is designed to extract shallow 3D features separately from two magnetic resonance imaging (MRI) sequences: dual echo steady state (DESS) and fast spin echo (TSE). The two sequences were found to carry complementary information, so their features are then fused. Second, an Efficient Channel Attention (ECA) module captures the dependencies among the fused feature channels, and its output is fed into Vision Transformer (ViT) encoders. This design combines the strengths of 3D CNNs and ViT to efficiently aggregate the local and global features of both modalities. Finally, the ViT output is fused with X-ray image features extracted by a 2D convolutional neural network (2D CNN) to further improve classification performance. Experimental results show that the method performs well on the four-class KOA classification task, achieving an average classification accuracy of 91.2%, an average precision of 91.6%, an F1 score of 0.914, and a mean absolute error reduced to 8.8%. The proposed model outperforms current mainstream methods in the field and significantly improves the multi-class classification accuracy for knee osteoarthritis.
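The channel-attention step can be illustrated with a minimal, dependency-free sketch of Efficient Channel Attention as introduced in ECA-Net. It applies global average pooling per channel, then a shared 1D convolution across neighbouring channel descriptors with an adaptively chosen odd kernel size, then a sigmoid gate that rescales each channel. The abstract does not say how the DESS and TSE features are fused, so the hypothetical helper `fuse_by_concat` assumes plain channel concatenation. All function names are illustrative assumptions, each 3D volume is flattened to a list for brevity, and in a trained model the kernel weights would be learned rather than passed in.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive ECA kernel size: nearest odd value of |(log2(C) + b) / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def fuse_by_concat(feats_a, feats_b):
    """Hypothetical fusion (assumption): stack the channels of two branches."""
    return feats_a + feats_b


def eca(feature_map, weights):
    """ECA-style channel attention on a list of C channels.

    feature_map: list of channels, each a flat list of voxel values
                 (a 3D volume flattened for simplicity).
    weights: odd-length 1D kernel shared by all channels (learned in practice).
    Returns (reweighted feature map, per-channel attention weights).
    """
    c, k = len(feature_map), len(weights)
    pad = k // 2
    # 1) Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(ch) / len(ch) for ch in feature_map]
    # 2) Local cross-channel interaction: 1D convolution, zero padding, no bias.
    padded = [0.0] * pad + desc + [0.0] * pad
    conv = [sum(weights[j] * padded[i + j] for j in range(k)) for i in range(c)]
    # 3) Sigmoid gate maps each score into (0, 1).
    attn = [1.0 / (1.0 + math.exp(-z)) for z in conv]
    # 4) Rescale every voxel of each channel by that channel's gate.
    out = [[a * v for v in ch] for a, ch in zip(attn, feature_map)]
    return out, attn


if __name__ == "__main__":
    dess = [[1.0, 2.0], [0.0, 0.0]]    # toy 2-channel DESS features
    tse = [[3.0, 1.0], [-1.0, -1.0]]   # toy 2-channel TSE features
    fused = fuse_by_concat(dess, tse)  # 4 channels
    print(eca_kernel_size(256))        # kernel size for a 256-channel layer
    reweighted, gates = eca(fused, [0.25, 0.5, 0.25])
    print(gates)
```

Because the 1D kernel only spans k neighbouring channels, ECA models channel dependencies with k parameters instead of the two fully connected layers used in squeeze-and-excitation blocks, which is the usual argument for its efficiency.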
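The reported figures can be reproduced from predicted and true labels with a short sketch like the one below. It assumes that "average" precision and F1 mean macro-averages over the four classes and that the mean absolute error is the mean absolute difference between predicted and true ordinal grade indices (0 to 3); the abstract does not state these definitions, so treat them as assumptions. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, num_classes=4):
    """Accuracy, macro precision, macro F1 and MAE for ordinal class labels.

    Assumption: labels are integer grades 0..num_classes-1, so the absolute
    difference between labels measures how many grades a prediction is off.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precisions, f1s = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    # Mean absolute error in grade units (0.088 would be reported as 8.8%).
    mae = sum(abs(t - p) for t, p in pairs) / n
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / num_classes,
        "macro_f1": sum(f1s) / num_classes,
        "mae": mae,
    }


if __name__ == "__main__":
    y_true = [0, 1, 2, 3, 3, 2]
    y_pred = [0, 1, 2, 3, 2, 2]  # one grade-3 case predicted as grade 2
    print(classification_metrics(y_true, y_pred))
```

For ordinal grading, MAE complements accuracy: a prediction that is off by one grade costs 1, while one that is off by three grades costs 3, so the metric penalizes larger grading mistakes more heavily.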
Keywords: medical image processing; deep learning; knee osteoarthritis; multimodal fusion; X-ray; magnetic resonance imaging; vision transformer