Zhou Guangzhao
Monthly
1674-733X
informatics@scichina.org
010-64015683
100717
16 Donghuangchenggen North Street, Beijing
multimodal model, open-source, vision encoder, dynamic resolution, bilingual dataset
large multimodal model, OCR, text recognition, scene text-centric VQA, document-oriented VQA, key information extraction, handwritten mathematical expression recognition
instruction tuning, multi-modal, multi-domain, dataset, vision large language model
multimodal learning, multimodal large language models, hallucination correction, large language models, vision and language
document understanding, large multimodal model, OCR-free, high-resolution, frequency
large multimodal model, multimodal learning, vision-language pretraining, parameter-efficient fine-tuning, adapter, modality expert
multiple antennas, mmWave/terahertz wave, machine learning, resource allocations, XR
GPU cluster, deep learning workload, cluster management, GPU sharing, deployed system