In classroom scenarios, students' facial expressions are difficult to detect because a single image contains many targets ("multi-target") and many of them are small ("small target"), which leads to false detections and missed detections. In this paper, we propose YOLOv5-SWIN, a classroom facial expression detection algorithm based on an improved YOLOv5. First, we use the Swin Transformer as the backbone feature extraction network to strengthen the model's perception of global information and enrich the semantic information of the targets. Second, we integrate the CBAM attention mechanism into the feature extraction network to further improve detection accuracy. Finally, we adopt the NWD loss function, which reduces the model's sensitivity to positional deviations of "small targets" and thereby improves its robustness. Experiments are conducted on a self-constructed, large-scale dataset of students' facial expressions in classroom scenarios. The results show that the method recognizes students' facial expressions quickly and accurately: the accuracy of the improved model on the self-constructed dataset increases by 4%, reaching 82.1%.
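To illustrate why an NWD-style loss helps with small targets, the following is a minimal sketch of the Normalized Wasserstein Distance as it is commonly defined for tiny-object detection: each box (cx, cy, w, h) is modeled as a 2D Gaussian, and the similarity is an exponential of the Wasserstein distance between the two Gaussians. It is not the implementation used in YOLOv5-SWIN. The function names and the normalizing constant `c=12.8` are assumptions chosen for illustration; in practice the constant is usually tied to the typical object size in the dataset.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    `c` is a hypothetical normalizing constant, not a value from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Each box is treated as a Gaussian with mean (cx, cy) and covariance
    # diag(w^2/4, h^2/4); the squared 2-Wasserstein distance is then the
    # squared Euclidean distance between (cx, cy, w/2, h/2) vectors.
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)

def nwd_loss(box_a, box_b, c=12.8):
    """Regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(box_a, box_b, c)

def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, included only for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    iw = min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2)
    ih = min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    return inter / (wa * ha + wb * hb - inter)

# A 4x4 pixel face shifted horizontally by 4 pixels: the boxes no longer
# overlap, so IoU gives no graded signal, while NWD still varies smoothly.
target = (10.0, 10.0, 4.0, 4.0)
shifted = (14.0, 10.0, 4.0, 4.0)
print("IoU:", iou(target, shifted))
print("NWD:", nwd(target, shifted))
print("NWD loss:", nwd_loss(target, shifted))
```

For small boxes, a shift of a few pixels can drive IoU to zero, whereas the NWD value decays gradually with distance, which is the property the abstract relies on when it refers to reduced sensitivity for "small targets".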