Accurately identifying humans and large livestock that intrude within the perimeter is a key focus of intelligent video analysis technology for railway perimeter intrusion detection, and it is of great significance for ensuring safe railway operation. However, existing object detection algorithms struggle to handle the significant scale variations of intrusion objects in railway monitoring scenarios. Therefore, a Multiple Input Double Output Network (MIDO-Net) and a multi-scale feature perception algorithm based on adaptive weighted fusion are proposed. First, MIDO-Net extracts richer multi-scale feature information about image objects through its multi-level cascaded multiple-input, double-output network structure. Second, based on the multi-stage characteristics of the backbone network, the multi-level features are upsampled to a unified resolution and then weighted using attention modules and adaptive parameters. The weighted features are then fed into the detection head to complete the recognition of railway perimeter intrusions. Finally, the algorithm is validated on the Visual Object Classes (VOC) public dataset and on a self-built multi-scene, multi-scale dataset of railway foreign object intrusion. The results show that the proposed multi-scale feature perception algorithm achieves a detection accuracy of 83.3% on the VOC public dataset and 91.1% on the multi-scene, multi-scale railway foreign object intrusion dataset. The average recall rate is 56.2%, which is superior to that of various widely used feature extraction backbone networks. The detection speed of the algorithm is 45 frames per second (fps), surpassing similar backbone networks, and can meet the requirements for real-time pedestrian monitoring in railway scenarios.
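
The fusion step described above (upsample multi-level features to a unified resolution, then combine them with adaptive weights) can be illustrated with a minimal sketch. This is not the MIDO-Net implementation: the function names, the nearest-neighbour upsampling, the softmax normalisation of the adaptive parameters, and the single-channel feature maps are all assumptions made for illustration, and the attention modules are omitted entirely.

```python
import math

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows).
    Assumption: the abstract does not say which upsampling method is used."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def softmax(params):
    """Turn raw adaptive parameters into positive weights that sum to 1.
    Assumption: softmax is one common normalisation, not confirmed by the source."""
    m = max(params)
    exps = [math.exp(p - m) for p in params]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_weighted_fusion(levels, params):
    """Upsample every level to the largest resolution, then take a weighted sum.
    `params` stands in for learnable per-level scalars (hypothetical)."""
    out_h = max(len(f) for f in levels)
    out_w = max(len(f[0]) for f in levels)
    weights = softmax(params)
    resized = [upsample_nearest(f, out_h, out_w) for f in levels]
    return [[sum(w * f[r][c] for w, f in zip(weights, resized))
             for c in range(out_w)]
            for r in range(out_h)]

# Three hypothetical single-channel feature maps at 4x4, 2x2 and 1x1 resolution.
levels = [
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]],
    [[10.0, 20.0],
     [30.0, 40.0]],
    [[100.0]],
]

# Equal parameters give equal weights, so the result is a plain average.
fused = adaptive_weighted_fusion(levels, [0.0, 0.0, 0.0])
print(len(fused), len(fused[0]))
```

In a trained network the per-level parameters would be learned jointly with the backbone, so that levels carrying more useful information for a given object scale receive larger weights; here they are fixed by hand only to keep the sketch self-contained.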