Multi-object tracking using adaptive-IoU loss and hierarchical association

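Abstract (translated from Chinese): Objective: To address identity switches caused by ambiguous pedestrian features and the loss of tracking accuracy caused by occlusion between targets in complex scenes, the AIoU-Tracker multi-object tracking algorithm is proposed. Method: First, a dedicated AIoU (adaptive intersection over union) regression loss is designed for the detection head of the backbone network. It scores a predicted box on three aspects (overlap area, center-point distance, and aspect ratio), which alleviates identity switches caused by the weak discriminability of ambiguous pedestrian features. Second, a simple and effective hierarchical association strategy is proposed: after the high-score and low-score detection boxes have each been associated, the embedding information around the detection boxes that failed to associate is used for a further round of association, which improves association accuracy under occlusion. Results: In comparative experiments on the MOT16 dataset, AIoU-Tracker improves on FairMOT in HOTA (higher order tracking accuracy) from 58.3% to 59.8%, in IDF1 (ID F1 score) from 72.6% to 73.1%, and in MOTA (multi-object tracking accuracy) from 69.3% to 74.4%. On the MOT17 dataset, HOTA rises from 59.3% to 59.9% and IDF1 from 72.3% to 72.9%. Conclusion: The proposed feature-balancing tracking method achieves a better balance among the bounding-box size feature, the heat-map feature, and the center-point offset feature during training and testing, which makes the multi-object tracking results more accurate.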
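The abstract names the three quantities the AIoU loss measures but does not give its formula. As a rough illustration only, the sketch below combines the same three terms in the way the well-known CIoU loss does. The function name `ciou_style_loss`, the `(x1, y1, x2, y2)` box layout, and the weighting of the aspect-ratio term are assumptions taken from CIoU, not the authors' adaptive formulation.

```python
import math


def ciou_style_loss(pred, target, eps=1e-9):
    """CIoU-style regression loss for two boxes given as (x1, y1, x2, y2).

    Stand-in for the AIoU loss: it uses the same three ingredients
    (overlap area, center distance, aspect ratio), but the exact
    adaptive weighting of the paper is NOT reproduced here.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # 1) Overlap area: plain intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # 2) Center distance, normalised by the squared diagonal of the
    #    smallest box enclosing both inputs.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    ew = max(px2, tx2) - min(px1, tx1)
    eh = max(py2, ty2) - min(py1, ty1)
    center_term = (dx * dx + dy * dy) / (ew * ew + eh * eh + eps)

    # 3) Aspect ratio: squared difference of arctan(w / h), as in CIoU.
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)  # trade-off weight borrowed from CIoU

    return 1 - iou + center_term + alpha * v
```

Under these assumptions the loss is close to zero for identical boxes, exceeds 1 for disjoint boxes (the center term keeps providing a gradient signal when IoU is zero), and penalises a box that has the right center but the wrong shape more than `1 - IoU` alone would.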

English title: Multi-object tracking using adaptive-IoU loss and hierarchical association

English abstract:

Objective: Multiple object tracking (MOT) is a mainstream task in computer vision. It aims to estimate the tracklets of multiple objects in videos and has important applications in autonomous driving, human-computer interaction, and human activity recognition. Many methods focus on improving tracking performance given the detection results. Re-ID based trackers fall into two categories: separate detection and embedding (SDE) models and joint detection and embedding (JDE) models. An SDE tracker tunes the detection model and the Re-ID model separately, which prevents it from running in real time. A JDE tracker outputs object locations and appearance embeddings in a single pass for the subsequent association step, which improves running speed. However, JDE trackers suffer from identity switches caused by ambiguous pedestrian features and from reduced tracking accuracy caused by occlusion between objects in complex scenes. The adaptive intersection-over-union (AIoU) tracker, AIoU-Tracker, is proposed to address these issues.

Method: First, we design a dedicated AIoU regression loss for the detection head of the backbone network. The loss measures overlap area, center-point distance, and aspect ratio, and helps alleviate identity switches caused by ambiguous pedestrian features. Second, we propose a simple and effective hierarchical association method: high-score and low-score detection boxes are associated separately, and the embedding information around the detection boxes that failed to associate is then used for a further Re-ID association, which improves association accuracy under occlusion. We use a variant of the DLA-34 architecture as the backbone network. Model parameters trained on the common objects in context (COCO) dataset are used to initialize the model. The experiments run on Ubuntu 16.04 with 64 GB of memory, an RTX 2080 Ti GPU, and CUDA 10.2. We train the model with the Adam optimizer for 30 epochs with an initial learning rate of 10^-4, which is decayed to 10^-5 after 20 epochs; the batch size is 16. We apply standard data augmentation, including rotation, scaling, and color jittering. The input image size is 1088 × 608 pixels and the feature map resolution is 272 × 152 pixels. We evaluate our approach on the MOT Challenge benchmark, specifically the MOT16 and MOT17 datasets. Training uses CrowdHuman and the MIX dataset (ETH, CityPerson, CUHKSYSU, Caltech, and PRW). The ETH and CityPerson datasets provide only bounding box annotations, so we train only the detection branch on them. The Caltech, MOT17, CUHKSYSU, and PRW datasets provide both bounding boxes and ID annotations, so both branches can be trained on them. To ensure a fair comparison, we remove the videos that overlap between the ETH dataset and the MOT17 test set. The CrowdHuman dataset contains only bounding box annotations, so we perform self-supervised training on it. To evaluate tracking performance, we use several well-defined metrics: higher-order tracking accuracy (HOTA), multi-object tracking accuracy (MOTA), ID F1 score (IDF1), false positives, false negatives, and number of identity switches (IDs). MOTA primarily assesses the detection branch, IDF1 evaluates identity preservation and therefore association performance, and HOTA evaluates detection and data association jointly.

Result: We compare our method with existing methods on two datasets. 1) On the MOT16 dataset, our HOTA is 59.8%, 1.5 percentage points higher than FairMOT; our MOTA is 74.4%, 5.1 percentage points higher; and our IDF1 is 73.1%, 0.5 percentage points higher. 2) On the MOT17 dataset, our HOTA is 59.9%, 0.6 percentage points higher than FairMOT, and our IDF1 is 72.9%, 0.6 percentage points higher. In addition, ablation studies on the MOT17 dataset verify the effectiveness of each component of our method and indicate that the proposed method outperforms competing methods in multiple object tracking. In the ablation studies, adding the AIoU regression loss reduces the number of identity switches. We also visualize the predicted Re-ID feature extraction positions, the bounding box size feature, the heat map feature, and the center point offset feature. The visualizations show that our method is more robust than FairMOT. Moreover, the hierarchical association method makes association more robust: for example, an ID that has been occluded can still be associated two frames later.

Conclusion: The proposed feature-balancing tracking method achieves a better balance among the bounding box size feature, the heat map feature, and the center point offset feature during training and testing, resulting in more accurate multi-object tracking. In this study, we propose two improvements to the FairMOT framework. First, we design an AIoU regression loss module to optimize the detection branch, enabling it to optimize targets based on the current optimal distance and to extract more accurate appearance features. Second, we optimize the Re-ID branch with a hierarchical association strategy module that uses three-level matching to improve the association performance of the tracking system. Experimental results show improvements on the MOT17 dataset, with HOTA increasing to 59.9%, IDF1 to 72.9%, and MOTA to 70.8%. However, the detection and Re-ID branches of the JDE tracking model compete with each other, which can lower MOTA. Future research will investigate this competition in the JDE tracking model.
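The three-level matching described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names, the dictionary layout of tracks and detections, every threshold value, the choice of IoU distance for the low-score stage, and the greedy matcher (a simple substitute for Hungarian assignment) are all assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def iou_distance(a, b):
    """1 - IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return 1.0 - inter / union if union > 0 else 1.0


def match_stage(tracks, dets, cost_fn, max_cost):
    """Greedy matching by increasing cost (stand-in for Hungarian assignment)."""
    pairs = sorted(
        (cost_fn(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(dets)
    )
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even more expensive
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((tracks[ti], dets[di]))
    rest_t = [t for i, t in enumerate(tracks) if i not in used_t]
    rest_d = [d for i, d in enumerate(dets) if i not in used_d]
    return matches, rest_t, rest_d


def hierarchical_associate(tracks, dets, high=0.6, low=0.1):
    """Three-level association; all thresholds are hypothetical."""
    emb_cost = lambda t, d: cosine_distance(t["emb"], d["emb"])
    box_cost = lambda t, d: iou_distance(t["box"], d["box"])
    high_dets = [d for d in dets if d["score"] >= high]
    low_dets = [d for d in dets if low <= d["score"] < high]

    # Level 1: high-score detections, strict appearance threshold.
    m1, rest_t, high_left = match_stage(tracks, high_dets, emb_cost, 0.4)
    # Level 2: low-score (often occluded) detections, matched by box overlap.
    m2, rest_t, low_left = match_stage(rest_t, low_dets, box_cost, 0.5)
    # Level 3: retry everything that failed, with a looser appearance threshold.
    m3, rest_t, _ = match_stage(rest_t, high_left + low_left, emb_cost, 0.7)
    return {t["id"]: d["name"] for t, d in m1 + m2 + m3}


tracks = [
    {"id": 1, "emb": (1.0, 0.0), "box": (0, 0, 10, 10)},
    {"id": 2, "emb": (0.0, 1.0), "box": (20, 0, 30, 10)},
    {"id": 3, "emb": (-1.0, 0.0), "box": (40, 0, 50, 10)},
]
dets = [
    {"name": "A", "score": 0.9, "emb": (1.0, 0.0), "box": (1, 0, 11, 10)},
    {"name": "B", "score": 0.3, "emb": (1.0, 1.0), "box": (21, 0, 31, 10)},
    {"name": "C", "score": 0.8, "emb": (-0.5, -0.866), "box": (60, 0, 70, 10)},
]
result = hierarchical_associate(tracks, dets)
```

In this toy data, detection A matches track 1 at the first level, the low-score detection B is recovered by box overlap at the second level, and detection C, whose appearance has drifted too far for the strict first-level threshold, is only recovered at the third level. A production tracker would typically also fuse motion cues and use an optimal assignment solver in place of the greedy loop.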

English keywords:

multi-object tracking (MOT); data association; regression loss; feature balance; hierarchical association method

Authors:

Guo Wen (郭文), Liu Qigui (刘其贵), Ding Xinmiao (丁昕苗)

Affiliation:

School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai 264005, China

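Keywords (translated from Chinese):

multi-object tracking (MOT); data association; regression loss; feature balance; cascade matching method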

Funding:

National Natural Science Foundation of China (three projects); Shandong Province Graduate Education Innovation Program

Project numbers:

62072286; 61876100; 61572296; SDYAL21211

Publication year:

2024

DOI:

10.11834/jig.230390

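Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics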

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(7)

Number of references: 1