TY - GEN
T1 - Temporal Attention for Robust Multiple Object Pose Tracking
AU - Li, Zhongluo
AU - Yoshimoto, Junichiro
AU - Ikeda, Kazushi
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
N2 - Estimating the pose of multiple objects has improved substantially since deep learning became widely used. However, performance deteriorates when the objects are highly similar in appearance or when occlusions are present. This issue is usually addressed by leveraging temporal information, taking previous frames as priors to improve the robustness of estimation. Existing methods are either computationally expensive, relying on multiple frames, or inefficiently integrated through ad hoc procedures. In this paper, we perform computationally efficient object association between consecutive frames via attention across a video sequence. Furthermore, instead of heatmap-based approaches, we adopt a coordinate classification strategy that eliminates post-processing, allowing the network to be built in an end-to-end fashion. Experiments on real data show that our approach achieves state-of-the-art results on PoseTrack datasets.
AB - Estimating the pose of multiple objects has improved substantially since deep learning became widely used. However, performance deteriorates when the objects are highly similar in appearance or when occlusions are present. This issue is usually addressed by leveraging temporal information, taking previous frames as priors to improve the robustness of estimation. Existing methods are either computationally expensive, relying on multiple frames, or inefficiently integrated through ad hoc procedures. In this paper, we perform computationally efficient object association between consecutive frames via attention across a video sequence. Furthermore, instead of heatmap-based approaches, we adopt a coordinate classification strategy that eliminates post-processing, allowing the network to be built in an end-to-end fashion. Experiments on real data show that our approach achieves state-of-the-art results on PoseTrack datasets.
UR - http://www.scopus.com/inward/record.url?scp=85178630564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178630564&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8070-3_42
DO - 10.1007/978-981-99-8070-3_42
M3 - Conference contribution
AN - SCOPUS:85178630564
SN - 978-981-99-8069-7
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 551
EP - 561
BT - Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
A2 - Luo, Biao
A2 - Cheng, Long
A2 - Wu, Zheng-Guang
A2 - Li, Hongyi
A2 - Li, Chaojie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Neural Information Processing, ICONIP 2023
Y2 - 20 November 2023 through 23 November 2023
ER -