Abstract
Human action recognition has become a key aspect of human–computer interaction. Existing human action recognition methods based on spatial–temporal networks achieve strong performance, but at a high computational cost. These methods make their final predictions with a stack of blocks, each containing a spatial module and a temporal module that extract the respective features. However, a fixed arrangement of these modules is not necessarily optimal for every sample, since an alternative arrangement may suit a specific sample better. Moreover, these methods have long inference times, which makes deploying them on low-spec edge devices challenging. To address these limitations, we propose an adaptive framework based on a decision network that dynamically determines the arrangement of the spatial and temporal modules, yielding a cost-effective network design. To determine the optimal network structure, we investigate module-selection decision schemes at both the local and the global level. We conduct extensive experiments on three publicly available datasets. The results show that the proposed framework arranges the modules optimally and reduces the computational cost while maintaining recognition performance. Our code is available at https://github.com/11yxk/dynamic_skeleton.
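The idea of letting a decision step choose, per sample, which spatial (S) and temporal (T) modules to run and in what order can be illustrated with a toy sketch. It is not the authors' implementation (their code is in the repository linked above), and every detail below is an assumption made for illustration: a clip is a small T x J grid of scalar joint features, the hypothetical `spatial_module` and `temporal_module` are simple neighbour averages, a hand-weighted linear scorer (`WEIGHTS`, with a per-module cost penalty `LAMBDA`) stands in for the learned decision network, and the empty arrangement `""` means a block is skipped. The "global" scheme here decides once from the input and reuses that choice in every block, while the "local" scheme lets each block decide from the features it actually receives; this is only one plausible reading of the two levels.

```python
"""Toy sketch of sample-adaptive module arrangement (hypothetical; not the authors' code).

Assumptions, all illustrative: a clip is a T x J grid of scalar joint features;
"S" averages neighbouring joints on a chain graph, "T" averages each joint over a
3-frame window; a hand-weighted linear scorer stands in for the learned decision
network; the empty arrangement "" skips the block.
"""
from typing import Callable, Dict, List, Sequence, Tuple

Clip = List[List[float]]  # T frames x J joints


def _window_mean(values: Sequence[float], i: int) -> float:
    """Mean of values[i-1], values[i], values[i+1], clipped at the borders."""
    window = values[max(i - 1, 0): i + 2]
    return sum(window) / len(window)


def spatial_module(x: Clip) -> Clip:
    """Toy spatial step: mix each joint with its chain neighbours within a frame."""
    return [[_window_mean(frame, j) for j in range(len(frame))] for frame in x]


def temporal_module(x: Clip) -> Clip:
    """Toy temporal step: smooth each joint over neighbouring frames."""
    cols = [list(col) for col in zip(*x)]
    smoothed = [[_window_mean(col, t) for t in range(len(col))] for col in cols]
    return [list(row) for row in zip(*smoothed)]


def _mean(vals: List[float]) -> float:
    return sum(vals) / len(vals) if vals else 0.0


def features(x: Clip) -> Tuple[float, float]:
    """Pooled (spatial, temporal) variation that the decision step looks at."""
    spatial = _mean([_mean([abs(f[j + 1] - f[j]) for j in range(len(f) - 1)]) for f in x])
    temporal = _mean([_mean([abs(b - a) for a, b in zip(f0, f1)]) for f0, f1 in zip(x, x[1:])])
    return spatial, temporal


MODULES: Dict[str, Callable[[Clip], Clip]] = {"S": spatial_module, "T": temporal_module}
# Hypothetical hand-set weights over (spatial, temporal) features; a real
# decision network would learn its parameters instead.
WEIGHTS: Dict[str, Tuple[float, float]] = {
    "": (0.0, 0.0),    # skip this block entirely
    "S": (1.0, 0.0),
    "T": (0.0, 1.0),
    "ST": (0.8, 0.6),  # spatial module, then temporal module
    "TS": (0.6, 0.8),  # temporal module, then spatial module
}
LAMBDA = 0.1  # assumed cost penalty per module call


def decide(x: Clip) -> str:
    """Return the arrangement with the highest cost-penalised score (ties: first)."""
    s, t = features(x)

    def score(arr: str) -> float:
        ws, wt = WEIGHTS[arr]
        return ws * s + wt * t - LAMBDA * len(arr)

    return max(WEIGHTS, key=score)


def apply_arrangement(arr: str, x: Clip) -> Clip:
    """Run the modules named in `arr` in order; "" returns x unchanged."""
    for name in arr:
        x = MODULES[name](x)
    return x


def run_global(x: Clip, num_blocks: int) -> Tuple[Clip, List[str]]:
    """Global scheme: decide once from the input and reuse it in every block."""
    plan = [decide(x)] * num_blocks
    for arr in plan:
        x = apply_arrangement(arr, x)
    return x, plan


def run_local(x: Clip, num_blocks: int) -> Tuple[Clip, List[str]]:
    """Local scheme: each block decides from the features it actually receives."""
    plan: List[str] = []
    for _ in range(num_blocks):
        arr = decide(x)
        plan.append(arr)
        x = apply_arrangement(arr, x)
    return x, plan


if __name__ == "__main__":
    clip = [[0.0, 2.0, 0.0], [1.0, 3.0, 1.0], [0.0, 2.0, 0.0]]
    for run in (run_global, run_local):
        _, plan = run(clip, 3)
        print(run.__name__, plan, sum(len(a) for a in plan))
    # run_global ['ST', 'ST', 'ST'] 6
    # run_local ['ST', 'S', 'T'] 4
```

Under these made-up weights the local scheme changes its choice block by block as the representation is smoothed, and calls four modules instead of six. This only illustrates the mechanism of per-sample module selection; it says nothing about the accuracy or cost of the actual framework.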
| Field | Value |
|---|---|
| Original language | English |
| Article number | 103233 |
| Journal | Displays |
| Volume | 91 |
| DOI | |
| Publication status | Published - 01-2026 |
All Science Journal Classification (ASJC) codes
- Human-Computer Interaction
- Hardware and Architecture
- Electrical and Electronic Engineering