Abstract
Human action recognition has become a key component of human–computer interaction. Existing human action recognition methods based on spatial–temporal networks achieve strong performance, but at a high computational cost. These methods make their final predictions using a stack of blocks, where each block contains a spatial module and a temporal module for extracting the respective features. However, the optimal arrangement of these blocks in the network may differ from sample to sample. Moreover, these methods require a long inference time, which makes their implementation on cutting-edge low-spec devices challenging. To resolve these limitations, we propose a decision network-based adaptive framework that dynamically determines the arrangement of the spatial and temporal modules to ensure a cost-effective network design. To determine the optimal network structure, we investigated module selection decision-making schemes at both the local and global levels. We conducted extensive experiments on three publicly available datasets. The results show that our proposed framework arranges the modules optimally and reduces the computational cost while maintaining performance. Our code is available at https://github.com/11yxk/dynamic_skeleton.
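As a rough illustration of per-sample module arrangement, the following minimal sketch contrasts a local scheme (each block decides from its own input) with a global scheme (all decisions made up front). It is not the authors' implementation: the toy modules, the linear decision rule, the option set `OPTIONS`, and every name in it are assumptions made for illustration only.

```python
import math
import random

# Hypothetical stand-ins for the spatial and temporal modules. Real ones would
# operate on skeleton sequences; these only transform a list of floats.
def spatial_module(x):
    return [math.tanh(v + 0.1) for v in x]

def temporal_module(x):
    # Averages each element with its predecessor (wraps around at index 0).
    return [math.tanh(0.5 * (x[i] + x[i - 1])) for i in range(len(x))]

MODULES = {"spatial": spatial_module, "temporal": temporal_module}

# Assumed set of per-block arrangements, including skipping the block entirely.
OPTIONS = [("spatial", "temporal"), ("temporal", "spatial"),
           ("spatial",), ("temporal",), ()]

def decide(features, weights):
    """Toy decision network: one linear score per option, then argmax."""
    mean = sum(features) / len(features)
    scores = [w * mean + b for w, b in weights]
    return max(range(len(OPTIONS)), key=scores.__getitem__)

def run_block(x, option):
    """Apply the modules of the chosen arrangement in order."""
    for name in OPTIONS[option]:
        x = MODULES[name](x)
    return x

def forward_local(x, block_weights):
    """Local scheme: each block picks its arrangement from its own input."""
    choices = []
    for weights in block_weights:
        option = decide(x, weights)
        choices.append(option)
        x = run_block(x, option)
    return x, choices

def forward_global(x, block_weights):
    """Global scheme: all arrangements are chosen up front from the raw input."""
    choices = [decide(x, weights) for weights in block_weights]
    for option in choices:
        x = run_block(x, option)
    return x, choices

rng = random.Random(0)
block_weights = [[(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in OPTIONS]
                 for _ in range(4)]
sample = [rng.uniform(-1, 1) for _ in range(8)]

out_local, local_choices = forward_local(sample, block_weights)
out_global, global_choices = forward_global(sample, block_weights)

# Number of modules actually executed, a crude proxy for computation cost.
local_cost = sum(len(OPTIONS[c]) for c in local_choices)
global_cost = sum(len(OPTIONS[c]) for c in global_choices)
```

In this sketch the cost proxy is simply the count of modules executed per sample, so arrangements that drop a module or skip a block lower it; a trained decision network would have to balance that saving against recognition accuracy.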
| Original language | English |
|---|---|
| Article number | 103233 |
| Journal | Displays |
| Volume | 91 |
| DOIs | |
| Publication status | Published - 01-2026 |
All Science Journal Classification (ASJC) codes
- Human-Computer Interaction
- Hardware and Architecture
- Electrical and Electronic Engineering