A motion-aware and temporal-enhanced Spatial–Temporal Graph Convolutional Network for skeleton-based human action segmentation

Shurong Chai, Rahul Kumar Jain, Jiaqing Liu, Shiyu Teng, Tomoko Tateyama, Yinhao Li, Yen Wei Chen

Research output: Contribution to journal / Article / Peer-review

Abstract

Action segmentation is an important approach to understanding the actions in a video. Most conventional action recognition methods can recognize only a single action in a given input video, so the input must be a pre-trimmed clip containing only one type of action. In contrast, temporal action segmentation (TAS) aims to divide a temporally untrimmed video sequence into segments along the time axis, assigning an action label to each frame. Consequently, it has broader application prospects across various fields. Previously proposed TAS methods use only RGB video as input, but RGB video is not robust to diverse backgrounds. Skeleton-based features are more resilient because they contain no background information, yet there has been limited research exploring this feature modality. To this end, we propose a motion-aware and temporal-enhanced spatial–temporal graph convolutional network for skeleton-based human action segmentation. Our framework consists of a motion-aware module, a multi-scale temporal convolutional network, a temporal-enhanced graph convolutional network module, and a refinement module. Our method efficiently captures motion information and long-range dependencies from skeleton features while improving temporal modeling. We conducted experiments on four publicly available datasets to demonstrate the effectiveness of the proposed method. The code is available at https://github.com/11yxk/openpack.
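
The abstract does not describe the internals of each module. As a rough, non-authoritative illustration of the kinds of operations a skeleton-based segmentation pipeline works with, the Python sketch below shows three generic building blocks. These are assumptions for illustration and are not taken from the authors' repository. The first computes frame-to-frame joint displacements as a simple motion feature. The second performs one propagation step of a graph convolution over the skeleton graph, using a symmetrically normalized adjacency with self-loops and omitting the learnable weight matrix. The third collapses per-frame action labels into (label, start, end) segments, which is the output form TAS produces. All function names, the toy joint layout, and the example labels are hypothetical.

```python
import math
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical data layout: a frame is a list of joints, each joint a coordinate tuple.
Joint = Tuple[float, ...]
Frame = List[Joint]


def motion_features(seq: Sequence[Frame]) -> List[Frame]:
    """Frame-to-frame joint displacement (a generic motion cue, not the paper's module).

    The first frame gets all-zero displacements so the output length matches the input.
    """
    if not seq:
        return []
    zero = [tuple(0.0 for _ in joint) for joint in seq[0]]
    out = [zero]
    for prev, cur in zip(seq, seq[1:]):
        out.append([tuple(c - p for c, p in zip(cj, pj)) for cj, pj in zip(cur, prev)])
    return out


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Build D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_aggregate(frame: Frame, a_hat: List[List[float]]) -> Frame:
    """One propagation step: each joint becomes a normalized sum of its neighbours."""
    n, dims = len(frame), len(frame[0])
    return [tuple(sum(a_hat[i][j] * frame[j][d] for j in range(n)) for d in range(dims))
            for i in range(n)]


def labels_to_segments(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end_exclusive) segments."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    seq = [[(0.0, 0.0), (1.0, 1.0)], [(0.5, 0.0), (1.0, 2.0)], [(0.5, 1.0), (2.0, 2.0)]]
    print(motion_features(seq))
    print(graph_aggregate(seq[0], normalized_adjacency(2, [(0, 1)])))
    print(labels_to_segments(["walk", "walk", "pick", "pick", "pick", "walk"]))
```

For the toy labels above, `labels_to_segments` returns `[("walk", 0, 2), ("pick", 2, 5), ("walk", 5, 6)]`. In a real spatial–temporal GCN, the aggregation step would be followed by a learnable linear transform and combined with temporal convolutions over the frame axis; those parts are left out here to keep the sketch small.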

Original language: English
Article number: 127482
Journal: Neurocomputing
Volume: 580
Publication status: Published - 01-05-2024

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
