SkeletonNet: Mining Deep Part Features for 3-D Action Recognition
Date
2017
Abstract
This letter presents SkeletonNet, a deep learning framework for skeleton-based 3-D action recognition. Given a skeleton sequence, two factors are important for action recognition: the spatial structure of the skeleton joints in each frame and the temporal information across frames. We first extract body-part-based features from each frame of the skeleton sequence. Unlike the original coordinates of the skeleton joints, the proposed features are invariant to translation, rotation, and scale. To learn robust temporal information, instead of treating the features of all frames as a time series, we transform the features into images and feed them to the proposed deep learning network, which contains two parts: one extracts general features from the input images, and the other generates a discriminative and compact representation for action recognition. The proposed method is tested on the SBU Kinect Interaction dataset, the CMU dataset, and the large-scale NTU RGB+D dataset, and achieves state-of-the-art performance.
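The abstract does not spell out the feature definition, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a body part is a list of joint indices, uses pairwise cosine similarity between joint-to-joint vectors as one possible feature that is invariant to translation, rotation, and uniform scale, and stacks the per-frame features over time into an image-like array. The names `frame_features` and `sequence_to_image` are hypothetical, and the network itself is not shown.

```python
import numpy as np


def frame_features(joints, part):
    """Illustrative invariant features for one body part in one frame.

    joints: (J, 3) array of 3-D joint coordinates.
    part:   list of joint indices (an assumed grouping, not from the paper).
    Returns the cosine similarity for every pair of vectors that start at
    the first joint of the part.
    """
    origin = joints[part[0]]
    vecs = joints[part[1:]] - origin              # subtracting cancels translation
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.maximum(norms, 1e-8)         # normalising cancels uniform scale
    cos = unit @ unit.T                           # dot products are unchanged by rotation
    rows, cols = np.triu_indices(len(vecs), k=1)  # keep each unordered pair once
    return cos[rows, cols]


def sequence_to_image(seq, part):
    """Stack per-frame features into an (F, T) uint8 array.

    seq: (T, J, 3) skeleton sequence. Rows index features, columns index frames.
    """
    feats = np.stack([frame_features(frame, part) for frame in seq], axis=1)
    feats = np.clip(feats, -1.0, 1.0)             # guard against rounding error
    return np.round((feats + 1.0) * 127.5).astype(np.uint8)
```

With a part of four joints there are three vectors and therefore three pairwise features per frame, so a five-frame sequence yields a 3 x 5 array. An array of this kind can be passed to an image-based network in place of a raw time series.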
Related items
Showing items related by title, author, creator and subject.
- Ke, Q.; Bennamoun, M.; An, Senjian; Sohel, F.; Boussaid, F. (2017) © 2017 IEEE. This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips ...
- Ke, Q.; Bennamoun, M.; An, Senjian; Sohel, F.; Boussaid, F. (2018) This paper presents a new representation of skeleton sequences for 3D action recognition. Existing methods based on hand-crafted features or recurrent neural networks cannot adequately capture the complex spatial structures ...
- Zhang, Li (2009) This research aims to address one of the most challenging problems in the field of computer vision and computer graphics, that is, the reconstruction of smooth 3D human motions from monocular video containing unrestricted ...