Signal and Information Processing

Human Action Sequence Prediction of 3D Point Cloud Representation

  • School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China

Received date: 2022-06-30

Published online: 2023-06-16

Abstract

Few studies have addressed 3D human action prediction, and most of them represent the human model as a triangular mesh, which is less simple and harder to obtain than a 3D point cloud. This paper therefore proposes a MeteorNet-based method for predicting point cloud action sequences, in which the human model is represented by 3D point clouds. The point clouds at different times in an action sequence are first fused, and the spatiotemporal neighborhoods of the points are found and grouped. Three stacked Meteor modules then aggregate information over these neighborhoods to obtain the spatiotemporal features of the point cloud sequence. Finally, a three-layer fully connected network predicts the point cloud coordinates of the action. Experimental results show that the human actions predicted by the proposed method have lower errors relative to the real actions.
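The fusion and grouping step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `radius` and `max_dt` parameters, and the brute-force pairwise search are all assumptions made for illustration, and the max-pooling of relative offsets only stands in for a learned Meteor module. The fully connected prediction head is omitted.

```python
import numpy as np

def spatiotemporal_neighbors(frames, radius, max_dt):
    """Fuse frames into one (x, y, z, t) cloud and mark, for every point,
    which points lie within `radius` in space and `max_dt` frames in time.
    Both thresholds are hypothetical parameters of this sketch."""
    # Tag each point with its frame index t, then stack all frames.
    fused = np.concatenate(
        [np.hstack([f, np.full((len(f), 1), t, dtype=f.dtype)])
         for t, f in enumerate(frames)]
    )
    xyz, t = fused[:, :3], fused[:, 3]
    # Brute-force pairwise spatial distances and temporal gaps (O(N^2)).
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    gap = np.abs(t[:, None] - t[None, :])
    mask = (dist <= radius) & (gap <= max_dt)  # mask[i, j]: j is a neighbor of i
    return fused, mask

def aggregate(fused, mask):
    """Max-pool neighbor-minus-center offsets over each neighborhood.
    A fixed stand-in for a learned Meteor module, for illustration only."""
    offsets = fused[None, :, :] - fused[:, None, :]          # (N, N, 4)
    offsets = np.where(mask[:, :, None], offsets, -np.inf)   # ignore non-neighbors
    return offsets.max(axis=1)                               # (N, 4)

# Two frames: two points at t = 0 and one point at t = 1.
frames = [np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
          np.array([[0.0, 0.0, 0.1]])]
fused, mask = spatiotemporal_neighbors(frames, radius=0.5, max_dt=1)
features = aggregate(fused, mask)
```

Every point is its own neighbor, so each neighborhood is non-empty and the pooled features are always finite. A practical implementation would replace the pairwise search with a spatial index and the pooling with learned layers.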

Cite this article

WANG Hui, DING Boxu. Human Action Sequence Prediction of 3D Point Cloud Representation[J]. Journal of Applied Sciences, 2023, 41(3): 461-475. DOI: 10.3969/j.issn.0255-8297.2023.03.008
