Human Action Sequence Prediction of 3D Point Cloud Representation

doi:10.3969/j.issn.0255-8297.2023.03.008

Abstract

Few works address the prediction of 3D human action sequences, and most of them represent the human body with triangular meshes, which are neither as simple nor as easy to acquire as 3D point clouds. This paper therefore represents the human body with 3D point clouds and proposes a point cloud action sequence prediction method based on MeteorNet. The 3D point clouds at different times in an action sequence are fused together, and the spatiotemporal neighborhood of each point is found for grouping. Three stacked Meteor modules aggregate information within the spatiotemporal neighborhoods to obtain spatiotemporal features of the point cloud sequence. A three-layer fully connected network then predicts the point cloud coordinates of the action. Experimental results show that the error between the human actions predicted by the proposed method and the real actions is small.

Key words: 3D human body, point cloud sequence, action prediction, MeteorNet
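
The fusion and grouping step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed spatial radius shared by all frames and max-pools raw (dx, dy, dz, dt) offsets in place of the learned per-point network of a Meteor module. The function names `fuse_frames`, `spatiotemporal_neighbors`, and `aggregate` are hypothetical and are not taken from the paper or from the MeteorNet code.

```python
import numpy as np

def fuse_frames(frames):
    """Stack T frames of shape (N, 3) into one array of (x, y, z, t) rows."""
    fused = []
    for t, pts in enumerate(frames):
        t_col = np.full((pts.shape[0], 1), float(t))  # frame index as time
        fused.append(np.hstack([pts, t_col]))
    return np.vstack(fused)

def spatiotemporal_neighbors(fused, query, radius):
    """Indices of fused points (from any frame) within `radius` of `query` in space."""
    dist = np.linalg.norm(fused[:, :3] - query[:3], axis=1)
    return np.nonzero(dist <= radius)[0]

def aggregate(fused, query, radius):
    """Max-pool the (dx, dy, dz, dt) offsets over the neighborhood.

    Assumption: a learned network would normally transform each offset
    before pooling; it is omitted here to keep the sketch short.
    """
    idx = spatiotemporal_neighbors(fused, query, radius)
    offsets = fused[idx] - query
    return offsets.max(axis=0)

# Two toy frames with two points each (illustrative data only).
frames = [
    np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    np.array([[0.1, 0.0, 0.0], [5.0, 0.0, 0.0]]),
]
fused = fuse_frames(frames)
neighbors = spatiotemporal_neighbors(fused, fused[0], radius=0.5)
feature = aggregate(fused, fused[0], radius=0.5)
```

Here the query point at the origin in frame 0 is grouped with itself and with the nearby point from frame 1, so the pooled feature carries both a spatial and a temporal offset. Stacking such grouping and pooling layers, followed by fully connected layers, is the general shape of the pipeline the abstract outlines.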

CLC number: TP391.41

WANG Hui, DING Boxu. Human Action Sequence Prediction of 3D Point Cloud Representation[J]. Journal of Applied Sciences, 2023, 41(3): 461-475. (in Chinese)
