Motion Features for Human Action Recognition Using 3D Skeleton Model

Authors

  • Salah R. Althloothi Computer Engineering Department, Faculty of Engineering, University of Al Zawiya
  • Almokhtar Alazhari Computer Engineering Department, Faculty of Engineering, University of Al Zawiya

Keywords

Features, Recognition, Tracking, Action description, Skeleton

Abstract

This paper presents motion features for accurately extracting the distal segments of human limbs from visual data for human action recognition. Using the depth map provided by the Kinect sensor, motion features are extracted to classify human actions in videos. The motion features are derived from the motion of the 3D joint positions of the human body. These 3D joint positions provide precise endpoints for the distal segment of each limb, which are reduced to centroids for efficient recognition. Each limb centroid is described by its angle with respect to the vertical body axis to form an action descriptor vector. The action descriptor, which represents the position of the torso and the four limb segments, is detected and tracked without any manual initialization. It is also invariant to image resolution and video frame rate, making it suitable for a wide range of real-time surveillance and human-tracking applications. To evaluate our approach, a public dataset for human action recognition was used. The experimental results show that incorporating these motion features with an SVM classifier is a promising direction for automated recognition of human actions.
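The descriptor construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the joint names, the choice of elbow/hand and knee/foot as distal-segment endpoints, and the y-axis as the vertical body axis are all assumptions made for the example.

```python
import math

def limb_angle(centroid, torso_center):
    """Angle (degrees) between the vector from the torso center to a limb
    centroid and the vertical body axis (assumed here to be the y-axis)."""
    vx = centroid[0] - torso_center[0]
    vy = centroid[1] - torso_center[1]
    vz = centroid[2] - torso_center[2]
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    if norm == 0:
        return 0.0
    # cos(theta) = v . (0, 1, 0) / |v|; clamp for floating-point safety
    cos_theta = max(-1.0, min(1.0, vy / norm))
    return math.degrees(math.acos(cos_theta))

def descriptor(joints):
    """Build an action descriptor: one angle per limb, relative to the torso.

    `joints` maps (hypothetical) joint names to (x, y, z) positions; each
    distal segment is reduced to the centroid of its two endpoint joints,
    as in the paper's centroid reduction step.
    """
    torso = joints["torso"]
    limbs = [("l_elbow", "l_hand"), ("r_elbow", "r_hand"),
             ("l_knee", "l_foot"), ("r_knee", "r_foot")]
    features = []
    for a, b in limbs:
        pa, pb = joints[a], joints[b]
        centroid = tuple((pa[i] + pb[i]) / 2.0 for i in range(3))
        features.append(limb_angle(centroid, torso))
    return features
```

The resulting per-frame angle vectors could then be fed to an off-the-shelf SVM classifier, as the abstract suggests; the segment-to-centroid reduction keeps the descriptor compact and independent of image resolution.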


Published

2023-02-19