DEEP LEARNING FOR TAEKWONDO ACTION RECOGNITION

Author(s): LIU, Z., LI, K., QIN, X., Institution: BEIJING SPORT UNIVERSITY, Country: CHINA, Abstract-ID: 398

INTRODUCTION:
Action recognition is an important task in Taekwondo teaching and training, and inertial measurement units (IMUs) offer advantages over camera-based systems for data collection. We applied several deep learning algorithms to recognize Taekwondo technical actions from data collected by three-axis accelerometers, and compared their accuracy with that of traditional machine learning methods. This study can support Taekwondo training and teaching for the general public, and the action recognition approach may be applicable to boxing and other fields.
METHODS:
First, twenty subjects wore three-axis accelerometers on five body locations: the left wrist, right wrist, lower back, left ankle, and right ankle. They performed Taekwondo technical movements comprising 8 types of hand movements (left punch, right punch, left upper block, right upper block, left middle block, right middle block, left lower block and right lower block) and 8 types of leg movements (left front kick, right front kick, left roundhouse kick, right roundhouse kick, left side kick, right side kick, left back kick and right back kick). One action was performed every 4 seconds, and the action name and any abnormal information were recorded at the same time to facilitate subsequent data preprocessing. The sampling frequency of the accelerometers was 30 Hz. Second, during data preprocessing, we removed the abnormal data, merged the time series from the different accelerometers, extracted the 4 s clip containing each action, and created a data set. Third, action recognition was performed with several deep learning algorithms, including a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), and the results were compared with those of traditional machine learning methods.
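The preprocessing steps above can be illustrated with a minimal sketch. At 30 Hz, a 4 s clip holds 4 x 30 = 120 samples, and five three-axis accelerometers give 5 x 3 = 15 channels, so each clip is a 120 x 15 array. The abstract does not describe the actual pipeline, so everything else here is an assumption for illustration: the function names merge_sensors and segment_clips are hypothetical, the sensors are assumed to be already time-synchronised, the actions are assumed to sit back-to-back in the recording, and abnormal actions are assumed to be flagged by clip index.

```python
import numpy as np

FS_HZ = 30          # sampling frequency stated in the abstract
CLIP_SECONDS = 4    # one action every 4 seconds
N_SENSORS = 5       # left/right wrist, lower back, left/right ankle
N_AXES = 3          # three-axis accelerometer


def merge_sensors(per_sensor):
    """Stack per-sensor arrays of shape (T, 3) into one (T, 15) array.

    Assumption: the sensors are time-synchronised and equally long.
    """
    return np.concatenate(per_sensor, axis=1)


def segment_clips(merged, abnormal_clips=()):
    """Cut a merged (T, C) recording into back-to-back 4 s clips.

    Returns an array of shape (n_kept, 120, C) and the kept clip indices.
    Assumption: clip i covers samples [i * 120, (i + 1) * 120).
    """
    clip_len = FS_HZ * CLIP_SECONDS
    n_clips = merged.shape[0] // clip_len
    # Drop any trailing partial clip, then view the rest as whole clips.
    clips = merged[: n_clips * clip_len].reshape(n_clips, clip_len, -1)
    abnormal = set(abnormal_clips)
    keep = [i for i in range(n_clips) if i not in abnormal]
    return clips[keep], keep


# Synthetic stand-in for 20 s of recording (5 actions); not real data.
rng = np.random.default_rng(0)
sensors = [rng.normal(size=(600, N_AXES)) for _ in range(N_SENSORS)]
merged = merge_sensors(sensors)
clips, kept = segment_clips(merged, abnormal_clips=[2])
print(clips.shape, kept)
```

With the synthetic input, the third action (index 2) is treated as abnormal and removed, leaving four clips of 120 samples by 15 channels.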
RESULTS:
The one-dimensional residual CNN reached an accuracy of 90.3%; after we removed the Batch Norm layer and added a Dropout layer, the best accuracy on the validation set reached 95.5%. By comparison, the GRU reached 86.4% and the LSTM 90.3%, and both showed varying degrees of overfitting. The traditional machine learning methods performed worse than the deep learning methods: the support vector machine reached 82% and the decision tree 77%.
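To make the architectural change concrete, the following is a minimal NumPy sketch of the forward pass of a one-dimensional residual block that uses Dropout and no Batch Norm. It is not the authors' network: the abstract gives no layer counts, kernel sizes, or dropout rate, so the kernel size of 3, the rate of 0.5, the placement of Dropout between the two convolutions, and the function names conv1d_same, dropout, and residual_block are all assumptions chosen for illustration.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D cross-correlation with zero padding so the length is preserved.

    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,)
    """
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.empty((c_out, t))
    for i in range(t):
        # Contract over input channels and kernel taps for time step i.
        out[:, i] = np.tensordot(w, xp[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return out


def dropout(x, rate, rng, training):
    """Inverted dropout: zero values with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def residual_block(x, w1, b1, w2, b2, rate=0.5, rng=None, training=False):
    """conv -> ReLU -> dropout -> conv, added to the input, then ReLU.

    No normalisation layer is used (assumed layout, see lead-in).
    """
    if rng is None:
        rng = np.random.default_rng()
    h = np.maximum(conv1d_same(x, w1, b1), 0.0)
    h = dropout(h, rate, rng, training)
    h = conv1d_same(h, w2, b2)
    return np.maximum(x + h, 0.0)  # skip connection


# One synthetic clip: 15 channels by 120 time steps (not real data).
rng = np.random.default_rng(0)
x = rng.normal(size=(15, 120))
w1 = rng.normal(scale=0.1, size=(15, 15, 3))
w2 = rng.normal(scale=0.1, size=(15, 15, 3))
b = np.zeros(15)
y = residual_block(x, w1, b, w2, b, rate=0.5, rng=rng, training=True)
print(y.shape)
```

The skip connection keeps the input and output shapes identical, so blocks of this kind can be stacked. Dropout is active only when training is true and is a no-op at evaluation time.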
CONCLUSION:
Deep learning is a better choice for action recognition: it requires no manual feature extraction, and its accuracy is usually higher than that of traditional machine learning methods. In our experiment with a small data set, a smaller batch size combined with Dropout reached higher accuracy and effectively reduced overfitting. On the other hand, feature extraction for traditional machine learning remains a challenging problem.