Hand Gesture Recognition in Human-Computer Interaction
Hand gesture recognition is crucial for seamless human-computer interaction in fields such as virtual reality, gaming, and robotics. This project explores machine learning and deep learning techniques for accurate recognition of hand gestures, leveraging a dataset of near-infrared images of distinct hand gestures. Previous research using CNNs, depth sensors, and traditional SVM methods is reviewed, highlighting the ability of machine learning models to adapt to variations in the data for improved results.
Hand Gesture Recognition Rui Zhong, Zhongyu Huo, Vedavyas Potnuru, Wendy Yunqi Yu
Background Hand gesture recognition has been of high interest in the study of human-computer interaction [2]. Quick and accurate hand gesture recognition is essential in many fields of human-computer interaction, such as virtual reality, gaming, car system control, and robotic control.
Background Other systems for recognizing human gestures and behaviors are used to prevent criminal activity by classifying hostile intent from non-verbal communicative cues [3]. The wide variety of uses for hand gesture and behavior recognition motivates our study of this topic.
Literature Review Prior research on this topic includes using a CNN with a sliding window applied to hand gestures captured in real time from video streams [4]. Depth sensors have also been used for hand tracking and gesture recognition, with Microsoft Kinect being a popular tool [5].
Literature Review Other research uses traditional SVMs to perform hand gesture classification [6]. Glove-based input has also been a popular method for hand-tracking and gesture recognition [7].
ML/DL solution Machine learning is well suited for gesture recognition since it can adapt to variations in the data and learn from them to produce reliable results. Hand gesture recognition using machine learning yields much better results than conventional methods because the model learns discriminative features directly from the dataset.
Dataset This project uses a Kaggle dataset containing 20,000 near-infrared images of ten distinct hand gestures [1]. Each image is 240 x 640 pixels. Since the images have dark backgrounds, they can be converted to grayscale without losing information. Other preprocessing steps include flipping and downsampling the images for better training and classification results.
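The preprocessing described above can be sketched as follows. The 4x downsampling factor (240x640 to 60x160, the input size used later in the model) is taken from the slides; the function names and the choice of plain NumPy striding for downsampling are illustrative assumptions, not code from the project.

```python
import numpy as np

def preprocess(img):
    """img: a (240, 640) uint8 grayscale near-infrared frame.
    Downsample by 4 in each axis and scale to [0, 1]."""
    small = img[::4, ::4].astype(np.float32) / 255.0  # 240x640 -> 60x160
    return small[..., np.newaxis]                     # add channel axis for the CNN

def augment_flip(img):
    """Horizontal flip, a simple augmentation mentioned on the slides."""
    return np.fliplr(img)
```

Because the backgrounds are uniformly dark, dropping color channels and resolution this aggressively keeps the gesture silhouette while shrinking each input by a factor of 16.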
Details on the Model used Deep learning has seen phenomenal growth in computer vision and machine learning applications such as image classification, segmentation, object recognition, and image super-resolution, so we decided to apply deep learning to our hand gesture recognition task. We construct a CNN with 3x3 strided convolutions, ReLU activations, batch normalization, dropout, and fully-connected layers with a softmax output, using Keras v2.3.1 with a TensorFlow backend.
Details on the Model used Input image (60x160) → Conv 3x3, stride 2 → 29x79x32 → Conv 3x3, stride 2 → 14x39x64 → Conv 3x3, stride 2 → 6x19x64 → flatten (7296) → dropout 0.7 → FC (512) → FC (10). Learning rate: 0.001 | Epochs: 20 | Batch size: 64 | Loss function: cross-entropy | Optimizer: Adam | Dropout rate: 0.7 | Train/val split: 80:20
Details on the Model used Our model, as mentioned, starts with a series of convolutional layers in a pyramid architecture: each stride-2 convolution halves the spatial resolution of its input. Following the pyramid, we add 2 fully-connected layers with 512 and 10 units, respectively. A diagram of the described model is shown on the preceding slide. We go into more detail about each of the layers in our network below. Convolutions: the three convolutional layers use 32, 64, and 64 channels. Batch normalization: batch normalization is added after each convolutional layer to help prevent the model from overfitting and improve generalization. Flatten: reshapes the feature maps into a single vector of values so that the fully-connected layers can be applied. Dropout: a dropout layer (rate = 0.7) is applied before the first fully-connected layer, so during training 70% of the activations are dropped to reduce co-adaptation between layers and hence overfitting. Fully-connected: as mentioned previously, we include 2 fully-connected layers with 512 and 10 output units, respectively. Softmax: softmax is used as the activation for the last fully-connected layer since it is the standard activation for classification with more than two classes.
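The architecture and hyperparameters above can be sketched in Keras as follows. The slides used Keras v2.3.1 with a TensorFlow backend; this sketch uses tf.keras, and details such as valid padding and placing batch normalization after each ReLU are inferred from the quoted layer shapes (29x79x32, 14x39x64, 6x19x64, flatten to 7296) rather than confirmed by the source.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(60, 160, 1), num_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Pyramid of stride-2 convolutions; valid padding reproduces
        # the quoted shapes: 60x160 -> 29x79 -> 14x39 -> 6x19.
        layers.Conv2D(32, 3, strides=2, activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, strides=2, activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, strides=2, activation="relu"),
        layers.BatchNormalization(),
        layers.Flatten(),      # 6 * 19 * 64 = 7296 features
        layers.Dropout(0.7),   # drop 70% of activations during training
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training with the slide's settings would then be `model.fit(x_train, y_train, epochs=20, batch_size=64, validation_split=0.2)`, matching the 80:20 train/val split.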
Further actions to complete Next steps: test against real images captured from a video recording, reversing the white and black backgrounds to match the training data. Goal: analyze how to prevent overfitting on the LeapMotion dataset and generalize to other types of image datasets.
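Reversing the white and black backgrounds mentioned above could be as simple as a per-pixel intensity inversion; this helper is an illustrative assumption for adapting bright-background camera captures to the dark-background training distribution, not code from the project.

```python
import numpy as np

def invert_background(img):
    """Invert a uint8 grayscale image so a white background
    becomes dark, matching the near-infrared training images."""
    return 255 - img
```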
References
[1] Benen Harrington. Hand Gesture Recognition Database with CNN, 2018. https://www.kaggle.com/benenharrington/hand-gesture-recognition-database-with-cnn
[2] Hand Gesture Recognition with Depth Images: A Review.
[3] C. J. Cohen, F. Morelli and K. A. Scott, "A Surveillance System for the Recognition of Intent within Individuals and Crowds," 2008 IEEE Conference on Technologies for Homeland Security, Waltham, MA, 2008, pp. 559-565, doi: 10.1109/THS.2008.4534514.
[4] Okan Köpüklü, Ahmet Gunduz, Neslihan Kose, and Gerhard Rigoll. Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks. CoRR, abs/1901.10323, 2019.
[5] Hand Gesture Recognition with Depth Images: A Review.
[6] Static Hand Gesture Recognition Using Mixture of Features and SVM Classifier.
[7] A Survey of Glove-based Input.