Jetson 22 Gesture Recognition Based on MediaPipe

This section introduces how to implement gesture recognition using MediaPipe + OpenCV.

What is MediaPipe?

MediaPipe is an open-source framework developed by Google for building machine learning-based multimedia processing applications. It provides a set of tools and libraries for processing video, audio, and image data, and applies machine learning models to achieve various functionalities such as pose estimation, gesture recognition, and face detection. MediaPipe is designed to offer efficient, flexible, and easy-to-use solutions, enabling developers to quickly build a variety of multimedia processing applications.

Preparation

The robot's main program starts automatically at boot and occupies the camera, so this tutorial cannot run while it is active. Before running this tutorial, either terminate the main program, or disable its automatic startup and then restart the robot.
Note that the robot's main program uses multiple threads and is launched at startup through crontab, so the usual sudo killall python command often does not stop it. The method for disabling the automatic startup of the main program is therefore described here.
If you have already disabled the automatic startup of the robot's main demo, you can skip the Terminate the Main Demo section.

Terminate the Main Demo

1. Click the + icon next to the tab for this page to open a new tab called "Launcher."

2. Click on Terminal under Other to open a terminal window.

3. Type bash into the terminal window and press Enter.

4. Now you can use the Bash Shell to control the robot.

5. Enter the command: sudo killall -9 python.

Example

The following code block can be run directly:

1. Select the code block below.

2. Press Shift + Enter to run the code block.

3. Watch the real-time video window.

4. Press STOP to close the real-time video and release the camera resources.
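The STOP button works by having the video thread check a flag once per frame, leave its loop, and then release the camera. The following is a minimal standard-library sketch of that pattern. It uses threading.Event in place of the ipywidgets ToggleButton, and FakeCamera is a hypothetical stand-in for cv2.VideoCapture so the sketch runs without hardware.

```python
import threading
import time


class FakeCamera:
    """Hypothetical stand-in for cv2.VideoCapture, used only for illustration."""

    def __init__(self):
        self.frames_read = 0
        self.released = False

    def read(self):
        self.frames_read += 1
        return True, f"frame-{self.frames_read}"

    def release(self):
        self.released = True


def capture_loop(camera, stop_event):
    """Read frames until stop_event is set, then always release the camera."""
    try:
        while not stop_event.is_set():
            ok, _frame = camera.read()
            if not ok:
                break  # Camera failed or was disconnected
            time.sleep(0.01)  # Stand-in for detection and display work
    finally:
        camera.release()  # Runs on normal exit and on exceptions


camera = FakeCamera()
stop_event = threading.Event()
worker = threading.Thread(target=capture_loop, args=(camera, stop_event))
worker.start()

time.sleep(0.1)   # Let the loop run briefly
stop_event.set()  # Equivalent to pressing STOP
worker.join()     # Wait for the loop to exit

print(camera.released)  # True
```

Releasing the camera in a finally clause (or right after the loop) means the device is freed even if the loop exits early, which is what allows the code block to be run again without a camera-busy error.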

If you cannot see the real-time camera feed when running the code block, try the following:

Click Kernel -> Shut Down All Kernels in the menu bar above.
Close the current section tab and open it again.
Click STOP to release the camera resources, then run the code block again.
Reboot the device.

Note

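The code below uses a USB camera by default. OpenCV returns USB camera frames in BGR order, while MediaPipe expects RGB, so keep the cv2.cvtColor conversion that runs on each frame before it is passed to MediaPipe.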
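To show what the color conversion actually does, here is a small NumPy sketch: converting between BGR and RGB for a 3-channel image is just a reversal of the channel axis. The two-pixel array is made-up illustration data; the example itself keeps using cv2.cvtColor.

```python
import numpy as np

# A tiny 1x2 "image" in OpenCV's BGR order: one pure-blue and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis swaps B and R, which is what
# cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) does for a 3-channel image
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]  (blue pixel, now in RGB order)
print(rgb[0, 1].tolist())  # [255, 0, 0]  (red pixel, now in RGB order)
```

Because the swap is symmetric, cv2.COLOR_RGB2BGR and cv2.COLOR_BGR2RGB give the same result on a 3-channel frame; the BGR2RGB name simply states the intent more clearly.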

Features of this Section

When the code block is running normally, put your hand in front of the camera. The joints of your hand are marked on the real-time video and follow your hand as it moves. The position of each joint is also available to your code, which makes it convenient to build gesture control on top of this example.
MediaPipe identifies each hand joint by a name and an index number (0 to 20, listed below), and you can use either one to look up that joint's position.
MediaPipe Hand landmarks:

0: WRIST
1: THUMB_CMC
2: THUMB_MCP
3: THUMB_IP
4: THUMB_TIP
5: INDEX_FINGER_MCP
6: INDEX_FINGER_PIP
7: INDEX_FINGER_DIP
8: INDEX_FINGER_TIP
9: MIDDLE_FINGER_MCP
10: MIDDLE_FINGER_PIP
11: MIDDLE_FINGER_DIP
12: MIDDLE_FINGER_TIP
13: RING_FINGER_MCP
14: RING_FINGER_PIP
15: RING_FINGER_DIP
16: RING_FINGER_TIP
17: PINKY_MCP
18: PINKY_PIP
19: PINKY_DIP
20: PINKY_TIP
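MediaPipe reports each landmark's x and y normalized to the image width and height (values in the range 0 to 1), so a pixel position is obtained by multiplying by the frame size. The pure-Python sketch below mirrors the landmark list to show name-to-index lookup and that conversion. HAND_LANDMARKS, INDEX_OF, and to_pixels are hypothetical helpers for illustration; in real code, mp.solutions.hands.HandLandmark already provides these names as an enum.

```python
# Landmark names in index order (0 to 20), mirroring the list of hand landmarks.
# Hypothetical helper: in MediaPipe itself, use mp.solutions.hands.HandLandmark.
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]
INDEX_OF = {name: i for i, name in enumerate(HAND_LANDMARKS)}


def to_pixels(x, y, width, height):
    """Convert normalized [0, 1] landmark coordinates to integer pixel coordinates."""
    return int(x * width), int(y * height)


print(len(HAND_LANDMARKS))             # 21
print(INDEX_OF["INDEX_FINGER_TIP"])    # 8
print(to_pixels(0.5, 0.25, 640, 480))  # (320, 120)
```

In the example code, the same lookup is written as handLms.landmark[mpHands.HandLandmark.INDEX_FINGER_TIP], and the pixel conversion matches the int(lm.x * w), int(lm.y * h) line used to draw each keypoint.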

import cv2
import threading  # For running the video loop in a separate thread
from IPython.display import display, Image  # For displaying images in Jupyter Notebook
import ipywidgets as widgets  # For creating interactive widgets such as buttons
import mediapipe as mp  # MediaPipe library, used to detect hand keypoints

# picamera2 (the Raspberry Pi camera library) is only needed for the CSI camera path,
# so import it only if it is installed
try:
    from picamera2 import Picamera2
except ImportError:
    Picamera2 = None


# Create a "STOP" button that users can click to stop the video stream  
# ================
stopButton = widgets.ToggleButton(
    value=False,
    description='Stop',
    disabled=False,
    button_style='danger', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Description',
    icon='square' # (FontAwesome names without the `fa-` prefix)
)

# Initialize the MediaPipe drawing utilities and the hand keypoint detection model
mpDraw = mp.solutions.drawing_utils

mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1)  # Hand keypoint detection model; detects at most one hand

# Display function: reads video frames, detects hand keypoints, and shows the annotated result
def view(button):
    # This example uses a USB camera through OpenCV by default.
    # To use a CSI camera instead, uncomment the picam2 lines and comment out the camera lines.
    # OpenCV 4.9.0.80 no longer supports the CSI camera, so picamera2 is used to read it.

    # picam2 = Picamera2()  # Create a Picamera2 instance
    # Configure the camera: video format and frame size
    # picam2.configure(picam2.create_video_configuration(main={"format": 'XRGB8888', "size": (640, 480)}))
    # picam2.start()  # Start the camera

    camera = cv2.VideoCapture(-1)  # Open the first available camera
    # Set the resolution
    camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    display_handle = display(None, display_id=True)  # Display handle used to update the shown image

    while True:
        # ok, frame = True, picam2.capture_array()  # CSI camera
        ok, frame = camera.read()  # USB camera: capture one frame in BGR order
        if not ok:
            print("Failed to read a frame from the camera.")
            break
        # frame = cv2.flip(frame, 1)  # Uncomment if your camera mirrors the image

        img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
        results = hands.process(img)

        # If hand keypoints are detected
        if results.multi_hand_landmarks:
            h, w, _ = frame.shape
            for handLms in results.multi_hand_landmarks:  # Iterate over each detected hand
                for lm in handLms.landmark:
                    cx, cy = int(lm.x * w), int(lm.y * h)  # Convert normalized coordinates to pixels
                    cv2.circle(frame, (cx, cy), 5, (255, 0, 0), -1)  # Draw a dot at the keypoint

                # Draw the lines connecting the hand keypoints
                mpDraw.draw_landmarks(frame, handLms, mpHands.HAND_CONNECTIONS)

                # Example for gesture control: normalized (x, y) position of the index fingertip
                target_pos = handLms.landmark[mpHands.HandLandmark.INDEX_FINGER_TIP]

        _, jpeg = cv2.imencode('.jpeg', frame)  # frame is still BGR, as imencode expects
        display_handle.update(Image(data=jpeg.tobytes()))

        if button.value:  # STOP was pressed
            break

    # Release the camera and clear the video output
    # picam2.close()  # CSI camera
    camera.release()
    display_handle.update(None)

# Display the "Stop" button and start the thread that runs the display function
# ================
display(stopButton)
thread = threading.Thread(target=view, args=(stopButton,))
thread.start()
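A common way to turn joint positions into gestures (a heuristic, not part of MediaPipe) is to treat a finger as extended when its tip is higher in the image than its PIP joint; image y grows downward, so "higher" means a smaller y. The sketch below assumes an upright hand facing the camera and ignores the thumb. The gesture labels, the classify function, and the make_hand test data are all hypothetical. With the example above, the input list could be built as [(lm.x, lm.y) for lm in handLms.landmark].

```python
# (tip, pip) landmark indices for the four fingers, from the hand landmark list
FINGER_TIP_PIP = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def extended_fingers(landmarks):
    """Return names of fingers whose tip is above (smaller y than) its PIP joint.

    landmarks: 21 (x, y) pairs in normalized image coordinates, y growing downward.
    Assumes an upright hand; the thumb is ignored.
    """
    return [name for name, (tip, pip) in FINGER_TIP_PIP.items()
            if landmarks[tip][1] < landmarks[pip][1]]


def classify(landmarks):
    """Map the set of extended fingers to a hypothetical gesture label."""
    up = extended_fingers(landmarks)
    if not up:
        return "fist"
    if up == ["index"]:
        return "point"
    if up == ["index", "middle"]:
        return "victory"
    if len(up) == 4:
        return "open"
    return "unknown"


def make_hand(raised):
    """Hypothetical test data: every joint at y=0.5, raised fingertips at y=0.3."""
    points = [(0.5, 0.5)] * 21
    for name in raised:
        tip, _ = FINGER_TIP_PIP[name]
        points[tip] = (0.5, 0.3)
    return points


print(classify(make_hand([])))                                     # fist
print(classify(make_hand(["index"])))                              # point
print(classify(make_hand(["index", "middle"])))                    # victory
print(classify(make_hand(["index", "middle", "ring", "pinky"])))   # open
```

Comparing joints within the same hand keeps the rule independent of where the hand is in the frame, but it breaks down when the hand is tilted sideways or upside down; a more robust version could compare distances from the wrist instead.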
