20 Gesture Recognition Based on MediaPipe

From Waveshare Wiki

This section introduces how to implement gesture recognition using MediaPipe + OpenCV.

What is MediaPipe?

MediaPipe is an open-source framework developed by Google for building machine-learning-based multimedia processing applications. It provides a set of tools and libraries for processing video, audio, and image data, and applies machine learning models to achieve functionalities such as pose estimation, gesture recognition, and face detection. MediaPipe is designed to be efficient, flexible, and easy to use, so developers can quickly build multimedia processing applications.
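As a quick illustration of the Solutions API used later in this section (not part of the original page), the minimal sketch below shows the common workflow: create a solution object, feed it an RGB image, and read the results. A blank frame is used only to show the call pattern; the Pose and FaceDetection solutions follow the same process()/results pattern.

import numpy as np
import mediapipe as mp

# Minimal sketch of the MediaPipe Solutions workflow: create a solution object,
# feed it an RGB image, read the results. The blank frame is only a placeholder.
with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB frame
    results = hands.process(rgb)
    print(results.multi_hand_landmarks)  # None when no hand is found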

Preparation

Since the product automatically runs the main program at startup, and that program occupies the camera, this tutorial cannot be used while it is running. You need to terminate the main program or disable its automatic startup, then restart the robot.
It's worth noting that because the robot's main program uses multi-threading and is started automatically at boot through crontab, the usual sudo killall python approach typically doesn't work. Therefore, this section describes how to disable the automatic startup of the main program instead.

Terminate the Main Program

1. Click the "+" icon next to the tab for this page to open a new tab called "Launcher."
2. Click on "Terminal" under "Other" to open a terminal window.
3. Type bash into the terminal window and press Enter.
4. Now you can use the Bash Shell to control the robot.
5. Enter the command: crontab -e.
6. If prompted to choose an editor, enter 1 and press Enter to select nano.
7. After opening the crontab configuration file, you'll see the following two lines:
@reboot ~/ugv_pt_rpi/ugv-env/bin/python ~/ugv_pt_rpi/app.py >> ~/ugv.log 2>&1
@reboot /bin/bash ~/ugv_pt_rpi/start_jupyter.sh >> ~/jupyter_log.log 2>&1
8. Add a # character at the beginning of the line with ……app.py >> …… to comment out this line.
#@reboot ~/ugv_pt_rpi/ugv-env/bin/python ~/ugv_pt_rpi/app.py >> ~/ugv.log 2>&1
@reboot /bin/bash ~/ugv_pt_rpi/start_jupyter.sh >> ~/jupyter_log.log 2>&1
9. Press Ctrl + X in the terminal window to exit. It will ask you Save modified buffer? Enter Y and press Enter to save the changes.
10. Reboot the device. Note that this process will temporarily close the current Jupyter Lab session. If you didn't comment out ……start_jupyter.sh >>…… in the previous step, you can still use Jupyter Lab normally after the robot reboots (JupyterLab and the robot's main program app.py run independently). You may need to refresh the page.
11. One thing to note: because the lower-level controller keeps communicating with the host over the serial port, the constantly changing serial levels may prevent the host from booting properly during the restart. Taking a Raspberry Pi host as an example: if, after shutdown, its green LED stays solid without blinking, turn the robot's power switch off and then on again, and the robot will restart normally.
12. Enter the reboot command: sudo reboot.
13. Wait for the device to restart (during boot the Raspberry Pi's green LED blinks; when the blinking slows down or stops, startup is complete), then refresh this page and continue with the rest of this tutorial.

Example

The following code block can be run directly:

1. Select the code block below.
2. Press Shift + Enter to run the code block.
3. Watch the real-time video window.
4. Press STOP to close the real-time video and release the camera resources.

If you cannot see the real-time camera feed when running:

  • In the menu bar above, click Kernel -> Shut down all kernels.
  • Close the current section tab and open it again.
  • Click `STOP` to release the camera resources, then run the code block again.
  • Reboot the device.

Features of this Section

When the code block runs successfully, you can place your hand in front of the camera, and the real-time video frame will display annotations marking the joints of your hand. The annotations follow your hand as it moves, and the position of each joint is also output, which makes further development for gesture control straightforward.
MediaPipe's hand landmark model assigns a name and an index to each joint; you can retrieve a joint's position through its index (see the short sketch after the list below).

MediaPipe Hand landmark indices:

0. WRIST
1. THUMB_CMC
2. THUMB_MCP
3. THUMB_IP
4. THUMB_TIP
5. INDEX_FINGER_MCP
6. INDEX_FINGER_PIP
7. INDEX_FINGER_DIP
8. INDEX_FINGER_TIP
9. MIDDLE_FINGER_MCP
10. MIDDLE_FINGER_PIP
11. MIDDLE_FINGER_DIP
12. MIDDLE_FINGER_TIP
13. RING_FINGER_MCP
14. RING_FINGER_PIP
15. RING_FINGER_DIP
16. RING_FINGER_TIP
17. PINKY_MCP
18. PINKY_PIP
19. PINKY_DIP
20. PINKY_TIP
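For example, the minimal sketch below looks up one joint by its index (fingertip_pixel is a hypothetical helper, and results is the object returned by hands.process() in the code block that follows):

import mediapipe as mp

mpHands = mp.solutions.hands

# Minimal sketch: read one joint's position by its index.
# Landmark coordinates are normalized to [0, 1], so multiply by the frame size for pixels.
def fingertip_pixel(results, width=640, height=480):
    if not results.multi_hand_landmarks:
        return None
    tip = results.multi_hand_landmarks[0].landmark[mpHands.HandLandmark.INDEX_FINGER_TIP]  # index 8
    return int(tip.x * width), int(tip.y * height)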

import cv2
from picamera2 import Picamera2  # library for accessing the Raspberry Pi Camera (only needed for the commented-out Picamera2 path)
from IPython.display import display, Image  # for displaying images in Jupyter Notebook
import ipywidgets as widgets  # widgets for interactive controls such as buttons
import threading  # for running the display loop asynchronously in a new thread
import mediapipe as mp  # MediaPipe library for hand landmark detection


# Create a "Stop" button so the user can stop the video stream by clicking it
# ================
stopButton = widgets.ToggleButton(
    value=False,
    description='Stop',
    disabled=False,
    button_style='danger', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Description',
    icon='square' # (FontAwesome names without the `fa-` prefix)
)

# Initialize MediaPipe's drawing utilities and the hand landmark detection model
mpDraw = mp.solutions.drawing_utils

mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1) # detect at most one hand

# Display function: processes video frames and runs hand landmark detection
def view(button):
    # picam2 = Picamera2()  # create a Picamera2 instance
    # picam2.configure(picam2.create_video_configuration(main={"format": 'XRGB8888', "size": (640, 480)}))  # configure the camera
    # picam2.start()  # start the camera

    camera = cv2.VideoCapture(-1)
    camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    display_handle = display(None, display_id=True)  # display handle used to update the shown image

    while True:
        # frame = picam2.capture_array()
        _, frame = camera.read()
        # frame = cv2.flip(frame, 1) # if your camera reverses the image

        img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV delivers BGR; MediaPipe expects RGB

        results = hands.process(img)

        # If hand landmarks are detected
        if results.multi_hand_landmarks:
            for handLms in results.multi_hand_landmarks: # iterate over each detected hand
                # Draw a dot at each hand landmark
                for id, lm in enumerate(handLms.landmark):
                    h, w, c = frame.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)  # landmark position in image coordinates
                    cv2.circle(frame, (cx, cy), 5, (255, 0, 0), -1)

                mpDraw.draw_landmarks(frame, handLms, mpHands.HAND_CONNECTIONS) # draw the hand skeleton connections

                target_pos = handLms.landmark[mpHands.HandLandmark.INDEX_FINGER_TIP]  # index fingertip, usable for gesture control

        _, frame = cv2.imencode('.jpeg', frame)
        display_handle.update(Image(data=frame.tobytes()))
        if stopButton.value == True:
            camera.release()  # release the camera resource
            display_handle.update(None)
            break

# Display the "Stop" button and start the display function in a new thread
# ================
display(stopButton)
thread = threading.Thread(target=view, args=(stopButton,))
thread.start()
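As a starting point for gesture control built on the landmark indices above, the sketch below counts how many fingers are raised by comparing each fingertip to its PIP joint. This is only an illustrative heuristic; fingers_up is a hypothetical helper and is not part of the example above, and the thumb is skipped because its extension is better judged along the x axis.

import mediapipe as mp

mpHands = mp.solutions.hands

def fingers_up(handLms):
    # Rough heuristic: a finger counts as raised when its TIP lies above its PIP joint.
    # Image y grows downward, so "above" means a smaller y value.
    lm = handLms.landmark
    pairs = [
        (mpHands.HandLandmark.INDEX_FINGER_TIP,  mpHands.HandLandmark.INDEX_FINGER_PIP),
        (mpHands.HandLandmark.MIDDLE_FINGER_TIP, mpHands.HandLandmark.MIDDLE_FINGER_PIP),
        (mpHands.HandLandmark.RING_FINGER_TIP,   mpHands.HandLandmark.RING_FINGER_PIP),
        (mpHands.HandLandmark.PINKY_TIP,         mpHands.HandLandmark.PINKY_PIP),
    ]
    return sum(1 for tip, pip in pairs if lm[tip].y < lm[pip].y)

# Usage inside the detection loop above:
#     for handLms in results.multi_hand_landmarks:
#         print("Raised fingers:", fingers_up(handLms))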