Jetson 18 Object Recognition Based on DNN (Deep Neural Network)

This section introduces how to implement common object recognition using DNN (Deep Neural Network) + OpenCV.

Preparation

Since the product automatically runs the main program at startup, which occupies the camera resource, this tutorial cannot be used while that program is running: you need to terminate the main program, or disable its automatic startup and then restart the robot.
It is worth noting that because the robot's main program uses multi-threading and is configured through crontab to run automatically at startup, the usual sudo killall python command on its own is typically not enough, since the program can simply be relaunched. Therefore, we also introduce how to disable the main program's automatic startup (see the steps following the Terminate the Main Demo list below).
If you have already disabled the automatic startup of the robot's main demo, you can skip the Terminate the Main Demo section.

Terminate the Main Demo

1. Click the + icon next to the tab for this page to open a new tab called "Launcher."
2. Click on Terminal under Other to open a terminal window.
3. Type bash into the terminal window and press Enter.
4. Now you can use the Bash Shell to control the robot.
5. Enter the command: sudo killall -9 python. The -9 flag sends SIGKILL, which forcibly stops the currently running main program.
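
To also stop the main program from starting again at the next boot, you can disable its crontab entry. The following steps are a minimal sketch that assumes the autostart entry lives in the current user's crontab; the exact line to comment out depends on your product's configuration:

6. Enter the command: crontab -e.
7. If you are asked to choose an editor, select nano.
8. Find the line that launches the robot's main program (a line containing a python command) and comment it out by adding # at the start of the line.
9. Press Ctrl + X, then Y, then Enter to save and exit.
10. Reboot the device. The main program will no longer start automatically.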

Demo

The following code block can be run directly:

1. Select the code block below.
2. Press Shift + Enter to run the code block.
3. Watch the real-time video window.
4. Press STOP to close the real-time video and release the camera resources.

If you cannot see the real-time camera feed when running:

  • Click Kernel -> Shut Down All Kernels in the menu bar above.
  • Close the current section tab and open it again.
  • Click STOP to release the camera resources, then run the code block again.
  • Reboot the device.

Features of This Section

The deploy.prototxt file (the network definition) and the mobilenet_iter_73000.caffemodel file (the trained weights) must be in the same directory as this .ipynb file.
Once the code block is running, you can point the camera at objects from the 20 classes the model was trained on: "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", and "tvmonitor" (the extra "background" class is used internally for regions that contain no object).
The program will draw a bounding box around each object it recognizes and label it with the class name and confidence score.

import cv2  # Import the OpenCV library for image processing
from picamera2 import Picamera2  # Library for accessing Raspberry Pi Camera
import numpy as np  # Library for mathematical calculations
from IPython.display import display, Image  # Library for displaying images in Jupyter Notebook
import ipywidgets as widgets  # Library for creating interactive widgets like buttons
import threading  # Library for creating new threads for asynchronous task execution

# Pre-defined class names based on the Caffe model
class_names = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

# Load Caffe model
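# deploy.prototxt describes the network architecture, while mobilenet_iter_73000.caffemodel
# contains the trained weights; both files must sit in the same directory as this notebook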
net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'mobilenet_iter_73000.caffemodel')

# Create a "Stop" button for users to stop the video stream by clicking it
# ================
stopButton = widgets.ToggleButton(
    value=False,
    description='Stop',
    disabled=False,
    button_style='danger', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Description',
    icon='square' # (FontAwesome names without the `fa-` prefix)
)


# Define the display function to process video frames and perform object detection
# ================
def view(button):
    # If you are using a CSI camera, uncomment the picam2 lines below and comment out the cv2.VideoCapture lines.
    # Recent OpenCV releases (e.g. 4.9.0.80) no longer support CSI cameras directly, so picamera2 is used to capture their frames instead.

    
    # picam2 = Picamera2()  # Create a Picamera2 instance
    # picam2.configure(picam2.create_video_configuration(main={"format": 'XRGB8888', "size": (640, 480)}))  # Configure camera parameters
    # picam2.start()  # Start camera
    
    camera = cv2.VideoCapture(-1)  # Create a camera instance (-1 selects the first available camera)
    # Set the resolution
    camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    
    display_handle = display(None, display_id=True)  # Create a display handle for updating displayed images
    
    while True:
        # frame = picam2.capture_array() # Capture a frame from the camera
        _, frame = camera.read()  # Capture a frame from the camera
        # frame = cv2.flip(frame, 1) # if your camera reverses your image

        img = frame  # Frames from cv2.VideoCapture are already in BGR order, which is what OpenCV and the Caffe model expect
        # img = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)  # If you use picamera2 instead, convert its RGB frame to BGR here
        (h, w) = img.shape[:2]  # Get the height and width of the image
        # Generate the input blob for the network
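        # 0.007843 (= 1 / 127.5) is the scale factor and 127.5 the mean value, which together normalize
        # pixel values to roughly [-1, 1], the preprocessing this MobileNet-SSD model was trained with;
        # 300x300 is the network's fixed input size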
        blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 0.007843, (300, 300), 127.5)
        net.setInput(blob)  # Set the blob as the input to the network
        detections = net.forward()  # Perform forward pass to get the detection results

        # Loop over the detected objects
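        # detections has shape [1, 1, N, 7]; each of the N rows holds
        # [image_id, class_id, confidence, x1, y1, x2, y2], with box corners normalized to [0, 1]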
        for i in range(0, detections.shape[2]):
            confidence = detections[0, 0, i, 2]  # Get the confidence of the detected object
            if confidence > 0.2:  # If the confidence is above the threshold, process the detected object
                idx = int(detections[0, 0, i, 1])  # Get the class index
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])  # Get the bounding box of the object
                (startX, startY, endX, endY) = box.astype("int")  # Convert the bounding box to integers

                # Annotate the object and confidence on the image
                label = "{}: {:.2f}%".format(class_names[idx], confidence * 100)
                cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 255, 0), 2)
                y = startY - 15 if startY - 15 > 15 else startY + 15
                cv2.putText(frame, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        
        _, frame = cv2.imencode('.jpeg', frame)  #  Encode the frame as JPEG format
        display_handle.update(Image(data=frame.tobytes()))  # Update the displayed image
        if stopButton.value == True:  # Check if the "Stop" button has been pressed
            # picam2.close()  # If using picamera2, close it here instead
            camera.release()  # Release the camera resource
            display_handle.update(None)  # Clear the displayed content
            break  # Exit the loop so the thread can finish

            
# Display the "Stop" button and start the display function's thread
# ================
display(stopButton)
thread = threading.Thread(target=view, args=(stopButton,))
thread.start()
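
If you want a quick check of the model without the camera loop, the sketch below runs the same network once on a single still image and prints what it finds. It is a minimal example derived from the code above; the file name test.jpg is a placeholder for any image of your own stored next to the model files.

import cv2  # OpenCV for image processing and the DNN module
import numpy as np  # For array math

# Same 21 class names as in the demo above
class_names = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

# Load the same network definition and trained weights
net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'mobilenet_iter_73000.caffemodel')

img = cv2.imread('test.jpg')  # 'test.jpg' is a placeholder; substitute your own image
(h, w) = img.shape[:2]  # Height and width of the input image

# Same preprocessing as the video demo: resize to 300x300, scale by 1/127.5, subtract mean 127.5
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.2:  # Same confidence threshold as the demo above
        idx = int(detections[0, 0, i, 1])
        print("{}: {:.2f}%".format(class_names[idx], confidence * 100))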