5. AI Vision Course

5.1 Single Color Recognition

5.1.1 Program Description

The implementation of color recognition consists of two parts: color detection and execution feedback after recognition.

First, for the color detection part, Gaussian filtering is applied to the image to reduce noise. The image is then converted to the Lab color space.

Next, the object’s color within the circle is recognized using color thresholding, followed by masking (masking involves using selected images, shapes, or objects to globally or locally obscure the image being processed).

After performing morphological operations such as opening and closing on the object image, the object with the largest contour is circled.

Opening: The image undergoes erosion followed by dilation. This operation removes small objects, smooths shape boundaries, and preserves the area. It can eliminate small noise particles and separate connected objects.

Closing: The image undergoes dilation followed by erosion. This operation fills small holes within objects, connects nearby objects, closes broken contour lines, and smooths boundaries while preserving the area.

After recognition, the servo and buzzer are set up to provide feedback based on the detected color. For example, when red is detected, the buzzer will emit a sound.

For detailed feedback behavior, please refer to section 3. Function Implementation of this document.

5.1.2 Start and Close the Game

Note

The input command is case-sensitive, and keywords can be auto-completed using the Tab key.

(1) Power on the device and, following the instructions in 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection, use the VNC remote connection tool to connect.

(2) Click the icon in the top left corner of the system desktop to open the LX terminal.

(3) Execute the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions

(4) Enter the command and press Enter to start the program:

python3 Color_Warning.py

(5) To close the program, simply press “Ctrl+C” in the LX terminal. If it does not close, press it multiple times.

5.1.3 Program Outcome

After starting the game, the camera will be used to detect colors. When a red ball is recognized, the buzzer will emit a beep sound, and the ball will be circled in the transmitted image, with “Color: red” printed.

Note

  • During the recognition process, ensure the environment is well-lit to avoid inaccurate recognition due to poor lighting conditions.

  • Ensure that no objects with similar or matching colors to the target are present in the background within the camera’s visual range, as this may cause misrecognition.

  • If color recognition is inaccurate, refer to the section 5.1.5 Function Extensions in this document to adjust the color threshold settings.

5.1.4 Program Analysis

Source Code

  • Import Function Library

import sys
import os
import cv2
import math
import time
import threading
import numpy as np

import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.ros_robot_controller_sdk as rrc
import hiwonder.yaml_handle as yaml_handle

# Initialize the robot's low-level driver
board = rrc.Board()

Imported Module | Purpose
import sys | Imports Python's sys module, used for accessing system-specific parameters and functions.
import os | Imports Python's os module, which provides functions for interacting with the operating system.
import cv2 | Imports the OpenCV library, used for image processing and computer vision functionalities.
import math | Imports Python's math module, which provides common mathematical functions and constants.
import time | Imports Python's time module, used for time-related operations such as delays.
import threading | Provides a multithreading environment for concurrent execution.
import numpy as np | Imports the NumPy library, an open-source numerical computation library for handling arrays and matrix operations.
import hiwonder.Camera as Camera | Imports the camera module from the Hiwonder library.
import hiwonder.Misc as Misc | Imports the Misc module, used for processing detected rectangular data.
import hiwonder.ros_robot_controller_sdk as rrc | Imports the low-level robot control SDK for managing servos, motors, RGB lights, and other hardware.
import hiwonder.yaml_handle as yaml_handle | Contains functions and tools related to handling YAML-formatted files.

(1) Import Libraries for OpenCV, Time, Math, and Threading

To use a function from a library, call it with the syntax:

library_name.function_name(parameter1, parameter2, …)

For example, the loop below calls the sleep function from the time library as time.sleep(0.01):

while True:
    if detect_color == 'red' and di_once:
        board.set_buzzer(1900, 0.1, 0.9, 1)  # Sound at 1900 Hz for 0.1 s, pause for 0.9 s, repeat once
        di_once = False
    elif not di_once and detect_color != 'red':
        di_once = True
    else:
        time.sleep(0.01)

The time and math modules are part of the Python standard library and can be imported directly. cv2 (OpenCV) and numpy are third-party packages that must be installed before they can be imported. You can also create your own libraries, like the “yaml_handle” file-reading library mentioned above.

(2) Instantiate a Library

Some module names are long and hard to remember, so they are imported under a short alias, such as rrc for hiwonder.ros_robot_controller_sdk. A class from the module is then instantiated to obtain an object. For example:

# Initialize the robot's low-level driver
board = rrc.Board()

Once the Board object is instantiated, its methods can be conveniently called using the format board.method_name(parameter1, parameter2, ...), for example board.set_buzzer(1900, 0.1, 0.9, 1).
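
The difference between a module alias and an instance can be illustrated with the standard library alone. In this minimal sketch, collections and its Counter class merely stand in for the SDK module and its Board class; they are not part of the robot code.

```python
import collections as col   # "col" is an alias for the module, like "rrc"

# Calling a class creates an instance (an object), like rrc.Board()
counter = col.Counter()

# Methods are then called on the instance, not on the module alias
counter.update("aab")
print(counter["a"])
```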

def load_config():
    global lab_data, servo_data
    
    lab_data = yaml_handle.get_yaml_data(yaml_handle.lab_file_path)
    servo_data = yaml_handle.get_yaml_data(yaml_handle.servo_file_path)
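
The program later indexes the thresholds as lab_data[i]['min'][0], which suggests a mapping from color name to 'min' and 'max' lists. The sketch below uses a hand-written dictionary in that shape. The numeric values and the helper name thresholds_for are assumptions for illustration and do not reflect the contents of the real configuration file.

```python
# Assumed shape of the parsed threshold data (values are illustrative only)
lab_data = {
    "red":   {"min": [0, 150, 130], "max": [255, 255, 255]},
    "green": {"min": [0, 0, 130],   "max": [255, 110, 255]},
}

def thresholds_for(color, data):
    """Return (lower, upper) Lab bounds as tuples for one color name."""
    entry = data[color]
    return tuple(entry["min"]), tuple(entry["max"])

lower, upper = thresholds_for("red", lab_data)
print(lower, upper)
```
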
  • Main Function Analysis

In a Python program, the block under if __name__ == '__main__': runs only when the file is executed directly, so it serves as the program's entry point. Here, the program starts by reading images from the camera.
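
A minimal sketch of this entry-point pattern is shown here. The function name main and its return value are placeholders, not names from Color_Warning.py.

```python
def main():
    """Entry point: in the real program this would open the camera and loop."""
    return "started"

if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported
    print(main())
```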

(1) Read Live Camera Feed

while True:
    ret, img = my_camera.read()
    if img is not None:
        frame = img.copy()
        frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR)  # Distortion correction
        Frame = run(frame)
        cv2.imshow('Frame', Frame)
        key = cv2.waitKey(1)
        if key == 27:
            break
    else:
        time.sleep(0.01)

(2) Start Image Processing

When an image is received, the “run()” function is called to process it.

def run(img):
    global draw_color
    global color_list
    global detect_color
        
    img_copy = img.copy()
    img_h, img_w = img.shape[:2]

① The img.copy() function creates a duplicate of the img object and assigns it to frame.

② The “run()” function handles the image processing operations.

Frame = run(frame)

(3) Resizing the Image

The image is resized to facilitate processing.

frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST)

The first parameter "img_copy" is the input image.

The second parameter size specifies the output image size as (width, height), which can be customized.

The third parameter interpolation=cv2.INTER_NEAREST defines the interpolation method.

  • INTER_NEAREST: Nearest-neighbor interpolation.

  • INTER_LINEAR: Bilinear interpolation (default if not specified).

  • INTER_CUBIC: Bicubic interpolation over a 4x4 pixel neighborhood.

  • INTER_LANCZOS4: Lanczos interpolation over an 8x8 pixel neighborhood.

(4) Gaussian Filtering

Images often contain noise, which can degrade quality and obscure important features. Depending on the type of noise, different filtering methods may be used, such as Gaussian filtering, median filtering, and mean filtering.

Gaussian filtering is a type of linear smoothing filter that is effective in reducing Gaussian noise. It is widely used in image denoising processes.

frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)

The first parameter, frame_resize, is the input image.

The second parameter, (3, 3), specifies the size of the Gaussian kernel.

The third parameter, 3, represents the standard deviation of the Gaussian kernel in the X direction.

(5) Convert the Image to LAB Color Space

The cv2.cvtColor() function is used for color space conversion.

frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)  # Convert the image to the LAB color space

The first parameter “frame_gb” is the input image.

The second parameter cv2.COLOR_BGR2LAB converts the image from BGR format to LAB format. To convert to RGB, use cv2.COLOR_BGR2RGB.

(6) Convert the Image to a Binary Image

The image is simplified by converting it to a binary image in which every pixel is either 0 (outside the threshold range) or 255 (inside it), which reduces the data size and makes it easier to process. The cv2.inRange() function is used for thresholding.

frame_mask = cv2.inRange(frame_lab,
                         (lab_data[i]['min'][0],
                          lab_data[i]['min'][1],
                          lab_data[i]['min'][2]),
                         (lab_data[i]['max'][0],
                          lab_data[i]['max'][1],
                          lab_data[i]['max'][2]))  # Threshold the Lab image to build the binary mask

The first parameter "frame_lab" is the input image.

The second parameter (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]) specifies the lower color threshold.

The third parameter (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2]) specifies the upper color threshold.

Apply Morphological Operations (Opening and Closing)

To reduce interference and smooth the mask, morphological operations are applied. Opening is erosion followed by dilation, and closing is dilation followed by erosion. Here the program calls cv2.erode() and then cv2.dilate(), which together amount to an opening.

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Erosion
dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Dilation

The first parameter is the input image.

The second parameter is the structuring element (also known as the kernel), which defines the nature of the operation. The size and shape of the kernel determine the extent of erosion or dilation.

(7) Find the Largest Contour

After completing the image processing, the largest contour is found using the cv2.findContours() function.

contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]  # Find the contours
areaMaxContour, area_max = getAreaMaxContour(contours)  # Find the contour with the largest area

The first parameter "dilated" is the input image.

The second parameter cv2.RETR_EXTERNAL specifies the contour retrieval mode.

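The third parameter cv2.CHAIN_APPROX_NONE specifies the contour approximation method. The trailing [-2] selects the list of contours from the return value, which works whether cv2.findContours() returns two or three values (this differs between OpenCV versions).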

The largest contour is selected, and a minimum area threshold is set to ensure the target contour is valid only if its area exceeds this value.

# Find the contour with the largest area
# The parameter is the list of contours to compare
def getAreaMaxContour(contours):
    contour_area_temp = 0
    contour_area_max = 0
    areaMaxContour = None

    for c in contours:  # Iterate through all contours
        contour_area_temp = math.fabs(cv2.contourArea(c))  # Calculate the contour area
        if contour_area_temp > contour_area_max:
            contour_area_max = contour_area_temp
            if contour_area_temp > 50:  # The largest contour is valid only if its area exceeds 50, to filter out interference
                areaMaxContour = c

    return areaMaxContour, contour_area_max  # Return the largest contour and its area

(8) Display the Result

cv2.imshow('Frame', Frame)
key = cv2.waitKey(1)

The function cv2.resize() is used to scale the processed image to an appropriate size.

The function cv2.imshow() displays the image in a window. ‘Frame’ is the name of the window, and Frame is the image content to be displayed. This function must be followed by cv2.waitKey(), otherwise the image will not appear.

The function cv2.waitKey() waits for a key press. The parameter 1 specifies the delay time in milliseconds.

  • Child Thread Analysis

(1) Turn On RGB Light

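The RGB light will match the color detected by the recognition system. The code below records the detected color name in detect_color and looks up the matching display color in range_rgb.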

if color_area_max == 'red':  # Red has the largest area
    detect_color = 'red'
    draw_color = range_rgb["red"]
elif color_area_max == 'green':  # Green has the largest area
    detect_color = 'green'
    draw_color = range_rgb["green"]
elif color_area_max == 'blue':  # Blue has the largest area
    detect_color = 'blue'
    draw_color = range_rgb["blue"]

(2) Drive the Buzzer

def buzzer():
    global di_once
    global detect_color
    while True:
        if detect_color == 'red' and di_once:
            board.set_buzzer(1900, 0.1, 0.9, 1)  # Sound at 1900 Hz for 0.1 s, pause for 0.9 s, repeat once
            di_once = False
        elif not di_once and detect_color != 'red':
            di_once = True
        else:
            time.sleep(0.01)

The buzzer() function controls the buzzer. Inside this function, board.set_buzzer() is used to turn the buzzer on and off at a frequency of 1900 Hz, sounding for 0.1 seconds and silent for 0.9 seconds, repeated once.
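
The di_once flag makes the buzzer sound once per appearance of red rather than on every frame. The sketch below isolates that latch logic so it can be run without hardware. The function name buzzer_step and the frame sequence are assumptions for illustration.

```python
def buzzer_step(detect_color, di_once):
    """One pass of the loop body: returns (should_beep, new_di_once)."""
    if detect_color == 'red' and di_once:
        return True, False        # beep once and disarm
    if not di_once and detect_color != 'red':
        return False, True        # red has gone away: re-arm
    return False, di_once         # nothing to do this pass

# Simulated sequence of detection results (illustrative)
frames = ['None', 'red', 'red', 'red', 'green', 'red']
di_once = True
beeps = 0
for color in frames:
    beep, di_once = buzzer_step(color, di_once)
    beeps += beep

# Red appears twice (separated by green), so two beeps are expected
print(beeps)
```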

5.1.5 Function Extensions

  • Adjusting Color Thresholds

If the color recognition performance is poor during the game experience, it may be necessary to adjust the color threshold. This section uses red as an example, and the same method can be applied to adjust other colors. Follow the steps below:

(1) Double-click the color threshold adjustment tool icon on the system desktop, and in the popup interface, click “Execute”.

(2) Once in the interface, click “Connect” to link the camera.

(3) After a successful connection, select “red” from the color options in the lower-right corner of the interface.

(4) If the transmitted image does not appear in the popup window, the camera may not have connected successfully. Check that the camera’s connection cable is properly plugged in.

In the interface shown below, the right side displays the real-time transmitted image, while the left side shows the color to be detected. Point the camera at the red ball, then adjust the six sliders at the bottom so that the red ball area on the left turns entirely white, and the other areas turn black. Afterward, click the “Save” button to save the settings.

  • Changing the Default Recognized Color

The color recognition program is pre-configured to recognize three colors: red, green, and blue. By default, the program identifies red, triggering the buzzer to emit a beep and drawing a circle around the red ball in the transmitted image, displaying “Color: red”.

To change the recognized color to green, follow these steps:

(1) Enter the following command and press Enter to navigate to the source code directory:

cd TonyPi/Functions/

(2) Then, enter the following command and press Enter to open the program file:

vim Color_Warning.py

(3) Locate the code shown in the image below:

(4) Press the “i” key on the keyboard to enter edit mode.

(5) Replace “red” (highlighted in red in the image) with “green”, as shown in the image below:

(6) To save your changes, press the “Esc” key, then type “:wq” (note the colon before “wq”) and press Enter to save and exit.

(7) Enter the following command and press Enter to start the color recognition functionality:

python3 Color_Warning.py
  • Add New Recognition Color

In addition to the three built-in colors for recognition, you can also add custom colors. For example, to add purple as a new detectable color, follow these steps:

(1) Double-click the color threshold adjustment tool icon on the system desktop. When a prompt appears, click “Execute”.

(2) In the window that opens, click “Connect”.

(3) Click “Add”, name the new color (e.g., “purple”), then click “OK”.

(4) Use the dropdown menu next to the color selector and choose “purple”.

(5) Point the camera at a purple object. Adjust the L, A, and B sliders until the region representing the purple object turns white in the left preview window, and all other areas become black.

(6) Once satisfied, click “Save” to store the adjusted color threshold values.

(7) To verify whether the changes have been saved successfully, open the terminal and navigate to the program directory by entering:

cd TonyPi

(8) Then open the configuration file by typing:

vim lab_config.yaml

(9) Once the file is open, locate the section where color thresholds are defined. You should see the parameters for the newly added purple color.

(10) Following steps (1) to (4) in “Changing the Default Recognized Color”, open Color_Warning.py and press “i” to enter edit mode. Locate the section shown below, and manually add the following line (replace the values with the actual maximum threshold from step 9):

'purple':(255, 255, 114),

(11) Next, locate the section shown in the figure.

(12) Manually add the contents shown in the highlighted area.

(13) To save your changes, press Esc, then type :wq (note the colon before “wq”) and press Enter to save and exit.

(14) Refer back to “5.1.2 Start and Close the Game” to relaunch the program. Place a purple object in front of the camera. If successful, you’ll see a purple bounding box around it in the video feed and the word “purple” printed in the terminal.

If you’d like to add more recognizable colors, simply repeat the above steps with the desired color.

5.2 Color Recognition

5.2.1 Program Logic

The robot recognizes colors and provides feedback on the recognition result through “nodding” or “shaking” its head.

The following is the overall process:

First, the image is converted to the Lab color space so that TonyPi can recognize colors.

Second, the object color within the circle is identified using color thresholds, and a mask is applied to that part of the image. Masking is the process of using selected images, graphics, or objects to globally or locally obscure the image being processed.

After erosion and dilation are applied to the object image, the largest object contour is circled.

Erosion: Each pixel of the image is checked against the structuring element placed over its neighborhood. If all the covered pixel values are 1, the pixel keeps its value; otherwise, it is set to 0. Erosion is mainly used to eliminate unimportant edge information, which shrinks the bright regions of the image.

Dilation: The inverse process of erosion. The structuring element is slid over the image, the maximum pixel value within the covered area is computed, and this maximum is assigned to the pixel at the reference point. Dilation gradually expands the highlighted areas in an image and is typically used to fill holes or gaps.

Next, the recognized color is evaluated. If the set color is detected, the head servo moves up and down (a nod); if one of the other colors is detected, it moves left and right (a head shake).
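
The pixel-wise definitions of erosion and dilation given above can be sketched directly in NumPy. This is an illustrative implementation for a 0/1 image with a square structuring element, not the code OpenCV uses internally. The function names and the border handling (padding with 1 for erosion and 0 for dilation, so the image border itself does not change the result) are assumptions.

```python
import numpy as np

def _windows(padded, k):
    """Stack the k*k shifted views that cover each pixel's neighborhood."""
    h = padded.shape[0] - k + 1
    w = padded.shape[1] - k + 1
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])

def erode(binary, k=3):
    """A pixel stays 1 only if every pixel under the k x k element is 1."""
    padded = np.pad(binary, k // 2, constant_values=1)
    return _windows(padded, k).min(axis=0)

def dilate(binary, k=3):
    """A pixel becomes 1 if any pixel under the k x k element is 1."""
    padded = np.pad(binary, k // 2, constant_values=0)
    return _windows(padded, k).max(axis=0)

img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 1            # 3x3 block of ones

eroded = erode(img)          # shrinks the block to its center pixel
dilated = dilate(img)        # grows the block to 5x5
print(int(eroded.sum()), int(dilated.sum()))
```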

5.2.2 Operation Steps

Note

Commands are case-sensitive, so enter them exactly as shown.

(1) Power on the robot and connect to the Raspberry Pi desktop with VNC. For how to install and connect VNC, refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command-line terminal.

(3) Enter the following command and press Enter to navigate to the directory where the program is stored:

cd TonyPi/Functions

(4) Enter the command below and press Enter to start the game:

python3 ColorDetect.py

(5) To exit the game, press “Ctrl+C”. If it does not exit, press it a few more times.

5.2.3 Project Outcome

Note

The program defaults to recognizing the color red. To switch to blue or green, refer to 5.2.5 Function Extension.

Place the red ball in front of the TonyPi. The robot will “nod” upon recognition. Place the blue and green balls in front of the TonyPi. The robot will “shake its head” upon recognition.

5.2.4 Program Analysis

Source Code

The source code of this program is located in /home/pi/TonyPi/Functions/ColorDetect.py.

  • Import Parameter Module

Imported Module | Purpose
import sys | Imports Python's sys module, used for accessing system-related functions and variables.
import os | Imports Python's os module, which provides functions and methods for interacting with the operating system.
import cv2 | Imports the OpenCV library, used for image processing and computer vision functionalities.
import time | Imports Python's time module, used for time-related functionalities such as delays.
import math | Imports Python's math module, which provides commonly used mathematical functions and constants.
import threading | Provides an environment for running multiple threads concurrently.
import numpy as np | Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.
import hiwonder.Camera as Camera | Imports the camera library.
import hiwonder.Misc as Misc | Imports the Misc module, used for handling recognized rectangular data.
import hiwonder.ros_robot_controller_sdk as rrc | Imports the robot's low-level control library for controlling servos, motors, RGB lights, and other hardware.
from hiwonder.Controller import Controller | Imports the Controller class from the Hiwonder library.
import hiwonder.ActionGroupControl as AGC | Imports the action group execution library.
import hiwonder.yaml_handle as yaml_handle | Contains functions and tools related to processing YAML-formatted files.

  • Functional Logic

The system captures image data through the camera and processes it by converting the image to a binary format. To reduce interference and smooth the image, erosion and dilation operations are applied.

Next, it identifies the contour with the largest area and calculates the minimum enclosing circle. Based on this, the system determines the color of the detected object and provides the corresponding response.

  • Logical Flow and Corresponding Code Analysis

(1) Import function library

In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to 5.2.4 Program Analysis -> Import parameter module.

import sys
import os
import cv2
import math
import time
import threading
import numpy as np
import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle

(2) Set initial state

Set initial state, including the initial position of servo, PID, color threshold value, etc.

# Move the robot's servos to their initial positions
def initMove():
    ctl.set_pwm_servo_pulse(1, 1500, 500)
    ctl.set_pwm_servo_pulse(2, servo_data['servo2'], 500)

(3) Image pre-processing

Resizing and Gaussian blur processing of the image.

frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST)
frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)
frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)  # Convert the image to the LAB color space

cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) is an operation to resize the image.

The first parameter “img_copy” is the image to be resized.

The second parameter “size” is the target size.

The third parameter “interpolation” is the interpolation method, which is used to determine the pixel interpolation algorithm used for resizing.

cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.

The first parameter “frame_resize” is the image to be blurred.

The second parameter “(3, 3)” is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.

The third parameter “3” is the standard deviation of the Gaussian kernel, used to control the degree of blur.

(4) Color space conversion

Convert the BGR image to LAB image.

frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)  # Convert the image to the LAB color space

(5) Binarization processing

Use the inRange() function from the cv2 library to binarize the image.

frame_mask = cv2.inRange(frame_lab,
                         (lab_data[i]['min'][0],
                          lab_data[i]['min'][1],
                          lab_data[i]['min'][2]),
                         (lab_data[i]['max'][0],
                          lab_data[i]['max'][1],
                          lab_data[i]['max'][2]))  # Threshold the Lab image to build the binary mask

The first parameter “frame_lab” is the input image.

The second parameter, the tuple (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), is the lower limit of the threshold.

The third parameter, the tuple (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2]), is the upper limit of the threshold.

(6) Erosion and dilation

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Erosion
dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Dilation

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) performs erosion on the binary image.

The first parameter “frame_mask” is the binary image on which the morphological operation is performed.

The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the erosion operation. A rectangular structuring element of size (3, 3) is used here.

The dilation function follows the same principle.

(7) Get the contour with the largest area

After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using the findContours() function from the cv2 library.

contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]  # Find the contours
areaMaxContour, area_max = getAreaMaxContour(contours)  # Find the contour with the largest area

Take the code contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] as an example:

The first parameter “dilated” is the input image.

The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode.

The third parameter cv2.CHAIN_APPROX_NONE is the contour approximation method. The trailing [-2] selects the list of contours from the return value.

Find the contour with the largest area among the obtained contours. To avoid interference, a minimum area is set: the target contour is considered valid only if its area is greater than this value.

# Find the contour with the largest area
# The parameter is the list of contours to compare
def getAreaMaxContour(contours):
    contour_area_temp = 0
    contour_area_max = 0
    areaMaxContour = None

    for c in contours:  # Iterate through all contours
        contour_area_temp = math.fabs(cv2.contourArea(c))  # Calculate the contour area
        if contour_area_temp > contour_area_max:
            contour_area_max = contour_area_temp
            if contour_area_temp > 50:  # The largest contour is valid only if its area exceeds 50, to filter out interference
                areaMaxContour = c

    return areaMaxContour, contour_area_max  # Return the largest contour and its area

(8) Determine the largest color block

Determine the color of the largest area contour and add the result to the color_list.

if color_area_max == 'red':  # Red has the largest area
    color = 1
elif color_area_max == 'green':  # Green has the largest area
    color = 2
elif color_area_max == 'blue':  # Blue has the largest area
    color = 3
else:
    color = 0
color_list.append(color)

(9) Multiple judgments

Once three results have been collected in color_list, their color codes are averaged and rounded to determine the recognized color.

if len(color_list) == 3:  # Decide once three results have been collected
    # Take the average value
    color = int(round(np.mean(np.array(color_list))))
    color_list = []
    if color == 1:
        detect_color = 'red'
        draw_color = range_rgb["red"]
    elif color == 2:
        detect_color = 'green'
        draw_color = range_rgb["green"]
    elif color == 3:
        detect_color = 'blue'
        draw_color = range_rgb["blue"]
    else:
        detect_color = 'None'
        draw_color = range_rgb["black"]
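
The averaging step can be reproduced in isolation to see how it behaves on mixed results. The helper name decide_color is an assumption; the code mapping (0 = None, 1 = red, 2 = green, 3 = blue) follows the program.

```python
def decide_color(color_list):
    """Average the per-frame codes (0=None, 1=red, 2=green, 3=blue) and round."""
    names = {1: 'red', 2: 'green', 3: 'blue'}
    code = int(round(sum(color_list) / len(color_list)))
    return names.get(code, 'None')

print(decide_color([1, 1, 1]))   # three red frames
print(decide_color([1, 1, 0]))   # one missed frame is smoothed over
print(decide_color([1, 3, 3]))   # mixed red/blue frames average to code 2
```

Note that averaging category codes can yield a color that no single frame reported, as the last call shows, so the result is most reliable when the three frames agree.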

(10) Print recognized outcome

Use the cv2.putText() function from the cv2 library to draw text on the image.

cv2.putText(img, "Color: " + detect_color, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2)

Take the code cv2.putText(img, "Color: " + detect_color, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2) as an example:

The first parameter “img” is the image to draw on.

The second parameter 'Color: ' + detect_color is the information drawn on the image.

The third parameter (10, img.shape[0] - 10) is the starting coordinate of the text, i.e., the position of the bottom-left corner of the text. Here, the text is 10 pixels away from the left and bottom edges of the image, respectively.

The fourth parameter cv2.FONT_HERSHEY_SIMPLEX is the font type.

The fifth parameter “0.65” is the size scaling factor for the text.

The sixth parameter “draw_color” is the color of the text.

The seventh parameter “2” is the thickness of the text.

(11) Color recognition

① After the red ball is recognized, servo 1 is driven so that the robot nods twice in a row and then returns to the neutral position, as pictured:

if detect_color == 'red':
    ctl.set_pwm_servo_pulse(1, 1800, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(1, 1200, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(1, 1800, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(1, 1200, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(1, 1500, 100)
    time.sleep(0.1)

Take the code ctl.set_pwm_servo_pulse(1, 1800, 200) as an example:

The first parameter “1” indicates the ID of the servo being controlled.

The second parameter “1800” is the pulse width for servo ID 1. A pulse width of 1500 returns the servo to the neutral position.

The third parameter “200” represents the servo’s movement time, which is 200 milliseconds.
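
The nod sequence can be rehearsed without hardware by recording the commands instead of sending them. In the sketch below, FakeController and the nod helper are hypothetical stand-ins; only the set_pwm_servo_pulse(id, pulse, time) call shape and the pulse values come from the program.

```python
class FakeController:
    """Stand-in that records commands instead of moving a real servo."""
    def __init__(self):
        self.calls = []

    def set_pwm_servo_pulse(self, servo_id, pulse, use_time_ms):
        self.calls.append((servo_id, pulse, use_time_ms))

def nod(ctl, sleep):
    """Replay the nod sequence from the program: two swings, then center."""
    steps = [(1800, 200), (1200, 200), (1800, 200), (1200, 200), (1500, 100)]
    for pulse, ms in steps:
        ctl.set_pwm_servo_pulse(1, pulse, ms)
        sleep(ms / 1000)          # wait for the move to finish

waits = []
ctl = FakeController()
nod(ctl, waits.append)            # record the waits instead of actually sleeping
print(ctl.calls[-1], round(sum(waits), 3))
```

Each wait matches the movement time of the command before it, so the whole nod takes about 0.9 seconds.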

② After the green or blue ball is recognized, servo 2 is driven so that the robot shakes its head twice in a row and then returns to the neutral position, as shown in the following figure.

Lines 152 to 162 of ColorDetect.py:
elif detect_color == 'green' or detect_color == 'blue':
    ctl.set_pwm_servo_pulse(2, 1800, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(2, 1200, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(2, 1800, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(2, 1200, 200)
    time.sleep(0.2)
    ctl.set_pwm_servo_pulse(2, 1500, 100)
    time.sleep(0.1)

5.2.5 Function Extension

  • Modify Default Recognition Color

Red, green, and blue are the built-in colors in the color recognition program, and red is the default. When red is recognized, the robot nods.

The following steps change the recognized color to green.

(1) Enter the following command to navigate to the directory where the game program is located:

cd TonyPi/Functions

(2) Enter the following command to open the game program in the vim editor:

vim ColorDetect.py

(3) Find the lines if detect_color == 'red': and elif detect_color == 'green' or detect_color == 'blue':.

Note

In vim, typing a line number and then pressing “Shift+G” jumps directly to that line. The line numbers given here are for reference only; rely on the actual positions in your file.

(4) Press “i” to enter editing mode, then change red in if detect_color == 'red' to green, and change green in line 152, elif detect_color == 'green' or detect_color == 'blue', to red. To recognize blue instead, swap red and blue in the same way.

(5) Press “Esc” to enter last-line command mode, then type “:wq” to save the file and exit the editor.

  • Add Recognized Color

In addition to the built-in colors, you can add other colors for the program to recognize. Take orange as an example:

(1) Open VNC, and run the following command to navigate to the directory where the program file is stored.

cd TonyPi

(2) Input the following command to open the Lab color settings file.

vim lab_config.yaml

It is recommended to take a screenshot to record the initial values.

(3) Click the debugging tool icon on the system desktop, then choose “Execute” in the pop-up window.

(4) Click the “Connect” button in the lower-left corner. When the interface displays the camera image, the connection is successful. Then select “red” in the box on the right.

(5) Drag the L, A, and B sliders until the color area to be recognized turns white in the left screen and all other areas turn black.

For example, to recognize orange, put the orange ball in the camera’s field of view, adjust the L, A, and B sliders until the orange part of the left screen turns white and everything else turns black, and then click the “Save” button to keep the modified data.

(6) After the modification is completed, check whether the modified data was successfully written in. Enter the following command again to check the color setting parameters.

vim TonyPi/lab_config.yaml

To keep the other games working as expected, it is recommended to use LAB_Tool to restore the initial values once you have finished.

(7) Check the data in the red frame. If the edited values were written to the file, press “Esc” and enter “:wq” to save and exit.

(8) Start the game again and put the orange ball in front of the camera. TonyPi will nod.

(9) To add other recognized colors, follow the same steps.

5.3 Color Position Recognition

5.3.1 Feature Overview

In this lesson, the system uses a camera to recognize red, green, and blue balls. The detected objects are highlighted in the video feed, with their X and Y coordinates displayed in real time.

The Color Position Recognition feature consists of two key components: color detection and position marking.

  • Color Detection

To begin, Gaussian filtering is applied to reduce image noise and enhance clarity. The image is then converted to the Lab color space, which enables more accurate and consistent color recognition.

Next, using predefined color thresholds, the system identifies the color of the object within the target area. A mask is applied to isolate the relevant regions of the image—masking allows for global or local filtering based on the defined object.

Morphological operations, including opening and closing, are performed to refine the detection:

  • Opening (erosion followed by dilation): Helps eliminate small noise, smooth the contours of shapes, and separate connected objects without changing their size.

  • Closing (dilation followed by erosion): Fills small holes within objects, connects nearby elements, and smooths contours while preserving object integrity.

Finally, the object with the largest contour is identified and outlined with a circle.
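
To make the effect of opening and closing concrete, here is a small plain-Python sketch on a 7×7 binary grid. It is an illustration only: the program itself uses OpenCV, the helper names are hypothetical, a fixed 3×3 square structuring element is assumed, and neighbors outside the image are simply ignored.

```python
def neighbors(img, y, x):
    """Yield the values in the 3x3 window around (y, x), skipping out-of-image pixels."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                yield img[y + dy][x + dx]


def erode(img):
    """A pixel stays set only if every pixel in its 3x3 window is set."""
    return [[int(all(neighbors(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """A pixel becomes set if any pixel in its 3x3 window is set."""
    return [[int(any(neighbors(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img):
    return dilate(erode(img))   # erosion followed by dilation


def closing(img):
    return erode(dilate(img))   # dilation followed by erosion


def make_image(size, set_pixels):
    """Build a size x size grid with the given (row, col) pixels set to 1."""
    img = [[0] * size for _ in range(size)]
    for y, x in set_pixels:
        img[y][x] = 1
    return img


block = {(y, x) for y in range(1, 4) for x in range(1, 4)}
noisy = make_image(7, block | {(5, 5)})        # 3x3 object plus one noise pixel
opened = opening(noisy)                        # noise pixel removed, object kept

solid = {(y, x) for y in range(2, 5) for x in range(2, 5)}
holed = make_image(7, solid - {(3, 3)})        # 3x3 object with a hole in the middle
closed = closing(holed)                        # hole filled, outline unchanged

for row in opened:
    print(row)
```

In this example the isolated pixel disappears after opening while the 3×3 object keeps its size, and the one-pixel hole is filled after closing.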

  • Position Marking

To determine the position of the detected object, the program computes the minimum enclosing circle of the largest contour with cv2.minEnclosingCircle(). This returns the center coordinates and radius of the circle, which are mapped back to the original image size and drawn on the video feed.

5.3.2 Operation Steps

Note

Commands are case-sensitive. Enter them exactly as shown, including spaces.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command-line terminal.

(3) Input the following command and press Enter to go to the directory where the program is stored.

cd TonyPi/Functions

(4) Input the command below, then press Enter to start the game.

python3 Color_Recognition.py

(5) To exit the program, press “Ctrl+C”. If the program does not exit, press it a few more times.

5.3.3 Project Outcome

By default, the program is configured to detect red, green, and blue balls. Once recognized, the detected object will be circled in the video feed, and its X and Y coordinates will be displayed in the bottom-left corner of the screen.

Note

  • Ensure sufficient lighting during operation to improve recognition accuracy. Inadequate lighting may result in detection errors.

  • Avoid placing objects with colors similar to the target colors in the background within the camera’s field of view, as this may cause false detections.

  • If color detection is inaccurate, please refer to 5.3.5 Function Extension -> Adjusting Color Thresholds for instructions on adjusting the color thresholds.

5.3.4 Program Analysis

Source Code

The source code of this program is located at /home/pi/TonyPi/Functions/Color_Recognition.py

  • Import Function Library

import sys
import os
import cv2
import math
import time
import numpy as np

import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.yaml_handle as yaml_handle

Import statement: Function
import sys: Imports the Python "sys" module for accessing system-related functions and variables.
import os: Imports the Python "os" module, which provides functions for interacting with the operating system.
import cv2: Imports the OpenCV library for image processing and computer vision.
import time: Imports the Python "time" module for time-related functions, such as delays.
import numpy as np: Imports the NumPy library, an open-source numerical computing library for Python used for array and matrix operations.
import hiwonder.Misc as Misc: Imports the Misc module, used for processing detected rectangle data.
import hiwonder.Camera as Camera: Imports the camera module.
import hiwonder.yaml_handle as yaml_handle: Imports functions and tools for handling YAML files.

(1) To begin, import the required libraries, such as OpenCV, time, and math. To use a function from a library, call it in the form LibraryName.FunctionName(parameters). For example, time.sleep(0.01) calls the sleep() function from the time library to create a 0.01-second delay.

The program also defines the color used to draw each recognized color:

range_rgb = {
    'red': (0, 0, 255),
    'green': (0, 255, 0),
    'blue': (255, 0, 0),   # Note: OpenCV uses BGR order, so the blue channel comes first
}

Python provides built-in libraries such as time and math that can be imported directly. Third-party libraries such as cv2 (OpenCV) are imported the same way once they are installed. You can also use custom libraries, such as yaml_handle for reading YAML files.

(2) Importing Libraries Under an Alias

Some library names are long and hard to remember. To simplify function calls, a library is commonly imported under an alias with the “as” keyword. For example, import hiwonder.yaml_handle as yaml_handle lets the program refer to the module by the short name yaml_handle:

def load_config():
    global lab_data
    lab_data = yaml_handle.get_yaml_data(yaml_handle.lab_file_path)

In the same way, after import hiwonder.Misc as Misc, functions from the Misc module can be called in the shorter form Misc.function_name(parameters).

This makes the code cleaner and easier to write.

  • Main Function Analysis

In a Python program, the condition __name__ == '__main__' indicates the main entry point of the program.

The program starts by initializing the camera and reading the video stream. The read() method is used to capture each frame from the video. It then processes each frame to detect and highlight the colored balls, displaying the results in real time.

By continuously reading and displaying frames in a loop, the program achieves live video playback. Once the video processing is complete, the release() function is called to free up the camera and related system resources.

while True:
    ret, img = my_camera.read()
    if img is not None:
        frame = img.copy()
        frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR)
        Frame = run(frame)
        cv2.imshow('Frame', Frame)
        key = cv2.waitKey(1)
        if key == 27:
            break
    else:
        time.sleep(0.01)

(1) Read Live Camera Feed

ret, img = my_camera.read()

After the game starts, the program will read the live camera feed first.

(2) Process Image

Invoke the “run()” function for image processing.

Frame = run(frame)

(3) Image Resizing

frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST)

First parameter: img_copy – the input image.

Second parameter: size – the desired output size of the image, which can be customized as needed.

Third parameter: interpolation=cv2.INTER_NEAREST – the interpolation method used during resizing. Common interpolation options include:

  • INTER_NEAREST: Nearest-neighbor interpolation

  • INTER_LINEAR: Bilinear interpolation (used by default if no method is specified)

  • INTER_CUBIC: Bicubic interpolation using a 4×4 pixel neighborhood

  • INTER_LANCZOS4: Lanczos interpolation using an 8×8 pixel neighborhood
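
As an illustration of what nearest-neighbor interpolation does, the following plain-Python sketch resizes a small grid by copying, for every output pixel, the source pixel its position maps to. resize_nearest is a hypothetical helper, not an OpenCV function, and OpenCV's exact rounding may differ in edge cases.

```python
def resize_nearest(img, new_w, new_h):
    """Resize a 2D grid by picking the nearest source pixel for each output pixel."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Scale the output row index back into the source image
        src_y = min(int(y * src_h / new_h), src_h - 1)
        row = []
        for x in range(new_w):
            src_x = min(int(x * src_w / new_w), src_w - 1)
            row.append(img[src_y][src_x])
        out.append(row)
    return out


small = [[1, 2],
         [3, 4]]
enlarged = resize_nearest(small, 4, 4)   # each source pixel becomes a 2x2 block
shrunk = resize_nearest(enlarged, 2, 2)  # shrinking again recovers the original

for row in enlarged:
    print(row)
```

Because no new pixel values are computed, this method is fast, which suits a per-frame preprocessing step, at the cost of blockier edges than bilinear or bicubic interpolation.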

(4) Gaussian Blur

Images often contain noise that can reduce visual quality and obscure important features. Depending on the type of noise, different filtering methods can be applied, such as Gaussian blur, median blur, or mean blur.

Gaussian blur is a linear smoothing filter used to reduce Gaussian noise. It is widely applied in image preprocessing for noise reduction.

frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)

First parameter: frame_resize – the input image.

Second parameter: (3, 3) – the size of the Gaussian kernel.

Third parameter: 3 – the standard deviation of the Gaussian kernel in the X direction.
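
To see how the kernel size and standard deviation interact, the sketch below computes the weights of a Gaussian kernel directly from the Gaussian formula. The function names are hypothetical and the code is a plain-Python illustration, not OpenCV's implementation.

```python
import math


def gaussian_kernel_1d(ksize, sigma):
    """Return ksize normalized Gaussian weights centered on the middle element."""
    center = (ksize - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(ksize)]
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the weights sum to 1


def gaussian_kernel_2d(ksize, sigma):
    """Build the 2D kernel as the outer product of the 1D kernel with itself."""
    k = gaussian_kernel_1d(ksize, sigma)
    return [[a * b for b in k] for a in k]


kernel_1d = gaussian_kernel_1d(3, 3)
kernel_2d = gaussian_kernel_2d(3, 3)

print([round(w, 4) for w in kernel_1d])
for row in kernel_2d:
    print([round(w, 4) for w in row])
```

With a 3×3 kernel and a standard deviation of 3, the center weight is only slightly larger than the edge weights, so the filter behaves almost like a 3×3 mean filter. A smaller standard deviation would concentrate more weight on the center pixel.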

(5) Color space conversion

The function cv2.cvtColor() is used to convert an image from one color space to another.

frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)

First parameter: frame_gb – the input image.

Second parameter: cv2.COLOR_BGR2LAB – the conversion code. In this case, it converts the image from BGR color space to LAB color space. If you need to convert to RGB instead, you can use cv2.COLOR_BGR2RGB.

(6) Binarization

To simplify image processing and reduce data volume, the image is converted into a binary image in which every pixel is either 0 (outside the threshold range) or 255 (inside it). This makes further processing more efficient. The cv2.inRange() function from OpenCV is used for this binarization process.

frame_mask = cv2.inRange(frame_lab, min_lab, max_lab)

First parameter: frame_lab – the input image in LAB color space.

Second parameter: min_lab – the lower threshold for the LAB color range.

Third parameter: max_lab – the upper threshold for the LAB color range.
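
The per-pixel test behind this thresholding can be sketched in plain Python as follows. in_range and make_mask are hypothetical helpers written for illustration, and the threshold values and pixel values are made up rather than taken from lab_config.yaml.

```python
def in_range(pixel, lower, upper):
    """Return 255 if every channel lies within [lower, upper], otherwise 0."""
    inside = all(lo <= value <= hi for value, lo, hi in zip(pixel, lower, upper))
    return 255 if inside else 0


def make_mask(image, lower, upper):
    """Apply the range test to every pixel of a 2D grid of (L, A, B) tuples."""
    return [[in_range(pixel, lower, upper) for pixel in row] for row in image]


# Hypothetical thresholds, for illustration only
min_lab = (0, 150, 130)
max_lab = (255, 255, 255)

# A tiny 2x2 "image" of LAB pixels
frame_lab = [
    [(120, 180, 160), (200, 120, 128)],
    [(50, 151, 131), (50, 150, 129)],
]

frame_mask = make_mask(frame_lab, min_lab, max_lab)
for row in frame_mask:
    print(row)
```

A pixel is marked 255 only when all three channels fall inside the range, which is why a single channel that is slightly out of range (such as B = 129 in the last pixel) excludes the pixel from the mask.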

To reduce noise and make the binary image smoother, morphological operations such as erosion and dilation are applied:

  • Erosion: Shrinks the foreground objects in the image, helping to remove small noise or separate closely connected objects.

  • Dilation: Expands the foreground objects, useful for filling small holes and gaps within objects or reconnecting broken parts.

These operations are essential steps in refining the results of image segmentation.

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))

(7) Extracting the Largest Contour

After completing the image processing steps, you need to obtain the contours of the detected targets. This is done using OpenCV’s cv2.findContours() function.

contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]

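First parameter: dilated – the input image (here, the mask after erosion and dilation).

Second parameter: cv2.RETR_EXTERNAL – contour retrieval mode, which retrieves only the external contours.

Third parameter: cv2.CHAIN_APPROX_NONE – contour approximation method, which stores all the contour points.

The trailing [-2] is not part of the third parameter. It selects the contour list from the values returned by cv2.findContours(), which works whether the installed OpenCV version returns two values or three.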

From the detected contours, the one with the largest area is selected. To avoid interference from noise or small objects, a threshold area value is set, and only contours with an area larger than this threshold are considered valid targets.

areaMaxContour, area_max = getAreaMaxContour(contours)
if areaMaxContour is not None and area_max > 200:
    ((centerX, centerY), radius) = cv2.minEnclosingCircle(areaMaxContour)
    centerX = int(Misc.map(centerX, 0, size[0], 0, img_w))
    centerY = int(Misc.map(centerY, 0, size[1], 0, img_h))
    radius = int(Misc.map(radius, 0, size[0], 0, img_w))
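
The Misc.map() calls above convert coordinates found in the resized image back to the original image size. Assuming Misc.map() is a plain linear rescale (its source is not shown here), its behavior can be sketched as follows. map_range is a hypothetical stand-in, and the two image sizes are example values.

```python
def map_range(x, in_min, in_max, out_min, out_max):
    """Linearly rescale x from [in_min, in_max] to [out_min, out_max]."""
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min


size = (320, 240)        # assumed size of the resized image (width, height)
img_w, img_h = 640, 480  # assumed size of the original camera image

# A circle found in the resized image: center (160, 120), radius 40
centerX = int(map_range(160, 0, size[0], 0, img_w))
centerY = int(map_range(120, 0, size[1], 0, img_h))
radius = int(map_range(40, 0, size[0], 0, img_w))

print(centerX, centerY, radius)
```

Detection runs on the smaller image to save processing time, while the circle and text are drawn on the full-size image, so every coordinate has to be scaled by the ratio between the two sizes.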

(8) Obtaining Position Information

To display text information on the image, the OpenCV function cv2.putText() is used.

cv2.putText(img, "Color: " + color, (centerX - 30, centerY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, draw_color, 2)
cv2.putText(img, f"Pos:({centerX},{centerY})", (centerX - 30, centerY + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, draw_color, 2)

First parameter: img — the input image.

Second parameter: "Color: " + color — the text content to be drawn.

Third parameter: (centerX - 30, centerY - 10) — the starting coordinates of the text on the image, representing the bottom-left corner of the text (x, y).

Fourth parameter: cv2.FONT_HERSHEY_SIMPLEX — specifies the font type (simple font).

Fifth parameter: 0.5 — font scale factor, reducing the default font size to 50%.

Sixth parameter: draw_color — the color of the text.

Seventh parameter: 2 — thickness of the text stroke.

(9) Display the Live Camera Feed

cv2.imshow('Frame', Frame)
key = cv2.waitKey(1)

The function cv2.imshow() displays an image in a window. Here, 'Frame' is the window name, and Frame is the image to be shown (the result returned by run()). The call must be followed by cv2.waitKey(); otherwise the image will not be displayed.

The function cv2.waitKey() waits for a key press, where the parameter 1 specifies the delay time in milliseconds.

5.3.5 Function Extension

  • Adjusting Color Thresholds

If the color recognition performance is poor during the game experience, it may be necessary to adjust the color threshold. This section uses red as an example, and the same method can be applied to adjust other colors. Follow the steps below:

(1) Double-click the debugging tool icon on the desktop, and in the popup interface, click “Execute”.

(2) Once in the interface, click “Connect” to link the camera.

(3) After a successful connection, select “red” from the color options in the lower-right corner of the interface.

(4) If the transmitted image does not appear in the popup window, the camera may not have connected successfully. Check that the camera’s connection cable is properly plugged in.

In the interface shown below, the right side displays the real-time transmitted image, while the left side shows the color to be detected. Point the camera at the red ball, then adjust the six sliders at the bottom so that the red ball area on the left turns entirely white, and the other areas turn black. Afterward, click the “Save” button to save the settings.

  • Adding a New Recognized Color

In addition to the three built-in recognizable colors, other colors can be added for recognition. For example, to add purple as a new recognizable color, follow these steps:

(1) Double-click the debugging tool icon on the desktop and choose “Execute” in the prompt box.

(2) In the pop-up interface, click “Connect”.

(3) Click “Add,” then name the new color (using “purple” as an example here), and click “OK”.

(4) Next, click the dropdown arrow in the color selection box and choose “purple”.

(5) Point the camera at a purple object, then adjust the L, A, and B sliders until the target color area in the left preview turns white while the other areas turn black.

(6) Finally, click “Save” to save the adjusted color threshold values.

(7) After making changes, verify whether the new values have been saved by entering the following command and pressing Enter to navigate to the program directory:

cd TonyPi

(8) Then enter the command below and press Enter to open the configuration file:

vim lab_config.yaml

(9) After opening the color threshold configuration file, you can view the purple color threshold parameters.

(10) Type “:q” and press Enter to exit the file.

(11) Enter the command below to navigate to the gameplay directory:

cd Functions

(12) Then enter the following command to open the program file and press Enter:

vim Color_Recognition.py

(13) Locate the code section as shown in the reference image.

(14) Press the “i” key to enter edit mode.

(15) Manually add the following line, where (255, 255, 114) corresponds to the max purple threshold value viewed in step (9):

'purple':(255, 255, 114),

(16) Save your changes by pressing the “Esc” key, then type :wq (note the colon before wq) and press Enter to save and exit.

(17) If you want to add other colors as recognizable colors, you can follow the same steps above.

5.4 Color Tracking

5.4.1 Program Logic

The robot can recognize colors and move according to the movement of the target color.

First, the system identifies the color using the Lab color space. The RGB color space is converted to Lab, followed by binarization. After applying dilation and erosion operations, the image retains only the contour of the target color, which is then outlined with a circle to complete the color recognition process.

Next, a traversal algorithm compares all correctly identified color objects and selects the one with the largest contour area as the target.

Finally, the gimbal is activated to perform real-time tracking, completing the color recognition and following functionality.

5.4.2 Operation Steps

Note

Commands are case-sensitive. Enter them exactly as shown, including spaces.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection -> 3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command-line terminal.

(3) Input the following command and press Enter to go to the directory where the program is stored.

cd TonyPi/Functions

(4) Input the command below, then press Enter to start the game.

python3 ColorTrack.py

(5) To exit the program, press “Ctrl+C”. If the program does not exit, press it a few more times.

5.4.3 Project Outcome

Once the program is running, hold a red foam block or another colored object. When the video feed appears with the detection frame, use the left mouse button to click on the desired color block. The system will automatically highlight the selected color. Place the object on a movable platform and move it slowly — the TonyPi robot will track and follow the movement of the selected color.

5.4.4 Program Analysis

Source Code

The source code of this program is located at “/home/pi/TonyPi/Functions/ColorTrack.py”.

  • Import Parameter Module

import sys
import os
import cv2
import math
import time
import threading
import numpy as np

import hiwonder.PID as PID
import hiwonder.Misc as Misc
import hiwonder.Camera as Camera
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle
from hiwonder.common import ColorPicker

Import statement: Function
import sys: Imports the Python "sys" module for accessing system-related functions and variables.
import os: Imports the Python "os" module, which provides functions for interacting with the operating system.
import cv2: Imports the OpenCV library for image processing and computer vision.
import time: Imports the Python "time" module for time-related functions, such as delays.
import math: Imports the "math" module, which provides commonly used mathematical functions and constants.
import threading: Provides support for running multiple threads concurrently.
import numpy as np: Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.
import hiwonder.Camera as Camera: Imports the camera library.
import hiwonder.Misc as Misc: Imports the Misc module for handling recognized rectangle data.
import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot's low-level control library for controlling servos, motors, RGB lights, and other hardware.
import hiwonder.yaml_handle as yaml_handle: Imports functions and tools for handling YAML files.
from hiwonder.Controller import Controller: Imports the Controller class used for motion control.

  • Contour Processing

# Find the contour with the largest area
# The parameter is the list of contours to compare
def getAreaMaxContour(contours):
    contour_area_temp = 0
    contour_area_max = 0
    areaMaxContour = None

    for c in contours:  # Iterate through all contours
        contour_area_temp = math.fabs(cv2.contourArea(c))  # Calculate the contour area
        if contour_area_temp > contour_area_max:
            contour_area_max = contour_area_temp
            if contour_area_temp > 10:  # Only a contour with an area greater than 10 is accepted, to filter out interference
                areaMaxContour = c

    return areaMaxContour, contour_area_max  # Return the largest contour and its area

The getAreaMaxContour function is defined to iterate through the list of contours, calculate the area of each, and return the one with the largest area greater than 10.
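
The selection logic can be tried without a camera by replacing cv2.contourArea() with the shoelace formula, which computes a polygon's area from its vertices. The sketch below mirrors the structure of getAreaMaxContour; polygon_area and largest_contour are hypothetical names, and the contours are hand-written point lists rather than OpenCV output.

```python
def polygon_area(points):
    """Area of a closed polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def largest_contour(contours, min_area=10):
    """Return (contour, area) for the largest contour; contour is None if it is too small."""
    best, best_area = None, 0
    for c in contours:
        area = polygon_area(c)
        if area > best_area:
            best_area = area
            if area > min_area:  # ignore tiny contours caused by noise
                best = c
    return best, best_area


speck = [(0, 0), (2, 0), (2, 2), (0, 2)]    # 2x2 square, area 4
ball = [(0, 0), (10, 0), (10, 5), (0, 5)]   # 10x5 rectangle, area 50

print(largest_contour([speck, ball]))
print(largest_contour([speck]))
```

When only small contours are present, the function still reports the largest area it saw but returns None for the contour, so the caller has to check for None before using the result.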

  • Image Frame Processing

def run(img):
    global x_dis, y_dis, target_color
    global img_w, img_h
    global color_picker

    display_image = img.copy()
    img_h, img_w = img.shape[:2]
    
    if not enter:
        return display_image

The run function is defined to take an image (img) as input. It first performs operations such as image duplication and dimension retrieval. Under certain conditions, the image is processed using color_picker. If a target color is specified, the image undergoes resizing, Gaussian blurring, and conversion to the LAB color space. After determining the color range, bitwise operations and morphological processing (erosion and dilation) are applied to extract contours. The largest contour is then identified, and the minimum enclosing circle is calculated if a valid contour is found.

  • Robot Movement Logic

if abs(centerX - img_w/2.0) < 20:  # The offset is small, so no movement is needed
    centerX = img_w/2.0

x_pid.SetPoint = img_w/2  # Target value: the horizontal center of the image
x_pid.update(centerX)  # Current value: the target's X coordinate
dx = int(x_pid.output)
use_time = abs(dx*0.00025)
x_dis += dx  # Accumulate the PID output

x_dis = 500 if x_dis < 500 else x_dis  # Keep x_dis within the range 500 to 2500
x_dis = 2500 if x_dis > 2500 else x_dis

Based on the relationship between the target’s center position (centerX, centerY) and the image dimensions (img_w, img_h), the function determines the offset of the target within the image. If the offset is within a defined threshold, no movement is required. Otherwise, a PID controller is used to calculate and update the displacement values x_dis and y_dis. Finally, the processed display_image, along with x_dis and y_dis, are returned.
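
The horizontal tracking step can be sketched as a small self-contained program. SimplePID is a hypothetical, textbook-style controller that only imitates the SetPoint, update(), and output names used above. It is not the hiwonder.PID implementation, and the gain of 0.25 is an arbitrary example value.

```python
class SimplePID:
    """Minimal PID controller exposing SetPoint, update() and output."""

    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.SetPoint = 0.0
        self.integral = 0.0
        self.last_error = 0.0
        self.output = 0.0

    def update(self, feedback, dt=1.0):
        error = self.SetPoint - feedback
        self.integral += error * dt
        derivative = (error - self.last_error) / dt
        self.last_error = error
        self.output = self.kp * error + self.ki * self.integral + self.kd * derivative


def track_x(pid, center_x, img_w, x_dis, deadband=20, low=500, high=2500):
    """Return the new servo pulse width after one horizontal tracking step."""
    if abs(center_x - img_w / 2.0) < deadband:
        center_x = img_w / 2.0       # close enough to the center: treat as no error
    pid.SetPoint = img_w / 2         # target: horizontal center of the image
    pid.update(center_x)             # current position of the tracked object
    x_dis += int(pid.output)         # accumulate the correction
    return max(low, min(high, x_dis))  # clamp to the allowed pulse range


pid = SimplePID(kp=0.25)
print(track_x(pid, 330, 640, 1500))  # inside the deadband, pulse unchanged
print(track_x(pid, 520, 640, 1500))  # object right of center, pulse is corrected
```

The deadband keeps the head from jittering when the object is almost centered, and the clamp keeps the accumulated pulse width inside the range the program allows.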

5.5 Auto Shooting

5.5.1 Program Logic

Note

Please use the balls supplied with the robot. If you use your own ball, we recommend one with a diameter of 3 cm.

Place the red ball in the area recognized by the robot’s camera. The robot will adjust its position according to the ball’s location, and then kick the ball away.

Below are the details:

First, TonyPi recognizes colors in the Lab color space.

Second, the color of the object within the circle is identified using color thresholds, and a mask is then applied to that part of the image. Masking is the process of using selected images, graphics, or objects to globally or locally obscure parts of the processed image.

After opening and closing operations are applied to the object image, the largest object contour is circled. These operations are built from erosion and dilation:

Erosion: Each pixel of the image is checked against the structuring element placed over it. If all the pixels covered by the element are 1, the pixel keeps its value; otherwise, it is set to 0. Erosion is mainly used to remove unimportant edge information, which shrinks the foreground area.

Dilation: The inverse of erosion. The structuring element is moved over the image, the maximum pixel value within the covered area is computed, and this maximum is assigned to the pixel at the reference point. Dilation gradually expands the highlighted areas in an image and is typically used to fill holes or gaps.

Then, the program judges from the image feedback whether the object is in the central position. If it is, TonyPi moves forward toward the target until it is within the set range and then executes the shooting action; otherwise, the robot first moves left or right until the target is centered.
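
The approach-and-kick decision described above can be summarized as a small function. This is a sketch under stated assumptions, not the logic in KickBall.py: the action names, the 40-pixel centering tolerance, and the 0.8 distance ratio are hypothetical, and the ball is assumed to appear lower in the image as the robot gets closer to it.

```python
def next_action(center_x, center_y, img_w, img_h, x_tolerance=40, kick_y_ratio=0.8):
    """Choose the next move from the ball's position in the image."""
    offset = center_x - img_w / 2
    if offset < -x_tolerance:
        return 'step_left'       # ball is left of center: sidestep first
    if offset > x_tolerance:
        return 'step_right'      # ball is right of center: sidestep first
    if center_y < img_h * kick_y_ratio:
        return 'walk_forward'    # centered but still too far away
    return 'kick'                # centered and within range


for position in [(100, 200), (600, 200), (320, 100), (330, 400)]:
    print(position, next_action(position[0], position[1], 640, 480))
```

Checking the horizontal offset before the distance means the robot always lines up with the ball before approaching, which matches the order of steps described above.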

5.5.2 Operation Steps

Note

Command input must strictly distinguish between uppercase and lowercase letters and spaces.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command-line terminal.

(3) Input the following command and press Enter to go to the directory where the program is stored.

cd TonyPi/Functions

(4) Input the command below, then press Enter to start the game.

python3 KickBall.py

(5) To exit the program, press “Ctrl+C”. If the program does not exit, press it a few more times.

5.5.3 Project Outcome

Note

Please use the robot and ball on a flat surface.

Place the red ball in front of TonyPi, then click it with the mouse for automatic color sampling and recognition. Once the ball is recognized, the robot moves into position, approaches it, and kicks it forward.

5.5.4 Program Analysis

Source Code

The source code of this program is located at /home/pi/TonyPi/Functions/KickBall.py.

  • Import Parameter Module

import sys
import os
import cv2
import time
import math
import threading
import numpy as np

import hiwonder.PID as PID
import hiwonder.Misc as Misc
import hiwonder.Camera as Camera
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle
from hiwonder.common import ColorPicker

Import statement: Function
import sys: Imports the Python "sys" module for accessing system-related functions and variables.
import os: Imports the Python "os" module, which provides functions for interacting with the operating system.
import cv2: Imports the OpenCV library for image processing and computer vision.
import time: Imports the Python "time" module for time-related functions, such as delays.
import math: Imports the "math" module, which provides commonly used mathematical functions and constants.
import threading: Provides support for running multiple threads concurrently.
import numpy as np: Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.
import hiwonder.Camera as Camera: Imports the camera library.
import hiwonder.Misc as Misc: Imports the Misc module for handling recognized rectangle data.
import hiwonder.PID as PID: Imports the PID control library.
import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot's low-level control library for controlling servos, motors, RGB lights, and other hardware.
import hiwonder.yaml_handle as yaml_handle: Imports functions and tools for handling YAML files.
import hiwonder.ActionGroupControl as AGC: Imports the action group execution library.
from hiwonder.Controller import Controller: Imports the motion control library.
import common.calibration as calibration: Imports the camera calibration library.

  • Set initial state

Set initial state, including the initial position of servo, PID, color threshold value, etc.

# Load configuration file data
def load_config():
    global servo_data

    servo_data = yaml_handle.get_yaml_data(yaml_handle.servo_file_path)

# Move the robot's servos to their initial positions
def initMove():
    ctl.set_pwm_servo_pulse(1, servo_data['servo1'], 500)
    ctl.set_pwm_servo_pulse(2, servo_data['servo2'], 500)

  • Image pre-processing

Resizing and Gaussian blur processing of the image.

# Resize the image
frame_resize = cv2.resize(img, size, interpolation=cv2.INTER_NEAREST)
# Apply Gaussian blur
frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)
# Convert the image to the LAB color space
frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)

cv2.resize(img, size, interpolation=cv2.INTER_NEAREST) resizes the image.

The first parameter “img” is the image to be resized.

The second parameter “size” is the target size.

The third parameter “interpolation” is the interpolation method, which is used to determine the pixel interpolation algorithm used for resizing.

cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.

The first parameter “frame_resize” is the image to be blurred.

The second parameter “(3, 3)” is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.

The third parameter “3” is the standard deviation of the Gaussian kernel, used to control the degree of blur.

  • Color space conversion

Convert the BGR image to LAB image.

# Convert the image to the LAB color space
frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB)

  • Binarization processing

Use the inRange() function from the cv2 library to binarize the image.

# Threshold the LAB image to obtain a binary mask of the selected color
frame_mask = cv2.inRange(frame_lab, tuple(min_color), tuple(max_color))

The first parameter “frame_lab” is the input image.

The second parameter tuple(min_color) is the lower limit of the threshold.

The third parameter tuple(max_color) is the upper limit of the threshold.

  • Erosion and dilation

# Erosion
eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
# Dilation
dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) erodes the binary image.

The first parameter “frame_mask” is the binary image on which the morphological operation is performed.

The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the erosion operation. A rectangular structuring element of size (3, 3) is used here.

The dilation function follows the same principle.

  • Get the contour with the largest area

After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using the findContours() function from the cv2 library.

contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]  # Find the contours
# Find the largest contour within the valid area range; returns the contour and its area
areaMaxContour, area_max = getAreaMaxContour(contours)

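Take contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] as an example:

The first parameter “dilated” is the input image.

The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode; it returns only the outermost contours.

The third parameter cv2.CHAIN_APPROX_NONE is the contour approximation method; it keeps every point on the contour. The trailing [-2] is not part of the parameter: it selects the contour list from the return value, which works whether findContours() returns two values (OpenCV 4) or three (OpenCV 3).

Next, find the contour with the largest area among those obtained. To avoid interference, the area is checked against limits: the target contour is considered valid only if its area is greater than 2 and less than 640*480/100.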

# Find the contour with the largest area
# The parameter is the list of contours to compare
def getAreaMaxContour(contours):
    contour_area_temp = 0
    contour_area_max = 0
    areaMaxContour = None

    for c in contours:  # Iterate through all contours
        contour_area_temp = math.fabs(cv2.contourArea(c))  # Calculate the contour area
        if contour_area_temp > contour_area_max:
            contour_area_max = contour_area_temp
            if 640*480/100 > contour_area_temp > 2:  # Accept the contour only if its area is greater than 2 and less than 640*480/100, to filter out interference
                areaMaxContour = c

    return areaMaxContour, contour_area_max  # Return the largest contour and the largest area
  • Get color block center point coordinates

Use Misc.map to map the x and y coordinates of the object's center and its radius from the resized image range (size) back to the original image size (img_w and img_h). Then use the cv2.circle function to mark the color block by circling it.

# Map the center coordinates and radius of the ball back to the original image size
CenterX = int(Misc.map(CenterX, 0, size[0], 0, img_w))
CenterY = int(Misc.map(CenterY, 0, size[1], 0, img_h))
radius = int(Misc.map(radius, 0, size[0], 0, img_w))
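
The mapping itself is a linear rescale. The sketch below assumes Misc.map behaves like the common linear "map" helper; linear_map is a hypothetical stand-in, and the 320×240 and 640×480 sizes are assumed values for illustration only.

```python
def linear_map(x, in_min, in_max, out_min, out_max):
    """Linearly rescale x from [in_min, in_max] to [out_min, out_max]."""
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min

# Assumed sizes for illustration: processing at 320x240, camera frame at 640x480.
size = (320, 240)
img_w, img_h = 640, 480

center_x = int(linear_map(100, 0, size[0], 0, img_w))
center_y = int(linear_map(60, 0, size[1], 0, img_h))
radius = int(linear_map(15, 0, size[0], 0, img_w))
print(center_x, center_y, radius)
```

Because detection runs on the smaller image, every coordinate has to be scaled this way before it is drawn on the full-size frame.
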
  • Auto shooting

(1) If a ball is detected, the program initializes the sub-step and step sizes and sets the timer start flag. If the ball is not in the center of the frame, that is, the horizontal servo position differs from its set position by more than 150 units, the robot takes one small turning step (left if the offset is positive, right if it is negative). This repeats until the ball is centered, after which the program moves on to step 2.

if step == 1:
    # If the ball is not in the center of the frame, turn the robot one step in that direction until the condition for entering step 2 is met
    if x_dis - servo_data['servo2'] > 150:
        AGC.runActionGroup('turn_left_small_step')
    elif x_dis - servo_data['servo2'] < -150:
        AGC.runActionGroup('turn_right_small_step')
    else:
        step = 2

(2) In step 2, if the vertical servo position equals the set position, the robot's movement is chosen from the horizontal servo position and the ball's vertical position in the frame (a larger CenterY means the ball is closer). If the horizontal servo position is exactly 400 units below the set position, turn right; if it is exactly 400 units above, turn left. Otherwise, if CenterY is greater than 350 and at most 380, move forward one step; if it is greater than 120 and at most 350, move forward; if it is between 0 and 120 and the horizontal servo position differs from the set position by no more than 200 units, move forward quickly. After any of these forward moves the main step returns to 1. In all other cases, go to step 3. If the vertical servo position does not equal the set position, the robot makes the same turns at the ±400 limits and otherwise moves forward quickly.

elif step == 2:
    # When the servo controlling vertical head movement is at the set position
    if y_dis == servo_data['servo1']:
        # Adjust the robot's movement based on the current horizontal servo position
        if x_dis == servo_data['servo2'] - 400:
            AGC.runActionGroup('turn_right',2)
        elif x_dis == servo_data['servo2'] + 400:
            AGC.runActionGroup('turn_left',2)
        elif 350 < CenterY <= 380:    # The larger CenterY is, the closer the ball
            AGC.runActionGroup('go_forward_one_step')
            last_status = 'go'        # Record that the previous action was walking forward
            step = 1
        elif 120 < CenterY <= 350:
            AGC.runActionGroup('go_forward')
            last_status = 'go'
            step = 1
        elif 0 <= CenterY <= 120 and abs(x_dis - servo_data['servo2']) <= 200:
            AGC.runActionGroup('go_forward_fast')
            last_status = 'go'
            step = 1
        else:
            step = 3
    else:
        # When the vertical head servo is not at the set position, the robot keeps adjusting and walking forward until the two positions are equal
        if x_dis == servo_data['servo2'] - 400:
            AGC.runActionGroup('turn_right',2)
        elif x_dis == servo_data['servo2'] + 400:
            AGC.runActionGroup('turn_left',2)
        else:
            AGC.runActionGroup('go_forward_fast')
            last_status = 'go'
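
The distance bands used in step 2 can be isolated into a small function for testing without the robot. This is a sketch only: approach_action is a hypothetical helper, it covers just the three forward-walking branches (not the ±400 turn checks), and it returns the action name instead of calling AGC.runActionGroup.

```python
def approach_action(center_y, pan_offset):
    """Pick a walking action from the ball's y-coordinate (larger = closer).

    pan_offset stands for x_dis - servo_data['servo2'].
    Returns None when no forward action applies (the original then enters step 3).
    """
    if 350 < center_y <= 380:
        return 'go_forward_one_step'
    if 120 < center_y <= 350:
        return 'go_forward'
    if 0 <= center_y <= 120 and abs(pan_offset) <= 200:
        return 'go_forward_fast'
    return None

for y in (60, 200, 360, 400):
    print(y, approach_action(y, pan_offset=0))
```

The farther away the ball appears (small CenterY), the faster the gait; once CenterY passes 380 no walking action matches and control passes to the alignment steps.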

(3) In step 3, if the vertical servo position equals the set position, the robot sidesteps according to the ball's horizontal position in the frame. If the ball is within 40 units of the frame center, move left. If the ball is more than 90 units to the left of the center, move quickly to the left. If it is more than 90 units to the right, move quickly to the right. Otherwise, go to step 4.

If the vertical servo position is not equal to the set position, adjust based on the difference between the horizontal servo position and the set position: if the difference is at least 270 and less than 480, move quickly to the left; if its absolute value is less than 170, move left; if it is greater than -480 and at most -270, move quickly to the right. Otherwise, go to step 4.

elif step == 3:
    if y_dis == servo_data['servo1']:
        # Sidestep left or right according to the ball's x-coordinate in the frame
        if abs(CenterX - CENTER_X) <= 40:
            AGC.runActionGroup('left_move')
        elif 0 < CenterX < CENTER_X - 50 - 40:
            AGC.runActionGroup('left_move_fast')
            time.sleep(0.2)
        elif CENTER_X + 50 + 40 < CenterX:
            AGC.runActionGroup('right_move_fast')
            time.sleep(0.2)
        else:
            step = 4
    else:
        if 270 <= x_dis - servo_data['servo2'] < 480:
            AGC.runActionGroup('left_move_fast')
            time.sleep(0.2)
        elif abs(x_dis - servo_data['servo2']) < 170:
            AGC.runActionGroup('left_move')
        elif -480 < x_dis - servo_data['servo2'] <= -270:
            AGC.runActionGroup('right_move_fast')
            time.sleep(0.2)
        else:
            step = 4

(4) In step 4, if the vertical servo position equals the set position: if the ball's vertical position is greater than 380 and at most 440, move forward one small step; if it is between 0 and 380, move forward. Otherwise the ball is close enough to kick: the robot takes one more small step forward and then shoots with the foot nearer the ball, the left foot if the ball is left of the frame center and the right foot otherwise, and the main step is reset to 1. If the vertical servo position is not equal to the set position, the main step is reset to 1.

elif step == 4:
    if y_dis == servo_data['servo1']:
        # Approach in small steps until the distance is suitable
        if 380 < CenterY <= 440:
            AGC.runActionGroup('go_forward_one_step')
            last_status = 'go'
        elif 0 <= CenterY <= 380:
            AGC.runActionGroup('go_forward')
            last_status = 'go'
        else:   # Kick with the foot closer to the ball, based on the ball's final x-coordinate
            AGC.runActionGroup('go_forward_one_step')
            if CenterX < CENTER_X:
                AGC.runActionGroup('left_shot_fast')
            else:
                AGC.runActionGroup('right_shot_fast')
            step = 1
    else:
        step = 1

(5) If the ball is not detected, check whether the robot's previous state was “moving forward”. If it was, clear that state and quickly take one step back. Otherwise, if the timer start flag is set, reset it to False and record the current time as the start of timing. Otherwise, once more than 0.5 seconds have passed since timing started, the head servos sweep in search of the ball according to the sub-step:

If the sub-step is 5, advance the horizontal servo position. Once its deviation from the set position is no more than the absolute value of the horizontal step size, perform the turn-right action and reset the sub-step to 1.

If the sub-step is 1 or 3, advance the horizontal servo position. If it exceeds the set position plus 400, invert the horizontal step size and, from sub-step 1, go to sub-step 2. If it falls below the set position minus 400, invert the horizontal step size and, from sub-step 3, go to sub-step 4.

If the sub-step is 2 or 4, advance the vertical servo position. If it exceeds 1200, invert the vertical step size and, from sub-step 2, go to sub-step 3. If it falls below the set position, invert the vertical step size and, from sub-step 4, go to sub-step 5.

Finally, write the vertical and horizontal positions to the servos and sleep for 0.02 seconds.

elif CenterX == -1:   # If no ball is detected
    # If the robot's previous state was 'go' (moving forward), quickly take one step backward
    if last_status == 'go':
        last_status = ''
        AGC.runActionGroup('back_fast', with_stand=True)
    elif start_count:  # The flag for starting the timer is True
        start_count= False
        t1 = time.time()    # Record the current time and start timing
    else:
        if time.time() - t1 > 0.5:
            
            if step_ == 5:
                x_dis += d_x
                if abs(x_dis - servo_data['servo2']) <= abs(d_x):
                    AGC.runActionGroup('turn_right')
                    step_ = 1
            if step_ == 1 or step_ == 3:
                x_dis += d_x
                if x_dis > servo_data['servo2'] + 400:
                    if step_ == 1:
                        step_ = 2
                    d_x = -d_x
                elif x_dis < servo_data['servo2'] - 400:
                    if step_ == 3:
                        step_ = 4
                    d_x = -d_x
            elif step_ == 2 or step_ == 4:
                y_dis += d_y
                if y_dis > 1200:
                    if step_ == 2:
                        step_ = 3
                    d_y = -d_y
                elif y_dis < servo_data['servo1']:
                    if step_ == 4:
                        step_ = 5
                    d_y = -d_y
            ctl.set_pwm_servo_pulse(1, y_dis, 20)
            ctl.set_pwm_servo_pulse(2, x_dis, 20)
            
            time.sleep(0.02)
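
The search sweep is easier to follow when it is simulated without hardware. The sketch below mirrors the sub-step logic above but replaces the servo and action-group calls with plain state updates. ScanState, scan_tick, and run_one_sweep are hypothetical names, and the home positions (1500 and 1000) and step sizes (20) are assumed values; only the ±400 span and the 1200 limit come from the code above.

```python
from dataclasses import dataclass

@dataclass
class ScanState:
    sub_step: int = 1
    pan: int = 1500      # horizontal servo pulse (x_dis), assumed start value
    tilt: int = 1000     # vertical servo pulse (y_dis), assumed start value
    d_pan: int = 20      # horizontal step size (d_x), assumed
    d_tilt: int = 20     # vertical step size (d_y), assumed

def scan_tick(s, pan_home=1500, tilt_home=1000, pan_span=400, tilt_max=1200):
    """Advance the search sweep by one tick; return True when the robot should turn."""
    turn = False
    if s.sub_step == 5:
        s.pan += s.d_pan
        if abs(s.pan - pan_home) <= abs(s.d_pan):
            turn = True
            s.sub_step = 1
    if s.sub_step in (1, 3):          # a plain 'if', as in the original listing
        s.pan += s.d_pan
        if s.pan > pan_home + pan_span:
            if s.sub_step == 1:
                s.sub_step = 2
            s.d_pan = -s.d_pan
        elif s.pan < pan_home - pan_span:
            if s.sub_step == 3:
                s.sub_step = 4
            s.d_pan = -s.d_pan
    elif s.sub_step in (2, 4):
        s.tilt += s.d_tilt
        if s.tilt > tilt_max:
            if s.sub_step == 2:
                s.sub_step = 3
            s.d_tilt = -s.d_tilt
        elif s.tilt < tilt_home:
            if s.sub_step == 4:
                s.sub_step = 5
            s.d_tilt = -s.d_tilt
    return turn

def run_one_sweep(state, max_ticks=10_000):
    """Tick until a turn is requested; return the sub-steps visited, in order."""
    visited = [state.sub_step]
    for _ in range(max_ticks):
        turn = scan_tick(state)
        if state.sub_step != visited[-1]:
            visited.append(state.sub_step)
        if turn:
            break
    return visited

state = ScanState()
print(run_one_sweep(state))
```

With these assumed values the head pans one way, tilts, pans back across the full range, tilts back, and returns to the middle before the robot turns and the sweep starts over.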

5.6 Line Following

5.6.1 Program Logic

Note

Demonstration video is in the current section folder.

Line following is common in robot competitions, where it is usually implemented with two-channel or four-channel line-tracking sensors. TonyPi, however, needs only its vision module: it recognizes the line color and processes the image to follow the line.

First, TonyPi recognizes colors in the Lab color space.

Second, the object's color is identified using a color threshold, and a mask is applied to that part of the image. Masking is the process of using selected images, graphics, or objects to globally or locally obscure parts of the processed image.

After erosion and dilation are applied to the object image, the largest object contour is selected.

Erosion: For each pixel of the image, check the pixels covered by the structuring element centered on it. If all of the covered pixel values are 1, keep the pixel; otherwise, set it to 0. Erosion is mainly used to eliminate unimportant edge information, and it shrinks the bright regions of the image.

Dilation: The counterpart of erosion. The structuring element is moved over the image, the maximum pixel value within the covered area is computed, and this maximum is assigned to the pixel at the reference point. Dilation gradually expands the highlighted areas in an image and is typically used to fill holes or gaps.
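
The two definitions above amount to a minimum filter and a maximum filter over each pixel's neighborhood. A NumPy-only sketch for a 3×3 rectangular structuring element follows; erode3x3 and dilate3x3 are hypothetical helpers written for illustration, not the cv2 functions, and the tiny test mask is made up.

```python
import numpy as np

def _neighborhoods(mask, pad_value):
    """Stack the nine shifted copies of mask that make up each pixel's 3x3 neighborhood."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def erode3x3(mask):
    # A pixel survives only if all nine covered pixels are set (minimum over the window).
    return _neighborhoods(mask, 255).min(axis=0)

def dilate3x3(mask):
    # A pixel is set if any covered pixel is set (maximum over the window).
    return _neighborhoods(mask, 0).max(axis=0)

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 255   # a 3x3 block: the "object"
mask[0, 6] = 255       # a single stray pixel: the "noise"

opened = dilate3x3(erode3x3(mask))
print(opened)
```

Erosion shrinks the block to its center pixel and wipes out the stray pixel; dilation then grows the block back, so the object is preserved and the noise is gone.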

Third, after recognition, the servo control takes the x and y coordinates of the image center as the set values and feeds in the currently measured x and y coordinates to update the PID controller.

Fourth, the line's position in the image is used as feedback to steer the robot so that it follows the line.

5.6.2 Operation Steps

Note

Commands are case-sensitive and must be entered exactly as shown, including spaces.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) Input the following command and press Enter to go to the directory where the program is stored.

cd TonyPi/Functions

(4) Input the command below, then press Enter to start the game.

python3 VisualPatrol.py

(5) To exit the game, press “Ctrl+C”. If the exit fails, try a few more times.

5.6.3 Project Outcome

You can lay red lines using electrical tape, then place the robot on top of the red line. By clicking on the red line with the mouse, the system will automatically pick the color. The robot will then follow the red path and move forward along its direction.

5.6.4 Programming Instruction

Source Code

The source code of this program is located at “/home/pi/TonyPi/Functions/VisualPatrol.py”.

  • Import Parameter Module

Import module | Function
import sys | Imports the Python "sys" module for accessing system-related functions and variables.
import os | Imports the Python "os" module, which provides functions and methods for interacting with the operating system.
import cv2 | Imports the OpenCV library for image processing and computer vision.
import time | Imports the Python "time" module for time-related functions, such as delays.
import math | Imports the "math" module, which provides commonly used mathematical functions and constants.
import threading | Provides an environment for running multiple threads concurrently.
import numpy as np | Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.
import sensor.camera as camera | Imports the camera library.
from common import misc | Imports the "Misc" module for handling recognized rectangular data.
import common.ros_robot_controller_sdk as rrc | Imports the robot's low-level control library for controlling servos, motors, RGB lights, and other hardware.
import common.yaml_handle | Contains tools for processing YAML format files.
from common.controller import Controller | Imports the action group execution library.
  • Set initial state

Set the initial state, including the initial servo positions, the ROI, and so on.

# Load the configuration file data
def load_config():
    global servo_data

    servo_data = yaml_handle.get_yaml_data(yaml_handle.servo_file_path)

# Move the robot's servos to their initial positions
def initMove():
    ctl.set_pwm_servo_pulse(1, servo_data['servo1'], 500)
    ctl.set_pwm_servo_pulse(2, servo_data['servo2'], 500)
  • Image pre-processing

Resize the image and apply a Gaussian blur.

frame_resize = cv2.resize(img, size, interpolation=cv2.INTER_NEAREST)
frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)

cv2.resize(img, size, interpolation=cv2.INTER_NEAREST) resizes the image.

The first parameter “img” is the image to be resized.

The second parameter “size” is the target size.

The third parameter “interpolation” selects the pixel interpolation algorithm used for resizing; cv2.INTER_NEAREST is nearest-neighbor sampling.

cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.

The first parameter “frame_resize” is the image to be blurred.

The second parameter “(3, 3)” is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.

The third parameter “3” is the standard deviation of the Gaussian kernel, used to control the degree of blur.

  • Set the ROI

From “frame_gb”, crop out the ROI (region of interest) described by each element r of the “roi” list, together with the corresponding height value in “roi_h_list”: rows r[0] to r[1] and columns r[2] to r[3] are kept. The cropped region is saved in the “blobs” variable.

blobs = frame_gb[r[0]:r[1], r[2]:r[3]]
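
As an illustration of the slicing, the sketch below crops three horizontal bands from a blank frame. The ROI values are hypothetical, and the tuple layout (y_min, y_max, x_min, x_max, weight) is an assumption inferred from how r[0] to r[4] are used in this program; crop_rois is not part of the original code.

```python
import numpy as np

# Hypothetical ROI table: (y_min, y_max, x_min, x_max, weight) for three bands
roi = [
    (120, 140, 0, 320, 1),
    (160, 180, 0, 320, 3),
    (200, 220, 0, 320, 6),
]

def crop_rois(frame, roi_list):
    """Return one cropped view of frame per ROI entry (rows first, then columns)."""
    return [frame[r[0]:r[1], r[2]:r[3]] for r in roi_list]

frame_gb = np.zeros((240, 320, 3), dtype=np.uint8)   # stand-in for the blurred frame
blobs_list = crop_rois(frame_gb, roi)
print([b.shape for b in blobs_list])
```

NumPy images are indexed as [row, column], so the y range comes first in the slice even though coordinates are usually written as (x, y).
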
  • Color space conversion

Convert the BGR image to the LAB color space.

frame_lab = cv2.cvtColor(blobs, cv2.COLOR_BGR2LAB)  # Convert the image to the LAB color space
  • Binarization processing

Use the inRange() function from the cv2 library to binarize the image.

# Threshold the LAB image: pixels inside the [min_color, max_color] range become 255, all others 0
frame_mask = cv2.inRange(frame_lab, tuple(min_color), tuple(max_color))

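The first parameter “frame_lab” is the input image.

The second parameter tuple(min_color) is the lower limit of the threshold, i.e., the minimum L, A, and B values of the target color.

The third parameter tuple(max_color) is the upper limit of the threshold, i.e., the maximum L, A, and B values of the target color.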

  • Erosion and dilation

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Erosion
dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # Dilation
dilated[:, 0:160] = 0
dilated[:, 480:640] = 0

eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) erodes the binary image.

The first parameter “frame_mask” is the binary image on which the morphological operation is performed.

The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the erosion. A rectangular structuring element of size (3, 3) is used here.

cv2.dilate() takes the same parameters.

dilated[:, 0:160] = 0 sets all pixel values in the leftmost 160 columns of the image (columns 0 to 159) to 0, i.e., turns them black, so that this part of the image is ignored during recognition.

dilated[:, 480:640] = 0 does the same for the right side of the image, columns 480 to 639.

  • Get the contour with the largest area

After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using the findContours() function from the cv2 library.

cnts = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)[-2]  # Find all contours
cnt_large, area = getAreaMaxContour(cnts)  # Find the contour with the largest area

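Take cnts = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)[-2] as an example:

The first parameter “dilated” is the input image.

The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode; it returns only the outermost contours.

The third parameter cv2.CHAIN_APPROX_TC89_L1 is the contour approximation method; it applies a Teh-Chin chain approximation, which stores fewer points than keeping the whole contour. The trailing [-2] is not part of the parameter: it selects the contour list from the return value, which works whether findContours() returns two values (OpenCV 4) or three (OpenCV 3).

Next, find the contour with the largest area among those obtained. To avoid interference, a minimum value is set: the target contour is considered valid only if its area is greater than 5.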

# Find the contour with the largest area
# The parameter is the list of contours to compare
def getAreaMaxContour(contours):
    contour_area_temp = 0
    contour_area_max = 0
    area_max_contour = None

    for c in contours:  # Iterate through all contours
        contour_area_temp = math.fabs(cv2.contourArea(c))  # Calculate the contour area
        if contour_area_temp > contour_area_max:
            contour_area_max = contour_area_temp
            if contour_area_temp > 5:  # Accept the contour only if its area is greater than 5, to filter out interference
                area_max_contour = c

    return area_max_contour, contour_area_max  # Return the largest contour and the largest area
  • Get the center position coordinates of the line

Compute the minimum-area bounding rectangle of the largest contour and take its four corner points. The corner y-coordinates are first shifted from ROI coordinates back into the resized frame, and then Misc.map scales all corner coordinates from the resized image range (“size”) to the original image size (“img_w” and “img_h”). The rectangle is drawn with cv2.drawContours, its center is taken as the midpoint of two diagonal corners and marked with cv2.circle, and the center's x-coordinate is added to a weighted sum.

if cnt_large is not None:  # If a contour was found
    rect = cv2.minAreaRect(cnt_large)  # Minimum-area bounding rectangle
    box = np.int0(cv2.boxPoints(rect))  # The four vertices of the minimum-area bounding rectangle
    for i in range(4):
        box[i, 1] = box[i, 1] + (n - 1)*roi_h + roi[0][0]
        box[i, 1] = int(Misc.map(box[i, 1], 0, size[1], 0, img_h))
    for i in range(4):
        box[i, 0] = int(Misc.map(box[i, 0], 0, size[0], 0, img_w))
        
    cv2.drawContours(display_image, [box], -1, (0,0,255,255), 2)  # Draw the rectangle formed by the four points
    
    # Get two diagonal corner points of the rectangle
    pt1_x, pt1_y = box[0, 0], box[0, 1]
    pt3_x, pt3_y = box[2, 0], box[2, 1]
    center_x, center_y = (pt1_x + pt3_x) / 2, (pt1_y + pt3_y) / 2  # Center point
    cv2.circle(display_image, (int(center_x), int(center_y)), 5, (0,0,255), -1)  # Draw the center point
    
    center_.append([center_x, center_y])
    # Sum the center points of the top, middle, and bottom ROIs using their different weights
    centroid_x_sum += center_x * r[4]
    weight_sum += r[4]
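
How the weighted sum turns into a single line position can be sketched as follows. The final division is an assumption, since the listing above shows only the accumulation; weighted_line_center is a hypothetical helper, and the centers and weights are made-up values.

```python
def weighted_line_center(centers_x, weights):
    """Weighted mean of the ROI centers; entries that are None (no line found) are skipped.

    Returns -1 when no ROI contains the line, the value the movement code treats as "not found".
    """
    centroid_x_sum = 0.0
    weight_sum = 0.0
    for x, w in zip(centers_x, weights):
        if x is not None:
            centroid_x_sum += x * w
            weight_sum += w
    return centroid_x_sum / weight_sum if weight_sum else -1

print(weighted_line_center([300, 320, 340], [1, 3, 6]))    # all three bands see the line
print(weighted_line_center([None, 320, 340], [1, 3, 6]))   # the top band sees nothing
print(weighted_line_center([None, None, None], [1, 3, 6])) # no line at all
```

Giving one band a larger weight makes the result lean toward where the line is in that band; in this sketch the last band dominates.
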
  • Intelligent line follow

Based on the calculated difference between the X-coordinate of the line center point and the X-coordinate of the screen center, call different action groups to follow the line.

def move():
    global line_center_x
    
    while True:
        if enter:
            if line_center_x != -1:
                if abs(line_center_x - img_centerx) <= 50:
                    AGC.runActionGroup('go_forward')
                elif line_center_x - img_centerx > 50:
                    AGC.runActionGroup('turn_right_small_step')
                elif line_center_x - img_centerx < -50:
                    AGC.runActionGroup('turn_left_small_step')
            else:
                time.sleep(0.01)
        else:
            time.sleep(0.1)

If the absolute difference is at most 50: call the “go_forward” action group.

If the difference is greater than 50: call the “turn_right_small_step” action group to turn right.

If the difference is less than -50: call the “turn_left_small_step” action group to turn left.
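
The same decision can be written as a pure function, which makes the dead band easy to test away from the robot. This is a sketch: steering_action is a hypothetical helper that returns the action group name instead of running it, and the image center of 320 is an assumed value for a 640-pixel-wide frame.

```python
def steering_action(line_center_x, img_center_x=320, dead_band=50):
    """Return the action group name for a given line position, or None if no line was found."""
    if line_center_x == -1:
        return None
    offset = line_center_x - img_center_x
    if abs(offset) <= dead_band:
        return 'go_forward'
    if offset > dead_band:
        return 'turn_right_small_step'
    return 'turn_left_small_step'

for x in (-1, 250, 320, 370, 400):
    print(x, steering_action(x))
```

The 50-pixel dead band keeps the robot walking straight through small wobbles of the detected center instead of correcting on every frame.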

5.7 Tag Detection

5.7.1 Brief Program Description

In this lesson, we’ll complete a small project that combines OpenCV and AprilTag to detect AprilTag markers. When the camera recognizes a tag, the robot’s onboard buzzer will sound as an alert, and the detection result will be displayed on the feedback screen.

AprilTag is a type of visual marker similar to a QR code or barcode, used for rapid detection and position estimation with real-time performance. It is widely applicable in tasks such as augmented reality (AR), robotics, and camera calibration.

AprilTags can be easily printed using a regular printer, and the detection system can accurately calculate the tag’s 3D position, orientation, and ID relative to the camera.

5.7.2 Start and Close the Game

Note

The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces.

(1) Power on the device and, following the instructions in “3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection”, use the VNC remote connection tool to connect.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command-line terminal.

(3) Run the following command to navigate to the directory where the game programs are stored.

cd TonyPi/Functions/

(4) Enter the command and press Enter to start the program:

python3 Tag_Detect.py

(5) To close the program, press “Ctrl+C” in the terminal. If it does not close, press it again.

5.7.3 Program Outcome

Note

For optimal tag detection, place the tag against a solid-colored or white background. Dark backgrounds (e.g., black) may interfere with tag recognition.

Once the game is running, hold the included AprilTag in front of the camera. When the robot detects the tag, the buzzer sounds as a prompt. The feedback image shows the captured tag with an outline drawn around it, together with the tag's tag_id and tag_family.

5.7.4 Program Parameter Explanation

Source Code

The source code for this program is located at:

/home/pi/TonyPi/Functions/Tag_Detect.py

  • Image Acquisition and Processing

The first step is image processing, which involves working with digital image data. We begin by importing the necessary packages.

import sys
import cv2
import math
import time
import numpy as np

import hiwonder.Camera as Camera
import hiwonder.ros_robot_controller_sdk as rrc
import hiwonder.yaml_handle as yaml_handle
import hiwonder.apriltag as apriltag
# Detect AprilTag

board = rrc.Board()

Import module | Function
import sys | Imports the Python "sys" module for accessing system-related functions and variables.
import cv2 | Imports the OpenCV library for image processing and computer vision.
import math | Imports the "math" module, which provides commonly used mathematical functions and constants.
import time | Imports the Python "time" module for time-related functions, such as delays.
import numpy as np | Imports the NumPy library, an open-source numerical computing library in Python used for array and matrix operations.
import hiwonder.Camera as Camera | Imports the camera library.
import hiwonder.ros_robot_controller_sdk as rrc | Imports the low-level robot control library used to control servos, motors, RGB lights, and other hardware.
import hiwonder.yaml_handle as yaml_handle | Includes functions and tools for handling YAML format files.
import hiwonder.apriltag as apriltag | Imports the library for AprilTag detection.

Next, we initialize and start the camera to acquire the image, then proceed to copy, remap, and display the image.

while True:
    ret, img = my_camera.read()
    if img is not None:
        frame = img.copy()
        frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR)  # Distortion correction
        Frame = run(frame)
        cv2.imshow('Frame', Frame)
        key = cv2.waitKey(1)
        if key == 27:
            break
    else:
        time.sleep(0.01)

Afterward, we need to convert the image from BGR format to grayscale. The corresponding code is as follows:

def apriltagDetect(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    detections = detector.detect(gray, return_image=False)
  • Tag Detection

Once the image has been processed, we need to detect the tag. This is done by passing the grayscale image to a detector from the apriltag library. The code implementation is as follows:

detector = apriltag.Detector(searchpath=apriltag._get_demo_searchpath())
def apriltagDetect(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    detections = detector.detect(gray, return_image=False)

After detection, the program will obtain the four corner points of the tag.

corners = np.int0(detection.corners)  # Get the four corner points
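
np.int0 was an alias of np.intp; it is deprecated in recent NumPy releases and absent from NumPy 2.x, so the line above can fail on a newer installation. A portable equivalent is an explicit cast, sketched here with made-up corner coordinates standing in for detection.corners:

```python
import numpy as np

# Made-up sub-pixel corner coordinates, standing in for detection.corners
detected_corners = np.array([[10.6, 20.2], [110.4, 21.9], [109.8, 120.5], [11.1, 119.7]])

# Cast to the platform integer type; like np.int0, this truncates toward zero.
corners = detected_corners.astype(np.intp)
print(corners.tolist())
```

Integer coordinates are what the drawing functions expect, which is why the corners are converted before the outline is drawn.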

Next, we need to draw the contours of the tag. In OpenCV, we use the cv2.drawContours function to accomplish this. The program code is as follows:

cv2.drawContours(img, [np.array(corners, int)], -1, (0, 255, 255), 2)

This function takes five parameters, each with the following meanings:

  1. img: The image to be processed.

  2. [np.array(corners, int)]: The contour points, passed as a list containing one contour.

  3. -1: The contour index. -1 indicates that all contours should be drawn.

  4. (0, 255, 255): The color of the contour.

  5. 2: The thickness of the contour line.

  • Retrieving Tag Information

The program uses the AprilTag library to decode each detected quadrilateral and retrieve the tag's information. Depending on the encoding method, different inner point coordinates are generated.

Once the quadrilateral is identified, the grid coordinates inside it are known. To verify the reliability of the decoding, the result is matched against the library of known codes. When a tag has been decoded, its tag_id and tag_family are drawn on the image and the buzzer sounds once.

if tag_id is not None:
    cv2.putText(img, "tag_id: " + str(tag_id), (10, img.shape[0] - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 0, 255], 2)
    cv2.putText(img, "tag_family: " + tag_family, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 0, 255], 2)
    if beer == True:
        board.set_buzzer(1900, 0.1, 0.9, 1)  # Sound at 1900 Hz for 0.1 s, stay silent for 0.9 s, repeat once
            
    beer = False

5.8 Tag Recognition

5.8.1 Program Description

The robot executes corresponding action groups by recognizing different ID tags.

AprilTag is a visual positioning marker that can be detected quickly and used to calculate position. It is mainly applied in AR, robotics, camera calibration, and similar tasks.

The following is the overall process:

First, the AprilTag is detected through positioning, image segmentation, and contour search. Once a contour is located, quadrilateral detection is performed and the four corner points are connected with straight lines to form a closed loop.

Next, the detected tag is decoded. Finally, the action group that corresponds to the decoded tag ID is executed.

5.8.2 Start and Close the Game

Note

Commands are case-sensitive. Enter them exactly as shown.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. For how to install and connect VNC, refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) Enter the following command and press Enter to navigate to the directory where the program is stored.

cd TonyPi/Functions

(4) Enter the following command, then press Enter to start the game.

python3 ApriltagDetect.py

(5) To exit the game, press “Ctrl+C”. If the program does not exit, press it a few more times.

5.8.3 Project Outcome

Note

Run this game against a solid-color or white background. A dark background, such as black, will degrade tag recognition.

After tag recognition starts, place the tag cards in front of the camera one at a time. TonyPi executes the corresponding action when a tag is recognized.

Tag ID    Action
1         Bowing
2         Marching in place
3         Dancing

5.8.4 Program Analysis

Source Code

The source code of this program is located at “/home/pi/TonyPi/Functions/ApriltagDetect.py”.

  • Import Parameter Module

import sys
import cv2
import math
import time
import threading
import numpy as np

import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle
import hiwonder.apriltag as apriltag

Import: Function
import sys: Imports the Python “sys” module for accessing system-related functions and variables.
import cv2: Imports the OpenCV library for image processing and computer vision.
import math: Imports the “math” module, which provides commonly used mathematical functions and constants.
import time: Imports the Python “time” module for time-related operations, such as delays.
import threading: Provides support for running multiple threads concurrently.
import numpy as np: Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.
import hiwonder.Camera as Camera: Imports the camera library.
import hiwonder.Misc as Misc: Imports the “Misc” module for handling recognized rectangular data.
import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot's underlying control library for controlling servos, motors, RGB lights, and other hardware.
from hiwonder.Controller import Controller: Imports the Controller class from the control library.
import hiwonder.ActionGroupControl as AGC: Imports the action group execution library (AGC.runActionGroup).
import hiwonder.yaml_handle as yaml_handle: Contains tools for processing YAML format files.
import hiwonder.apriltag as apriltag: Imports the AprilTag library.

  • Set initial state

Set the initial state, including the initial servo positions and the tag ID.

# initialize the robot's servo positions
def initMove():
    ctl.set_pwm_servo_pulse(1, 1500, 500)
    ctl.set_pwm_servo_pulse(2, 1500, 500)
  • Create AprilTag detector

Detect visual markers using the default marker patterns provided by the AprilTag library. You can use it to detect AprilTag markers in an image and obtain information about these markers, such as their position coordinates and IDs.

# detect AprilTag
detector = apriltag.Detector(searchpath=apriltag._get_demo_searchpath())
  • Color space conversion

Convert the BGR image to a grayscale image.

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  • Detect tag

Use the created detector object (i.e., AprilTag detector) to detect AprilTag markers in the grayscale image “gray”.

detections = detector.detect(gray, return_image=False)
  • Get tag information

Retrieve the corner points of each detected tag, use the cv2.drawContours function to outline the tag on the image, and obtain the tag ID and tag family.

for detection in detections:
    corners = np.int0(detection.corners)  # get the four corner points
    cv2.drawContours(img, [np.array(corners, int)], -1, (0, 255, 255), 2)

    tag_family = str(detection.tag_family, encoding='utf-8')  # get tag_family
    tag_id = int(detection.tag_id)  # get tag_id

    object_center_x, object_center_y = int(detection.center[0]), int(detection.center[1])  # center point

    object_angle = int(math.degrees(math.atan2(corners[0][1] - corners[1][1], corners[0][0] - corners[1][0])))  # calculate the rotation angle

    return tag_family, tag_id
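
The rotation-angle calculation in the listing can be tried in isolation. The following is a minimal sketch, assuming the corners arrive as four (x, y) pairs; the helper name edge_angle_deg and the sample square are hypothetical and are not part of ApriltagDetect.py.

```python
import math

def edge_angle_deg(corners):
    """Angle, in whole degrees, of the edge pointing from corners[1] to corners[0]."""
    dy = corners[0][1] - corners[1][1]
    dx = corners[0][0] - corners[1][0]
    return int(math.degrees(math.atan2(dy, dx)))

# Hypothetical axis-aligned square, 100 px per side, starting at the top-left corner.
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
print(edge_angle_deg(square))  # 180: the edge from corner 1 back to corner 0 points left
```

Because math.atan2 takes the sign of both differences into account, the result covers the full range from -180 to 180 degrees, so the order in which the detector lists the corners determines the reported angle.
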
  • Print tag information

Use the cv2.putText function to print the detected ID information.

cv2.putText(img, "tag_id: " + str(tag_id), (10, img.shape[0] - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2)
cv2.putText(img, "tag_family: " + tag_family, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2)
  • Tag recognition

According to the detected tag ID, use the AGC.runActionGroup function to run the corresponding action group file and control the robot’s movement.

if tag_id == 1:  # when the tag ID is 1
    AGC.runActionGroup('bow')  # bow
    tag_id = None
    time.sleep(1)
    action_finish = True
elif tag_id == 2:
    AGC.runActionGroup('stepping')  # march in place
    tag_id = None
    time.sleep(1)
    action_finish = True
elif tag_id == 3:
    AGC.runActionGroup('twist')  # twist the waist
    tag_id = None
    time.sleep(1)
    action_finish = True

5.8.5 Function Extension

  • Modify the Action Corresponding to the Tag

By default, TonyPi bows when tag ID 1 is detected. The feedback action can be changed, for example to waving.

(1) Enter the following command to navigate to the directory where the game program is located.

cd TonyPi/Functions/

(2) Enter the following command to open the game program in the vim editor.

vim ApriltagDetect.py

(3) Find the code AGC.runActionGroup('bow').

Note

In vim, type a line number and then press “Shift+G” to jump directly to that line. Any line number given here is for reference only; rely on the actual position in your file.

(4) According to the “TonyPi Action Group List Instruction” located in “/home/pi/TonyPi/ActionGroups”, “bow” corresponds to bowing.

(5) Press “i” to enter editing mode, then change AGC.runActionGroup('bow') to AGC.runActionGroup('wave').

(6) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.

  • Modify or Add the Tag Recognition

The tag data is located in the “ApirlTag Tag Collection” folder under the directory of this section. (The folder must be unzipped first.)

① There is no need to download materials online. The provided tags (200 in total) are in the “ApirlTag Tag Collection” folder under the directory of this section.

② There is no strict size requirement when printing your own tags, but tags that are too large or too small may reduce recognition performance. (The tag is outlined when it is recognized.)

③ Keep the background around the tag white where possible. A dark background may affect recognition.

In the following example, tag ID 4 is added as a new tag. When it is recognized, TonyPi runs the “Cheering” action group.

(1) Following 5.8.5 Function Extension -> Modify the Action Corresponding to the Tag, enter the directory and open the program file.

(2) Copy the code of one “elif” branch, as shown in the image. Move the cursor to the corresponding “elif” line, then type “5yy” on the keyboard (to copy 5 lines). The prompt “5 lines yanked” at the bottom indicates that the copy succeeded.

(3) Move the cursor to the position shown in the figure below, where the 5 lines will be pasted:

(4) Type “p” on the keyboard to paste the 5 copied lines below the cursor:

(5) Modify the pasted code. Press “i” to enter editing mode, change the value compared with “tag_id” to “4”, and change the action in AGC.runActionGroup to “chest”.

The built-in action groups can be found in “/home/pi/TonyPi/ActionGroups”.

(6) The modification is now complete. Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.

(7) Take the ID 4 tag from the “ApirlTag Tag Collection” folder and print it.

Run the program as described in 5.8.2 Start and Close the Game to check the result.
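
As an alternative to copying an “elif” branch for every new tag, the mapping from tag ID to action group can be kept in a lookup table. The following is a minimal sketch of that idea, not the structure used in ApriltagDetect.py; the names TAG_ACTIONS and action_for_tag are hypothetical, while the action group names are the ones that appear in this section.

```python
# Tag ID -> action group name. IDs 1-3 mirror the branches in the program;
# ID 4 -> 'chest' is the entry added in this walkthrough.
TAG_ACTIONS = {
    1: 'bow',
    2: 'stepping',
    3: 'twist',
    4: 'chest',
}

def action_for_tag(tag_id):
    """Return the action group name for a tag ID, or None if the ID is unmapped."""
    return TAG_ACTIONS.get(tag_id)

for tag in (1, 4, 7):
    print(tag, action_for_tag(tag))
```

On the robot, a non-None result would be passed to AGC.runActionGroup. With this layout, supporting another tag means adding one dictionary entry.
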

5.9 Face Recognition

5.9.1 Program Description

When a human face is detected, the onboard buzzer emits a beeping sound, and the face is highlighted in the video feed.

Facial recognition is one of the most widely used applications in artificial intelligence, commonly seen in scenarios such as smart door locks and smartphone face unlocking.

In this lesson, we use a pre-trained facial recognition model. The image is first resized to improve detection efficiency. Once a face is detected, its coordinates are mapped back to the original image size. The system identifies the largest face, draws a bounding box around it, and triggers the buzzer to sound an alert.
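
The two steps described above, mapping a box found on a resized image back to the original size and then choosing the largest face, can be sketched as follows. This is an illustration of the idea under stated assumptions, not code taken from Face_Detect.py: the function names, the (x, y, w, h) box layout, and the scale factor of 0.5 are all hypothetical.

```python
def scale_back(bbox, scale):
    """Map an (x, y, w, h) box found on a resized image back to the original image.

    scale is resized_size / original_size, e.g. 0.5 if the image was halved.
    """
    x, y, w, h = bbox
    return (int(x / scale), int(y / scale), int(w / scale), int(h / scale))

def largest_face(bboxes):
    """Return the box with the largest area, or None if the list is empty."""
    return max(bboxes, key=lambda b: b[2] * b[3], default=None)

# Two hypothetical detections on an image that was resized to half its size.
detections_small = [(10, 20, 30, 30), (100, 50, 60, 80)]
faces = [scale_back(b, 0.5) for b in detections_small]
print(largest_face(faces))  # (200, 100, 120, 160)
```
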

5.9.2 Start and Close the Game

Note

The input of commands must strictly distinguish between uppercase and lowercase letters.

(1) Power on the device and, following the instructions in 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection, use the VNC remote connection tool to connect.

(2) Click the icon in the top left corner of the system desktop to open the LX terminal.

(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions/

(4) Enter the command and press Enter to start the program:

python3 Face_Detect.py

(5) To close the program, simply press “Ctrl+C” in the LX terminal. If it does not close, press it multiple times.

5.9.3 Program Outcome

Note

For optimal performance, please avoid using this activity under strong lighting conditions, such as direct sunlight or close proximity to incandescent lights, as intense light can affect face recognition accuracy. It is recommended to conduct this activity indoors, with the face positioned within a range of 50 cm to 1 meter from the camera.

After the feature is activated, the buzzer will emit a beeping sound when a face is detected, and the detected face will be outlined in the returned video feed.

5.9.4 Program Brief Analysis

Source Code

The source code of the program is saved in: /home/pi/TonyPi/Functions/Face_Detect.py

  • Program Logic

First, the camera captures image data, and the image is converted to a different color space to facilitate face detection.
MediaPipe’s face detection model is then applied to the image to identify faces. Once a face is detected, the system sounds the buzzer in response.

  • Importing Parameter Modules

During the initialization step, the required libraries are first imported to enable subsequent program functionality. Specifically:

(1) “sys” is used for system parameter access.

(2) “cv2” (OpenCV library) is used for image processing.

(3) “time” is used for time control and delays.

(4) “mediapipe” provides the face detection model.

(5) hiwonder.ros_robot_controller_sdk (imported as rrc) is the hardware-specific library used to control components such as the buzzer.

import sys
import cv2
import math
import time
import threading
import numpy as np
import mediapipe as mp
import hiwonder.ros_robot_controller_sdk as rrc
import hiwonder.yaml_handle as yaml_handle
import hiwonder.Camera as Camera

# face detection

# initialize the robot's underlying driver
board = rrc.Board()

Import: Purpose
import sys: Imports the Python sys module, which provides access to system-specific parameters and functions.
import cv2: Imports the OpenCV library, which is used for image processing and computer vision tasks.
import math: Imports the math module, which provides commonly used mathematical functions and constants.
import time: Imports the Python time module, which provides functions for handling time-related tasks, such as delays.
import threading: Provides support for running tasks in multiple threads concurrently.
import numpy as np: Imports the NumPy library, used for array and matrix operations.
import mediapipe as mp: Imports the MediaPipe library, which is used to detect human faces.
import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot's underlying control library, used here to drive the buzzer.
import hiwonder.yaml_handle as yaml_handle: Contains functions or tools for handling YAML format files.
import hiwonder.Camera as Camera: Imports the camera library.

  • Setting Initial State

# import the face detection module
face = mp.solutions.face_detection
# create the face detector with a minimum detection confidence of 0.5
face_detection = face.FaceDetection(min_detection_confidence=0.5)
  • Color Space Conversion

The BGR image is converted to an RGB image.

image_rgb = cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB)  # convert the BGR image to RGB
results = face_detection.process(image_rgb)  # pass each frame to the face detection module
  • Using Mediapipe Face Model for Recognition

The system performs face detection and draws a rectangle around the detected face.

if results.detections:   # results.detections is None when no face is detected
    for index, detection in enumerate(results.detections):  # index is the face number; detection holds its bounding box and keypoint coordinates
        bboxC = detection.location_data.relative_bounding_box  # relative bounding box (xmin, ymin, width, height) of this face

        # convert the bounding box position, width, and height from relative coordinates to pixel coordinates
        bbox = (int(bboxC.xmin * img_w), int(bboxC.ymin * img_h),
               int(bboxC.width * img_w), int(bboxC.height * img_h))
        cv2.rectangle(img, bbox, (0,255,0), 2)  # draw the rectangle on the frame
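
The conversion from relative to pixel coordinates in the listing is plain arithmetic and can be checked without a camera. The following is a minimal sketch; the helper name to_pixel_bbox and the sample values are hypothetical.

```python
def to_pixel_bbox(xmin, ymin, width, height, img_w, img_h):
    """Convert a relative bounding box (values as fractions of the image size)
    into pixel units as (x, y, w, h)."""
    return (int(xmin * img_w), int(ymin * img_h),
            int(width * img_w), int(height * img_h))

# A face whose box starts a quarter of the way across a 640x480 frame.
print(to_pixel_bbox(0.25, 0.5, 0.5, 0.25, 640, 480))  # (160, 240, 320, 120)
```

The resulting tuple has the same (x, y, w, h) layout that the listing passes to cv2.rectangle.
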
  • Face Recognition

If a face is detected, the buzzer will be activated to sound an alert. The function board.set_buzzer(1900, 0.3, 0.7, 1) is used to trigger the buzzer.

def buzzer():
    global di_once
    global detect_people

    while True:
        if detect_people and di_once:
            board.set_buzzer(1900, 0.3, 0.7, 1)  # at 1900 Hz, sound for 0.3 s, pause for 0.7 s, repeat once
            di_once = False
        else:
            time.sleep(0.01)
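
The di_once flag makes the buzzer sound once per detection rather than on every frame. The following sketch simulates that one-shot behavior on a list of per-frame detection results. It assumes the flag is re-armed when the face leaves the frame; that part is not shown in the listing above and is an assumption here, as is the function name beep_frames.

```python
def beep_frames(detections):
    """Return the frame indices at which a one-shot beep would fire."""
    fired = []
    di_once = True  # armed: the next detection may beep
    for i, detect_people in enumerate(detections):
        if detect_people and di_once:
            fired.append(i)    # stands in for board.set_buzzer(...)
            di_once = False    # disarm so the same face does not beep again
        elif not detect_people:
            di_once = True     # assumption: re-arm once no face is in view
    return fired

print(beep_frames([False, True, True, False, True]))  # [1, 4]
```
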

5.10 Face Recognition

5.10.1 Program Logic

The robot detects a human face and then performs a “greeting” action.

Image recognition is one of the most widespread applications of artificial intelligence, and facial recognition is among its most popular uses. It is commonly seen in scenarios such as door locks and phone face unlocking.

In this section, the image is first resized and passed to a trained face model for detection. The coordinates of the detected face are then converted back to the original image scale. The program determines which face is the largest and draws a frame around it.

The servo rotates left and right to search for a face, and once a face is found, an action group is called so that the robot gives feedback.

5.10.2 Operation Steps

Note

Commands are case-sensitive. Enter them exactly as shown.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. For how to install and connect VNC, refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) Enter the following command and press Enter to navigate to the directory where the program is stored.

cd TonyPi/Functions

(4) Enter the command below, then press Enter to start the game.

python3 FaceDetect.py

(5) To exit the game, press “Ctrl+C”. If the program does not exit, press it a few more times.

5.10.3 Project Outcome

Note

Do not run the face recognition game under strong light, such as direct sunlight. Strong light affects recognition performance, so playing indoors is recommended. Keep the face 50-100 cm from the camera.

After the face recognition function starts, TonyPi rotates its head to search for a face. When a face is recognized, it stops and performs the greeting action.

5.10.4 Programming Instruction

The source code of this program is located at /home/pi/TonyPi/Functions/FaceDetect.py.

(1) Import Parameter Module

import sys
import os
import cv2
import math
import time
import threading
import numpy as np

import mediapipe as mp
import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle

import sys: Imports the Python “sys” module for accessing system-related functions and variables.

import os: Imports the Python “os” module, which provides functions for interacting with the operating system.

import cv2: Imports the OpenCV library for image processing and computer vision.

import math: Imports the “math” module, which provides commonly used mathematical functions and constants.

import time: Imports the Python “time” module for time-related operations, such as delays.

import threading: Provides support for running multiple threads concurrently.

import numpy as np: Imports the NumPy library, an open-source numerical computing extension for Python used for array and matrix operations.

import mediapipe as mp: Imports the MediaPipe library, which is used to detect human faces.

import hiwonder.Camera as Camera: Imports the camera library.

import hiwonder.Misc as Misc: Imports the “Misc” module for handling recognized rectangular data.

import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot’s underlying control library for controlling servos, motors, RGB lights, and other hardware.

from hiwonder.Controller import Controller: Imports the Controller class from the control library.

import hiwonder.ActionGroupControl as AGC: Imports the action group execution library (AGC.runActionGroup).

import hiwonder.yaml_handle as yaml_handle: Contains tools for processing YAML format files.

(2) Set initial state

Set initial state, including the initial position of servo, human face recognition module, Minimum Face Confidence, etc.

# import the face detection module
face = mp.solutions.face_detection
# create the face detector with a minimum detection confidence of 0.7
face_detection = face.FaceDetection(min_detection_confidence=0.7)

servo2_pulse = servo_data['servo2']

# initialize the robot's servo positions
def initMove():
    ctl.set_pwm_servo_pulse(1, 1800, 500)
    ctl.set_pwm_servo_pulse(2, servo2_pulse, 500)

(3) Color space conversion

Convert the BGR image to an RGB image.

    image_rgb = cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB)  # convert the BGR image to RGB

(4) Use mediapipe human face model recognition

Perform face detection and draw a rectangle around each detected face. If the horizontal center of a face lies within a quarter of the frame width from the center of the frame, and the previous action has finished, set “start_greet” to True so that the action group is executed.

    results = face_detection.process(image_rgb)  # pass each frame to the face detection module
    if results.detections:   # results.detections is None when no face is detected
        for index, detection in enumerate(results.detections):  # index is the face number; detection holds its bounding box and keypoint coordinates
            bboxC = detection.location_data.relative_bounding_box  # relative bounding box (xmin, ymin, width, height) of this face

            # convert the bounding box position, width, and height from relative coordinates to pixel coordinates
            bbox = (int(bboxC.xmin * img_w), int(bboxC.ymin * img_h),
                   int(bboxC.width * img_w), int(bboxC.height * img_h))
            cv2.rectangle(img, bbox, (0,255,0), 2)  # draw the rectangle on the frame
            x, y, w, h = bbox  # unpack the box; (x, y) is the upper-left corner
            center_x =  int(x + (w/2))

            if abs(center_x - img_w/2) < img_w/4:
                if action_finish:
                    start_greet = True
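
The centering test at the end of the listing can be isolated to see which face positions trigger the greeting. The following is a minimal sketch; the helper name is_centered and the sample boxes are hypothetical.

```python
def is_centered(bbox, img_w):
    """True if the horizontal center of an (x, y, w, h) box lies within
    a quarter of the frame width from the middle of the frame."""
    x, _, w, _ = bbox
    center_x = int(x + w / 2)
    return abs(center_x - img_w / 2) < img_w / 4

print(is_centered((270, 100, 100, 100), 640))  # True: box center is at x = 320
print(is_centered((0, 100, 100, 100), 640))    # False: box center is at x = 50
```

For a 640-pixel-wide frame, the accepted band runs from just above x = 160 to just below x = 480, which is the middle half of the image.
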

(5) Face detection

If a face is detected, use the AGC.runActionGroup function to run the “wave” action group.

    while True:
        if __isRunning:
            if start_greet:
                start_greet = False
                action_finish = False
                AGC.runActionGroup('wave')  # the action performed when a face is recognized
                action_finish = True
                time.sleep(0.5)

If no face is detected, control the pan-tilt servo to rotate left and right to search for a face.

            else:
                if servo2_pulse > 2000 or servo2_pulse < 1000:
                    d_pulse = -d_pulse
            
                servo2_pulse += d_pulse 
                ctl.set_pwm_servo_pulse(2, servo2_pulse, 50)
                time.sleep(0.05)
        else:
            time.sleep(0.01)
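
The search sweep can be simulated without hardware to see how the pulse value moves between its limits. The following is a minimal sketch; the function name sweep_step, the starting pulse, and the step size of 20 are hypothetical, while the limits of 1000 and 2000 come from the listing.

```python
def sweep_step(pulse, d_pulse, low=1000, high=2000):
    """One iteration of the search sweep: reverse direction once the pulse
    is outside [low, high], then advance by one step."""
    if pulse > high or pulse < low:
        d_pulse = -d_pulse
    return pulse + d_pulse, d_pulse

pulse, step = 1960, 20
trace = []
for _ in range(6):
    pulse, step = sweep_step(pulse, step)
    trace.append(pulse)
print(trace)  # [1980, 2000, 2020, 2000, 1980, 1960]
```

Because the limit is checked before the step is added, the pulse passes the limit by one step before turning around, as the trace shows.
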

5.10.5 Function Extension

  • Modify Feedback Action

Note

The built-in action group files are located in “/home/pi/TonyPi/ActionGroups”.

By default, TonyPi performs the greeting action when it detects a face. The feedback action can be changed to another action, such as bowing.

(1) Enter the following command to navigate to the directory where the game program is located.

cd TonyPi/Functions/

(2) Enter the command below to open the game program in the vim editor.

vim FaceDetect.py

(3) Find the code AGC.runActionGroup('wave').

“wave” in the above image is the name of the greeting action. To change the action to bowing, replace “wave” with “bow”, the name listed in the “Action group instruction” in the path /home/pi/TonyPi/ActionGroups.

Note

The action name can be found in the “Action group instruction”.

(4) Press “i” to enter editing mode, then change “wave” to “bow”.

(5) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.

5.11 Gesture Control

5.11.1 Game Overview

This program leverages MediaPipe’s hand detection model to recognize palm joints. Upon detecting a specific hand gesture, the robot locks onto the fingertip within the camera frame, initiates tracking, and visualizes the fingertip’s movement by drawing its trajectory.

The process begins by invoking the MediaPipe hand detection model and capturing real-time images from the camera. The input image is then flipped and preprocessed to extract hand landmarks. By analyzing the connections between key points, the system calculates finger angles to accurately identify gestures.

Once a target gesture is recognized, the robot starts tracking the fingertip and overlays its movement path directly onto the video feed for visual feedback.

5.11.2 Start and Close the Game

Note

The input of commands must strictly distinguish between uppercase and lowercase letters.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. For how to install and connect VNC, refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions/

(4) Enter the command and press Enter to start the program:

python3 gesture_control.py

(5) To close the program, press “Ctrl+C” in the terminal. If it does not close, press it a few more times.

5.11.3 Program Outcome

After starting the program, place your hand within the camera’s field of view. Once the hand is detected, the system will mark the key points of the hand on the live video feed.

The camera displays the real-time view. When the gesture “1” (index finger pointing) is recognized, the system begins tracking the fingertip and drawing its movement path on the screen.

For example, if the fingertip moves to the right, a path will be drawn in that direction. When the system detects the open palm gesture “5”, the buzzer will beep once, the robot will strafe to the right, and the previously drawn trajectory will be cleared from the screen.

Similarly:

  • Moving the fingertip upward on the screen prompts the robot to move forward.

  • Moving it downward prompts the robot to move backward.

  • In each case, the robot responds to the drawn trajectory by executing the corresponding movement.
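
One simple way to turn a drawn fingertip path into a movement command is to compare its horizontal and vertical displacement. The following is a sketch of that idea only; how gesture_control.py actually classifies the trajectory is not shown in this section, and the function name, the min_move threshold of 30 pixels, and the “left” label are assumptions.

```python
def trajectory_direction(points, min_move=30):
    """Classify a fingertip path by its dominant displacement.

    points: list of (x, y) pixel positions in image coordinates (y grows downward).
    Returns 'right', 'left', 'up', 'down', or None if the path is too short.
    """
    if len(points) < 2:
        return None
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    if max(abs(dx), abs(dy)) < min_move:
        return None  # movement too small to count as a stroke
    if abs(dx) >= abs(dy):
        return 'right' if dx > 0 else 'left'
    return 'down' if dy > 0 else 'up'

print(trajectory_direction([(10, 100), (200, 110)]))   # right
print(trajectory_direction([(100, 200), (105, 20)]))   # up
```

Following the behavior described above, “right” would map to strafing right, “up” to moving forward, and “down” to moving backward.
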

5.11.4 Brief Program Analysis

The program file corresponding to this lesson is located at: /home/pi/TonyPi/Functions/gesture_control.py

This feature captures images through the camera and performs preprocessing—specifically, converting the image to a different color space to enhance recognition. After preprocessing, the program extracts gesture feature points by identifying key landmarks on the hand. It then uses logical analysis (based on angles) to determine different hand gestures. Finally, the system draws the recognized gesture trajectory on the live video feed.

Note

Before making any modifications, be sure to back up the original factory program. Do not edit the source file directly, as improper parameter changes may result in serious malfunctions that could render the robot unusable.

  • Definition of the Drawing Object

drawing = mp.solutions.drawing_utils

drawing is a tool used to draw joint feature points. After detecting the key hand landmarks, it is used to visualize and connect those points accordingly.

mp refers to the MediaPipe recognition module, which is used to extract hand features. Its drawing_utils module serves as the toolkit for rendering the detected landmarks.

  • Definition of the Hand Feature Detector (hand_detector)

hand_detector = mp.solutions.hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_tracking_confidence=0.05,
    min_detection_confidence=0.6
)

hand_detector is the hand feature detection tool. Here, mp refers to the MediaPipe recognition module, and mp.solutions.hands.Hands is the specific component used for extracting hand landmarks.

Several key parameters within this module require attention:

(1) static_image_mode: When set to False, the input is treated as a video stream: hands are detected first and then tracked across frames, up to the number set by max_num_hands. It is recommended to keep this set to False.

(2) max_num_hands: Specifies the maximum number of hands to detect. It is set to 1 in this program.

(3) min_tracking_confidence: The confidence threshold (between 0 and 1) for hand tracking. If the tracking confidence stays above this value, the hand landmarks are considered successfully tracked; otherwise, hand detection is run again on the next image. If tracking is unstable, you may adjust this value by up to 0.3, but it must not be lower than 0.

(4) min_detection_confidence: The minimum confidence threshold (between 0 and 1) for the hand detection model. If the detection confidence exceeds this value, the content in the image is recognized as a hand. If detection is unreliable, consider adjusting this value by up to 0.1, but it must not be lower than 0.

  • Gesture Recognition Processing Function

Once the basic tool parameters are defined, the program proceeds to the logic recognition phase. The following points outline the core parts of the code—from initial image preprocessing to the final drawing of the fingertip trajectory.

(1) Storing Key Landmark Detection Results

            results = hand_detector.process(image)

The process function of hand_detector is used to extract the hand’s key landmarks. The results (i.e., the positions of these key points in the image) are stored in the results variable for further logical processing.

(2) Drawing Parameter Configuration

                    drawing.draw_landmarks(
                        display_image,
                        hand_landmarks,
                        mp.solutions.hands.HAND_CONNECTIONS)

After detecting the key landmarks, the drawing tool must be configured to define how the landmarks and their connections are rendered.

① display_image: the image on which the landmarks are drawn.

② hand_landmarks: the detected key points of the hand.

③ HAND_CONNECTIONS: the connection scheme based on standard landmark indices to draw the lines between points.

(3) Finger Type Logical Classification (Thumb, Index Finger, etc.)

def hand_angle(landmarks):
    """
    Calculate the bend angle of each finger
    :param landmarks: hand key points
    :return: list of angles, one per finger
    """
    angle_list = []
    # thumb
    angle_ = vector_2d_angle(landmarks[3] - landmarks[4], landmarks[0] - landmarks[2])
    angle_list.append(angle_)
    # index finger
    angle_ = vector_2d_angle(landmarks[0] - landmarks[6], landmarks[7] - landmarks[8])
    angle_list.append(angle_)
    # middle finger
    angle_ = vector_2d_angle(landmarks[0] - landmarks[10], landmarks[11] - landmarks[12])
    angle_list.append(angle_)
    # ring finger
    angle_ = vector_2d_angle(landmarks[0] - landmarks[14], landmarks[15] - landmarks[16])
    angle_list.append(angle_)
    # little finger
    angle_ = vector_2d_angle(landmarks[0] - landmarks[18], landmarks[19] - landmarks[20])
    angle_list.append(angle_)
    angle_list = [abs(a) for a in angle_list]
    return angle_list

Once the landmarks have been stored in the results variable, the bend angle of each finger (thumb, index, middle, ring, and little finger) is computed by analyzing the angles between specific key points.

The hand_angle function receives the landmark set (landmarks, taken from results) and uses the vector_2d_angle function to calculate the angles between relevant points. Each element in the landmarks set corresponds to a specific finger joint, as illustrated in the following diagram.

The vector_2d_angle function is used to calculate the angle between joints. Specifically, the points landmarks[3], landmarks[4], landmarks[0], and landmarks[2] correspond to the key landmarks numbered 3, 4, 0, and 2 in the hand feature extraction diagram. By computing the angle between the vector from landmark 4 to landmark 3 and the vector from landmark 2 to landmark 0, the bend of the thumb can be determined.

The other fingers are processed in the same way.

To ensure accurate recognition, the landmark indices and basic logic (the vector subtractions) inside the hand_angle function should be kept at their default settings.
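
The body of vector_2d_angle is not shown in this section. The following is a minimal stand-in, assuming it returns the unsigned angle between two 2D vectors in degrees; the implementation in gesture_control.py may differ. The sample landmark positions and the sub helper are hypothetical.

```python
import math

def vector_2d_angle(v1, v2):
    """Unsigned angle between two 2D vectors, in degrees (0 to 180)."""
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.degrees(math.atan2(cross, dot)))

def sub(a, b):
    """Vector from point b to point a."""
    return (a[0] - b[0], a[1] - b[1])

# Hypothetical index finger pointing straight up: wrist (0), then joints 6, 7, 8.
wrist, joint6, joint7, joint8 = (0, 0), (0, -60), (0, -80), (0, -100)
straight = vector_2d_angle(sub(wrist, joint6), sub(joint7, joint8))

# Same finger curled so that the tip segment points back toward the wrist.
joint7_c, joint8_c = (20, -60), (20, -40)
curled = vector_2d_angle(sub(wrist, joint6), sub(joint7_c, joint8_c))

print(round(straight), round(curled))  # 0 180
```

A straight finger yields an angle near 0 and a fully curled finger an angle near 180, which is why the gesture logic can separate extended from bent fingers with angle thresholds.
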

(4) Gesture Feature Detection

def h_gesture(angle_list):
    """
    Determine the hand gesture from the 2D finger-angle features.
    :param angle_list: bend angle of each finger
    :return: gesture name as a string
    """
    thr_angle = 65.
    thr_angle_thumb = 53.
    thr_angle_s = 49.
    gesture_str = "none"
    if (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "fist"
    elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "hand_heart"
    elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s):
        gesture_str = "nico-nico-ni"
    elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "hand_heart"
    elif (angle_list[0] > 5) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "one"
    elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "two"
    elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and (
            angle_list[3] < thr_angle_s) and (angle_list[4] > thr_angle):
        gesture_str = "three"
    elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] < thr_angle_s) and (
            angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s):
        gesture_str = "OK"
    elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and (
            angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s):
        gesture_str = "four"
    elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and (
            angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s):
        gesture_str = "five"
    elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s):
        gesture_str = "six"
    else:
        gesture_str = "none"
    return gesture_str

After identifying different finger types and determining their positions in the image, you can implement the function h_gesture to perform logical recognition for various hand gestures.

In the h_gesture function shown above, the parameters thr_angle = 65, thr_angle_thumb = 53, and thr_angle_s = 49 serve as threshold angle values (in degrees) for gesture logic decisions. These values were determined through testing and provide stable recognition results. It is not recommended to change them significantly; if the recognition performance is unsatisfactory, you may adjust these thresholds by up to ±5 degrees.

The elements angle_list[0] through angle_list[4] hold the angles of the five fingers in the order thumb, index, middle, ring, and little finger.

Taking the gesture “one” as an example:

    elif (angle_list[0] > 5) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and (
            angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
        gesture_str = "one"

① angle_list[0] > 5 checks whether the thumb’s joint angle in the image is greater than 5 degrees.

② angle_list[1] < thr_angle_s verifies if the index finger’s angle is less than the threshold thr_angle_s.

③ angle_list[2] > thr_angle checks if the middle finger’s angle is greater than thr_angle.

④ The other two fingers, angle_list[3] and angle_list[4], are checked the same way: each angle must be greater than thr_angle.

When these conditions are met, the current gesture is recognized as “one.” Recognition of other gestures follows similar logical frameworks.

Although the specific logic varies by gesture, the overall structure is consistent. You can refer to the previous section for details on other gesture recognition logics.
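The “one” condition can be tried in isolation with a small standalone check. This is only an illustrative sketch: is_gesture_one is a hypothetical helper that is not part of the program, and the sample angle lists are made-up values.

```python
THR_ANGLE = 65.0    # an angle above this is treated as a bent finger
THR_ANGLE_S = 49.0  # an angle below this is treated as a straight finger


def is_gesture_one(angle_list):
    """Return True when only the index finger is extended (hypothetical helper)."""
    thumb, index, middle, ring, little = angle_list
    return (thumb > 5
            and index < THR_ANGLE_S
            and middle > THR_ANGLE
            and ring > THR_ANGLE
            and little > THR_ANGLE)


# Made-up sample angles, in degrees: thumb, index, middle, ring, little.
print(is_gesture_one([60, 20, 120, 130, 125]))  # index straight, others bent
print(is_gesture_one([60, 20, 30, 130, 125]))   # middle finger also straight
```

Angles that fall between the two thresholds (49 to 65 degrees) match neither the “straight” nor the “bent” test, which is why a half-bent finger usually produces no gesture at all.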

(5) Fingertip Feature Detection, Motion Trajectory Drawing, and Trajectory Clearing

The fingertip feature detection process is illustrated in the diagram below:

                if state != State.TRACKING:
                    if gesture == "one":  # index-finger gesture detected; start fingertip tracking
                        count += 1
                        if count > 5:
                            count = 0
                            state = State.TRACKING
                            points = []
                            points_list = []
                    else:
                        count = 0

The code snippet above shows the logic used when the gesture “one” is detected. The variable count tracks the number of consecutive frames in which the gesture has been held; any other gesture resets it to 0. The condition count > 5 means the gesture must be held for more than 5 consecutive frames before it is confirmed as gesture “one”. (Adjusting this frame count controls how long the gesture must be held before it is recognized; it is typically set to around 5 to 7 frames. Too long a duration may negatively impact recognition accuracy.)

Once confirmed, count is reset and the current gesture state is set to TRACKING, meaning the system can start tracking motion. At this point, points (the fingertip positions recorded so far) and points_list (the same positions as integer pixel coordinates, used to draw the trajectory line) are initialized as empty lists.
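The frame-counting logic can be sketched on its own as follows. This is a simplified illustration under stated assumptions: frames_until_tracking and hold_frames are hypothetical names, and the gesture sequences are made-up inputs rather than real detection results.

```python
def frames_until_tracking(gestures, hold_frames=5):
    """Return the index of the frame at which tracking would start, or None.

    Tracking starts once "one" has been seen in more than `hold_frames`
    consecutive frames; any other gesture resets the counter.
    """
    count = 0
    for i, gesture in enumerate(gestures):
        if gesture == "one":
            count += 1
            if count > hold_frames:
                return i
        else:
            count = 0
    return None


steady = ["one"] * 6
interrupted = ["one", "one", "one", "two"] + ["one"] * 6
print(frames_until_tracking(steady))
print(frames_until_tracking(interrupted))
```

Raising hold_frames makes accidental triggers less likely but increases the delay before tracking begins.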

The motion trajectory drawing process is illustrated in the following diagram:

                elif state == State.TRACKING:
                    if gesture != "two":
                        if len(points) > 0:
                            last_point = points[-1]
                            if distance(last_point, index_finger_tip) < 5:
                                count += 1
                            else:
                                count = 0
                                points_list.append([int(index_finger_tip[0]), int(index_finger_tip[1])])
                                points.append(index_finger_tip)
                        else:
                            points_list.append([int(index_finger_tip[0]), int(index_finger_tip[1])])
                            points.append(index_finger_tip)
                    draw_points(display_image, points)

While the gesture is in the sliding (tracking) state, a custom function distance is used to calculate the distance between two points before and after movement. The logical condition distance(last_point, index_finger_tip) < 5 checks whether the pixel distance between these two points is less than 5.

The value 5 is the pixel threshold for movement between frames: if the fingertip has moved less than 5 pixels from the last recorded point, it is treated as stationary and no new point is added. If faster hand movement needs to be handled while keeping the logic accurate, this value can be slightly increased, though it is recommended not to exceed 10. Conversely, lowering the value makes the program record smaller movements.

The fingertip’s positional data is stored in points and points_list, which hold the (x, y) pixel coordinates of the fingertip and are used for drawing the trajectory on the image.
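The effect of the pixel threshold can be sketched as follows. The program’s own distance function is not shown in this section, so the Euclidean distance used here is an assumption, and build_trajectory and min_move are hypothetical names chosen for the example.

```python
import math


def distance(p1, p2):
    """Euclidean distance in pixels between two (x, y) points (assumed definition)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def build_trajectory(fingertips, min_move=5):
    """Keep only fingertip positions at least `min_move` pixels from the last kept point."""
    points = []
    for tip in fingertips:
        if points and distance(points[-1], tip) < min_move:
            continue  # treated as stationary, so no new point is recorded
        points.append((int(tip[0]), int(tip[1])))
    return points


# Made-up fingertip positions from five consecutive frames.
samples = [(100, 100), (102, 101), (110, 100), (111, 103), (120, 110)]
print(build_trajectory(samples))
```

With the default threshold, the two small jitters in the sample are dropped and three points remain; setting min_move to 1 would keep all five.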

The trajectory clearing process is illustrated in the diagram below:

                if gesture == "five":
                    state = State.NULL
                    if len(points_list) > 10 and not start_move:
                        board.set_buzzer(1900, 0.1, 0.9, 1)
                        target_points = points_list
                        start_move = True
                    points = []
                    points_list = []
                    draw_points(display_image, points)
        except Exception as e:
            print(e)
    return display_image

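When the gesture is recognized as “five”, the gesture recognition state is set to NULL (empty). If more than 10 trajectory points have been recorded and no movement is already in progress, the buzzer beeps and the trajectory is handed over as target_points. The current collection of points (the recorded trajectory positions) is then cleared.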

5.12 Pose Control

5.12.1 Game Overview

This experiment uses MediaPipe’s pose detection model to recognize key human body points (such as hands, arms, and shoulders) and, based on their positions and angles, control the robot’s posture in real time. By detecting the posture of the user’s arms, the robot can respond accordingly, enabling gesture-based human-robot interaction.

The core of the experiment is to capture the user’s posture information in real time using MediaPipe’s pose detection model and map it to the robot’s motion control system.

The process includes:

  • Using MediaPipe’s pose model to obtain the coordinates of key body points.

  • Calculating movements (e.g., raising an arm, bending an elbow) based on changes in key point positions.

  • Mapping the user’s movements to the robot’s arm joint controls, allowing the robot to mimic the user’s actions.

  • Displaying the user’s posture and the robot’s action status in real time on the screen.

5.12.2 Start and Close the Game

Note

Commands are case-sensitive; enter them exactly as shown.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions/

(4) Enter the command and press Enter to start the program:

python3 pose_control.py

(5) To close the program, press “Ctrl+C” in the terminal. If it does not close, press it several more times.

5.12.3 Program Outcome

After starting the activity, place your hand within the camera’s field of view. Once the upper body skeleton is detected, key points of the human body will be marked in the returned video feed.

The camera will display the live video and automatically recognize the body’s key points. Based on the key points of the hands, the servos will be driven to mimic the movements of the human arm. When the user opens their arms, the robot will simultaneously open its arms as well, achieving posture-based control.

5.12.4 Brief Program Analysis

The program file corresponding to this lesson is located at /home/pi/TonyPi/Functions/pose_control.py

Note

Always back up the original factory program before making any modifications. Directly editing the source code is strictly prohibited, as improper changes to parameters may result in serious malfunctions that could render the robot inoperable and beyond repair.

(1) Import Function Library

import os
import cv2
import math
import copy
import threading
import numpy as np
import mediapipe as mp
from hiwonder import fps
from mediapipe import solutions
from mediapipe.tasks import python
import hiwonder.yaml_handle as yaml_handle
from hiwonder.Controller import Controller
from mediapipe.tasks.python import vision
import hiwonder.ros_robot_controller_sdk as rrc
from mediapipe.framework.formats import landmark_pb2
from mediapipe_visual import draw_pose_landmarks_on_image

import os: Imports Python’s os module, which provides functions for interacting with the operating system.

import cv2: Imports the OpenCV library for image processing and computer vision tasks.

import math: Provides access to mathematical functions and constants.

import copy: Enables object copying functionality.

import threading: Provides support for running multiple threads concurrently.

import numpy as np: Imports NumPy, a library for numerical computing, commonly used for handling arrays and matrix operations.

import mediapipe as mp: Imports the MediaPipe library for pose and hand gesture detection.

from hiwonder import fps: Imports the FPS (frames per second) calculator for performance monitoring.

from mediapipe import solutions: Imports pre-defined modules from MediaPipe, such as hand tracking, face detection, and pose estimation.

from mediapipe.tasks import python: Imports the Python API from MediaPipe’s task module.

import hiwonder.yaml_handle as yaml_handle: Provides tools for handling YAML-format configuration files.

from hiwonder.Controller import Controller: Imports the motion control library for managing robot movements.

from mediapipe.tasks.python import vision: Imports the vision module for handling visual processing tasks.

import hiwonder.ros_robot_controller_sdk as rrc: Imports the low-level robot control SDK for controlling servos, motors, RGB lights, and other hardware.

from mediapipe.framework.formats import landmark_pb2: Handles serialization and deserialization of landmark data.

from mediapipe_visual import draw_pose_landmarks_on_image: Used to draw pose landmarks on images, helping to visualize detected key points.

(2) Pose Detection Initialization

model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/pose_landmarker_lite.task')

Load a pre-trained MediaPipe Lite pose detection model to detect 33 human body keypoints.

  • model_path: Specifies the path to the model file.

  • base_options: Sets the model file path using MediaPipe’s base options.

  • options: Configures the pose detector, including whether to output a segmentation mask.

  • detector: Instantiates the pose detection object.
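The initialization described above can be sketched with MediaPipe’s Tasks API as follows. This is a minimal sketch and not necessarily the exact code in pose_control.py: build_pose_detector and default_model_path are hypothetical wrapper names, and the value chosen for output_segmentation_masks is an assumption.

```python
import os


def default_model_path(script_file):
    """Build the model path relative to the script, as in the line shown above."""
    script_dir = os.path.abspath(os.path.split(os.path.realpath(script_file))[0])
    return os.path.join(script_dir, 'model/pose_landmarker_lite.task')


def build_pose_detector(model_path, output_segmentation_masks=False):
    """Create a MediaPipe pose landmarker from a .task model file."""
    # Imported here so that this sketch can be loaded without MediaPipe installed.
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    base_options = python.BaseOptions(model_asset_path=model_path)
    options = vision.PoseLandmarkerOptions(
        base_options=base_options,
        output_segmentation_masks=output_segmentation_masks)  # assumed setting
    return vision.PoseLandmarker.create_from_options(options)


print(default_model_path('/home/pi/TonyPi/Functions/pose_control.py'))
```

On the robot, the detector would then be created with build_pose_detector(default_model_path(__file__)), provided the model file exists at that location.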

(3) Image Preprocessing

① cv2.cvtColor: Converts the image from BGR format to RGB format, as required by MediaPipe.

② cv2.flip: Horizontally flips the image to better match natural viewing habits.

③ mp.Image: Converts the image into a format compatible with MediaPipe.

(3) Posture Detection

        detection_result = detector.detect(mp_image)

Use the MediaPipe pose detector to analyze the image and return the detection results.

(4) Drawing Pose Landmarks

            # Draw the pose landmarks.
            pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
            pose_landmarks_proto.landmark.extend([
              landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in pose_landmarks
            ])

① pose_landmarks_list: A list of detected human pose landmarks.

② annotated_image: A copy of the original image used to draw the detection results.

③ Iterate through each detected human pose to visualize the landmarks.

(5) Calculate Joint Angle

            # up 0 1000down
            angle1 = vector_2d_angle(np.array(left_p1) - np.array(left_p0), np.array(left_p1) - np.array(left_p2))
            angle2 = vector_2d_angle(np.array(left_p2) - np.array(left_p1), np.array(left_p3) - np.array(left_p2)) 
            # 90 -90
            # 1000 0
            angle3 = vector_2d_angle(np.array(right_p1) - np.array(right_p0), np.array(right_p1) - np.array(right_p2))
            angle4 = vector_2d_angle(np.array(right_p2) - np.array(right_p1), np.array(right_p3) - np.array(right_p2))

Use the vector_2d_angle() function to calculate the angles between human joints (such as shoulders, elbows, knees, etc.). This function computes the angle from the dot product and cross product of two 2D vectors: the dot product gives the cosine of the angle, and the cross product gives the sine, which also determines the sign of the angle.
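The dot-product and cross-product approach can be sketched in plain Python as follows. The body of vector_2d_angle is not shown in this section, so signed_angle_2d is a hypothetical stand-in written under the assumption that the angle is returned in degrees; it is not the program’s implementation.

```python
import math


def signed_angle_2d(v1, v2):
    """Signed angle in degrees from 2D vector v1 to 2D vector v2."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]    # proportional to cos(angle)
    cross = v1[0] * v2[1] - v1[1] * v2[0]  # proportional to sin(angle)
    return math.degrees(math.atan2(cross, dot))


# Made-up 2D vectors standing in for limb segments.
print(signed_angle_2d((1, 0), (1, 0)))  # same direction
print(signed_angle_2d((1, 0), (0, 1)))  # quarter turn one way
print(signed_angle_2d((0, 1), (1, 0)))  # quarter turn the other way
```

Because atan2 receives both terms, the vectors do not need to be normalized first: the common scale factor cancels out.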

(6) Set Servo Angle Range

                if servo6 > 875:
                    servo6 = 875
                if servo6 < 125:
                    servo6 = 125
                if servo7 > 875:
                    servo7 = 875
                if servo7 < 125:
                    servo7 = 125
                if servo14 > 875:
                    servo14 = 875
                if servo14 < 125:
                    servo14 = 125
                if servo15 > 875:
                    servo15 = 875
                if servo15 < 125:
                    servo15 = 125

(7) Control Servo

Control the servo based on the calculated angle.

                if x1 > 0 and x2 > 0:
                    board.bus_servo_set_position(0.1, [[7, servo7], [6, servo6], [14, servo14], [15, servo15]])
                elif x1 > 0 and x2 < 0:
                    board.bus_servo_set_position(0.05, [[7, servo7], [6, servo6]])
                elif x1 < 0 and x2 > 0:
                    board.bus_servo_set_position(0.05, [[14, servo14], [15, servo15]])

The calculated joint angles are mapped to the servo’s operating range (between 125 and 875). The val_map() function is used to convert the angle values into the valid range for servo control. For each angle, it checks whether the value falls within the allowed limits and clamps it if necessary. The function board.bus_servo_set_position() sets the servo position accordingly to control the movement.
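The mapping and clamping steps can be sketched as follows. The body of val_map is not shown in this section, so the linear mapping below is an assumption, as is the input range of -90 to 90 degrees mapped onto 0 to 1000; clamp and angle_to_servo are hypothetical helper names.

```python
def val_map(x, in_min, in_max, out_min, out_max):
    """Linearly map x from [in_min, in_max] to [out_min, out_max] (assumed behavior)."""
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min


def clamp(value, low=125, high=875):
    """Limit a servo position to the allowed range."""
    return max(low, min(high, value))


def angle_to_servo(angle):
    """Convert a joint angle in degrees to a clamped servo position."""
    # Assumed ranges: -90..90 degrees maps to servo positions 0..1000.
    return clamp(int(val_map(angle, -90, 90, 0, 1000)))


for angle in (-90, 0, 45, 90):
    print(angle, angle_to_servo(angle))
```

The clamp replaces the pairs of if statements shown above with a single expression; both forms keep the servo away from its mechanical end stops.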

(8) Drawing MediaPipe Default Pose Landmarks

Use MediaPipe’s default style to draw pose landmarks.

                solutions.drawing_utils.draw_landmarks(
                  annotated_image,
                  pose_landmarks_proto,
                  solutions.pose.POSE_CONNECTIONS,
                  solutions.drawing_styles.get_default_pose_landmarks_style())
                last_angle = [servo6, servo7, servo14, servo15]

The function solutions.drawing_utils.draw_landmarks() is used to render the human pose keypoints and their connecting lines on the image. The pose_landmarks_proto stores the coordinates of each landmark, which are then drawn onto the output image.

5.13 Intelligent Transport

Note

This section is only applicable to users who have purchased the advanced version. The demonstration effect can be viewed in the folder for this section.

5.13.1 Program Logic

The robot will sequentially transport sponge blocks on the map to the corresponding AprilTag label locations until all items are delivered.

This lesson focuses on how the robot accomplishes the task of item transportation, which is divided into two main stages: the recognition stage and the transportation stage.

Recognition Stage: Using coordination between the robot base and the pan-tilt mechanism, the robot “searches” for recognizable objects on the map. When a recognizable color appears within the visual range, the robot begins processing color recognition. The image is first converted to the Lab color space and then binarized. After applying dilation and erosion operations, contours containing only the program-defined colors are obtained. By bounding these color contours, the robot achieves object color recognition.

Transportation Stage: Once recognition is complete, the robot moves into transportation. Based on the processed image feedback, when multiple objects are within the field of view, the robot assesses their relative distances and prioritizes transporting the closest object. The robot approaches the selected object and, upon reaching a set range, lifts it up to its head level. According to the object’s color, the robot matches it with the corresponding AprilTag label, which determines the final delivery location. Then, by controlling the pan-tilt and the robot base, it scans the map for tags. When a tag is detected, the robot takes different actions depending on whether it is the target tag.

  • If the scanned tag is the target, the robot transports the object directly to that point and places it down.

  • If the scanned tag is not the target, the robot uses the tag’s position to infer the location of the target tag, then reorients itself toward the target. Once the target tag is scanned, the robot transports the object there and releases it.

5.13.2 Getting Ready

(1) The functions in this section should be run on the provided map. The right side of the map is the item placement zone and the left side is the receiving zone.

(2) Place the map on a smooth floor. Place the TonyPi and the color blocks in the placement zone.

5.13.3 Operation Steps

Note

Commands are case-sensitive; enter them exactly as shown.

(1) Turn on the robot and connect to the Raspberry Pi desktop with VNC. Refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection to learn how to install and connect VNC.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions/

(4) Enter the following command, then press Enter to start the game.

python3 Transport.py

(5) To exit the game, press “Ctrl+C” in the terminal. If the program does not exit, press it a few more times.

5.13.4 Project Outcome

Note

It is recommended to place the map on a flat and open surface for optimal performance.

Place the robot and sponge blocks of red, green, and blue colors randomly within the placement area of the map. After starting the intelligent transportation gameplay, the robot will sequentially transport sponge blocks to the corresponding AprilTag markers based on their proximity until all three blocks are transported.

5.13.5 Comparison Between Voice Transport and Intelligent Transport

| Item | Voice transport | Intelligent transport |
| --- | --- | --- |
| Control method | Voice control, after starting the game from the command line | Automatic operation, after starting the game from the command line |
| Application scenarios | Quiet environment (voice commands effective within a distance of less than 30 cm) | Noisy environment (no distance requirement) |
| Work mode | Single transport | Continuous transport |

5.13.6 Program Parameter Instruction

The source code of this program is located in: /home/pi/TonyPi/Functions/Transport.py.

  • Import Parameter Module

(1) import sys: Imports Python’s sys module, used for accessing system-related functions and variables

(2) import os: Imports Python’s os module, providing functions and methods to interact with the OS

(3) import cv2: Imports the OpenCV library for image processing and computer vision-related functions

(4) import time: Imports Python’s time module for time-related functions, such as delays

(5) import math: Provides low-level access to mathematical operations, including many common math functions and constants

(6) import threading: Provides a multi-threading runtime environment

(7) import numpy as np: Imports the NumPy library, an open-source Python numerical computation extension for array and matrix operations

(8) import hiwonder.TTS as TTS: Imports the text-to-speech (speech synthesis) library

(9) import hiwonder.Camera as Camera: Imports the camera library

(10) from hiwonder.Misc import Misc: Imports the Misc module used to process recognized rectangular data

(11) import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot’s low-level control library, used to control servos, motors, RGB lights, and other hardware

(12) from hiwonder.Controller import Controller: Imports the motion control library

(13) import hiwonder.ActionGroupControl as AGC: Imports the action group execution library

(14) import common.yaml_handle: Includes functions or tools for processing YAML format files

  • Transport color and preset position parameters

This game uses objects of three colors (red, green, and blue) and transports each one to its corresponding tag position, as pictured:

  • Detect transported object

(1) Detection adjustment

At the beginning, the robot adjusts its left and right direction to find the objects to be transported. The specific settings are as shown in the following image:

(2) To detect the objects for transportation based on their color, the following code is used.

        color, color_center_x, color_center_y, color_angle = colorDetect(img)  # color detection; returns color, center coordinates, and angle

The main process involved in detecting object colors is as follows:

① Before converting the image to the LAB color space, noise reduction processing is required. The GaussianBlur() function is used for Gaussian filtering as pictured:

    frame_resize = cv2.resize(img, size, interpolation=cv2.INTER_NEAREST)
    frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3)

The first parameter “frame_resize” is the input image.

The second parameter “(3, 3)” is the size of the Gaussian kernel. A larger kernel typically filters more strongly, making the output image blurrier, and it also increases the computational cost.

The third parameter “3” is the standard deviation of the Gaussian function along the X direction. It controls how quickly the kernel weights fall off from the center: a larger value spreads the weights more evenly across the neighborhood and blurs more, while a smaller value concentrates the weight on the center pixel and blurs less.
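The influence of the standard deviation can be seen by computing the kernel weights directly. The sketch below uses the standard 1D Gaussian formula in plain Python; gaussian_kernel_1d is a hypothetical helper written for illustration and does not call OpenCV.

```python
import math


def gaussian_kernel_1d(ksize, sigma):
    """Return normalized 1D Gaussian weights for a kernel of `ksize` taps."""
    center = (ksize - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(ksize)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1


wide = gaussian_kernel_1d(3, 3)      # sigma used in the snippet above
narrow = gaussian_kernel_1d(3, 0.5)  # smaller sigma, for comparison
print([round(w, 3) for w in wide])
print([round(w, 3) for w in narrow])
```

With sigma = 3 the three weights are nearly equal, so the filter behaves almost like a simple average; with sigma = 0.5 most of the weight stays on the center pixel.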

② Use the inRange function to binarize the input image, as pictured:

                frame_mask = cv2.inRange(frame_lab,
                                         (lab_data[i]['min'][0],
                                          lab_data[i]['min'][1],
                                          lab_data[i]['min'][2]),
                                         (lab_data[i]['max'][0],
                                          lab_data[i]['max'][1],
                                          lab_data[i]['max'][2]))  # threshold the Lab image against the color's min/max range to get a binary mask
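What the range test does per pixel can be sketched without OpenCV as follows. This is an illustration only: in_range and threshold_image are hypothetical helpers, and the Lab threshold values and the tiny sample image are made-up numbers, not the calibrated values stored in lab_data.

```python
def in_range(pixel, lower, upper):
    """Return 255 if every channel lies within [lower, upper], else 0."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper))
    return 255 if inside else 0


def threshold_image(image, lower, upper):
    """Apply the range test to every pixel of a nested-list image."""
    return [[in_range(px, lower, upper) for px in row] for row in image]


# Made-up Lab thresholds and a 2x2 sample image of (L, a, b) pixels.
lower, upper = (0, 150, 130), (255, 255, 255)
image = [[(120, 180, 160), (120, 100, 160)],
         [(200, 151, 131), (50, 149, 200)]]
print(threshold_image(image, lower, upper))
```

The result is a binary mask: pixels whose three channels all fall inside the range become white (255) and all others become black (0).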

③ To reduce interference and make the image smoother, it is necessary to perform erosion and dilation operations on the image as pictured:

            eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # erosion
            dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))  # dilation

In the processing, the getStructuringElement function is used to generate structuring elements of different shapes.

The first parameter cv2.MORPH_RECT is the shape of the kernel, which is a rectangle in this case.

The second parameter (3, 3) is the size of the rectangle, which is 3x3 in this case.
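The effect of erosion followed by dilation with a 3x3 rectangular element can be sketched on a small binary grid. The pure-Python functions below are hypothetical stand-ins written for illustration; they are not OpenCV calls, and the 6x6 mask is a made-up example.

```python
def _morph(mask, op):
    """Apply `op` (min or max) over each pixel's 3x3 neighborhood, clipped at the border."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out


def erode(mask):
    return _morph(mask, min)  # a pixel survives only if its whole neighborhood is set


def dilate(mask):
    return _morph(mask, max)  # a pixel is set if any neighbor is set


mask = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1]]  # a 3x3 object plus one isolated noise pixel
for row in dilate(erode(mask)):
    print(row)
```

Erosion removes the isolated noise pixel and shrinks the object to its center; dilation then grows the object back to its original size, so only the noise is lost.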

④ Find the largest contour of the object, as pictured:

To avoid interference, the if area_max > 500 instruction is used to ensure that only contours with an area greater than 500 are considered valid for the largest area.
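The selection rule can be sketched without OpenCV as follows. The program’s own contour code is not shown in this section, so this is an illustration under assumptions: polygon_area uses the shoelace formula as a stand-in for the contour area, largest_valid_contour is a hypothetical helper, and the two contours are made-up shapes.

```python
def polygon_area(points):
    """Area of a closed polygon given as a list of (x, y) vertices (shoelace formula)."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def largest_valid_contour(contours, min_area=500):
    """Return (contour, area) for the largest contour, or (None, 0.0) if it is too small."""
    best, best_area = None, 0.0
    for contour in contours:
        area = polygon_area(contour)
        if area > best_area:
            best, best_area = contour, area
    if best_area > min_area:
        return best, best_area
    return None, 0.0


noise = [(0, 0), (10, 0), (10, 10), (0, 10)]         # 10 x 10 pixels
block = [(50, 50), (90, 50), (90, 80), (50, 80)]     # 40 x 30 pixels
print(largest_valid_contour([noise, block])[1])
print(largest_valid_contour([noise]))
```

The small square is ignored on its own because its area does not exceed 500, which is how small patches of similar color are prevented from being treated as an object.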

⑤ When the robot detects colored objects, use the cv2.drawContours() function to draw the contours of the colored objects as pictured:

The first parameter img is the input image.

The second parameter [box] is the contour itself, represented as a list in Python.

The third parameter “-1” is the index of the contour to draw; a negative value means that all contours in the list are drawn.

The fourth parameter (0, 255, 255) is the contour color, with the order being B, G, R, where (0, 255, 255) represents yellow in this case.

The fifth parameter “2” is the contour width. If set to “-1”, it means to fill the contour with the specified color.

⑥ After the robot detects the colored object, use the cv2.circle() function to draw the center point of the colored object on the feedback screen as pictured:

                    cv2.circle(img, (center_x_, center_y_), 5, (0, 255, 255), -1)  # draw the center point

The first parameter img is the input image, which is the image of the detected colored object in this case.

The second parameter (center_x_, center_y_) is the coordinates of the center of the circle to be drawn (determined from the detected object).

The third parameter “5” is the radius of the circle to be drawn.

The fourth parameter (0, 255, 255) is the color of the circle to be drawn, in B, G, R order; here it represents yellow.

The fifth parameter “-1” indicates that the circle should be filled with the color specified in the fourth parameter. A positive number instead sets the line width of the circle outline.

  • Start transporting

After detecting a colored object, the robot starts transporting the object, which can be divided into several steps: approaching the object, picking up the object, finding the transportation location, transporting the object, and putting down the object.

(1) Approach the object

Before starting the transport, first control the robot to gradually approach the object to be transported as pictured:

(2) Pick up the object

After approaching the object, control the robot to pick up the object to be transported as pictured:

(3) Find the transportation location

Before transporting the object, find the placement position for the colored object by detecting and recognizing the tag as pictured:

            tag_data = apriltagDetect(img)  # AprilTag detection

The main control parameters involved in the process are as follows:

① After obtaining the information of the four corner points of the tag code, use the cv2.drawContours() function to draw the contour of the tag as pictured:

② After the robot detects the tag, use the cv2.circle() function to draw the center point of the tag on the feedback screen as pictured:

            object_center_x, object_center_y = int(detection.center[0]), int(detection.center[1])  # center point
            cv2.circle(frame, (object_center_x, object_center_y), 5, (0, 255, 255), -1)

(4) Transport the object

After picking up the object, transport the object to the corresponding position as pictured:

After picking up the object, set “step = 1”, then control the robot to adjust its left and right position to face the tag position as pictured:

                elif step == 1:  # adjust left or right to stay centered
                    x_dis = servo_data['servo2']
                    y_dis = servo_data['servo1']                   
                    turn = ''
                    haved_find_tag = False
                    
                    if (object_center_x - CENTER_X) > 170 and object_center_y > 330:
                        AGC.runActionGroup(back, lock_servos=lock_servos)   
                    elif object_center_x - CENTER_X > 80:  # not centered; turn the robot one step in that direction
                        AGC.runActionGroup(turn_right, lock_servos=lock_servos)
                    elif object_center_x - CENTER_X < -80:
                        AGC.runActionGroup(turn_left, lock_servos=lock_servos)                        
                    elif 0 < object_center_y <= 250:
                        AGC.runActionGroup(go_forward, lock_servos=lock_servos)

Then, gradually set “step = 2”, “step = 3”, “step = 4” to control the robot to transport the object to the tag position as pictured:

                        step = 2
                elif step == 2:  # approach the object
                    if 330 < object_center_y:
                        AGC.runActionGroup(back, lock_servos=lock_servos)
                    if find_box:
                        if object_center_x - CENTER_X > 150:  
                            AGC.runActionGroup(right_move_large, lock_servos=lock_servos)
                        elif object_center_x - CENTER_X < -150:
                            AGC.runActionGroup(left_move_large, lock_servos=lock_servos)                        
                        elif -10 > object_angle > -45:  # not aligned; turn the robot one step in that direction
                            AGC.runActionGroup(turn_left, lock_servos=lock_servos)
                        elif -80 < object_angle <= -45:
                            AGC.runActionGroup(turn_right, lock_servos=lock_servos)
                        elif object_center_x - CENTER_X > 40:  
                            AGC.runActionGroup(right_move_large, lock_servos=lock_servos)
                        elif object_center_x - CENTER_X < -40:
                            AGC.runActionGroup(left_move_large, lock_servos=lock_servos)
	                    step = 4 
                elif step == 4:  # approach the object
                    if 280 < object_center_y <= 340:
                        AGC.runActionGroup('go_forward_one_step', lock_servos=lock_servos)
                        time.sleep(0.2)
                    elif 0 <= object_center_y <= 280:
                        AGC.runActionGroup(go_forward, lock_servos=lock_servos)
                    else:
                        if object_center_y >= 370:
                            go_step = 2
                        else:
                            go_step = 3
                        if abs(object_center_x - CENTER_X) <= 40:
                            stop_detect = True

During the transportation process, if the target tag is not detected, use other tags to determine the relative position as pictured:

# Determine the position of the target AprilTag using the other AprilTags
# AprilTag placement: red (tag36h11_1), green (tag36h11_2), blue (tag36h11_3)
def getTurn(tag_id, tag_data):
    tag_1 = tag_data[0]
    tag_2 = tag_data[1]
    tag_3 = tag_data[2]

    if tag_id == 1:  # target AprilTag is 1
        if tag_2[0] == -1:  # AprilTag 2 is not detected
            if tag_3[0] != -1:  # AprilTag 3 is detected; AprilTag 1 is to its left, so turn left
                return 'left'
        else:  # AprilTag 2 is detected; AprilTag 1 is to its left, so turn left
            return 'left'
    elif tag_id == 2:
        if tag_1[0] == -1:
            if tag_3[0] != -1:
                return 'left'
        else:
            return 'right'
    elif tag_id == 3:
        if tag_1[0] == -1:
            if tag_2[0] != -1:
                return 'right'
        else:
            return 'right'

    return 'None'
            if turn == 'None':
                object_center_x, object_center_y, object_angle = -1, -1, 0
            else:  # no AprilTag is detected at all
                object_center_x, object_center_y, object_angle = -3, -1, 0

(5) Put Down the Object

After the transportation is complete, the robot puts down the object as pictured:

                elif step == 5:  # pick up or put down the object
                    if find_box:
                        AGC.runActionGroup('go_forward_one_step', times=2)
                        AGC.runActionGroup('stand', lock_servos=lock_servos)
                        AGC.runActionGroup('move_up')
                        lock_servos = LOCK_SERVOS
                        step = 6    
                    else:
                        AGC.runActionGroup('go_forward_one_step', times=go_step, lock_servos=lock_servos)
                        AGC.runActionGroup('stand', lock_servos=lock_servos)
                        AGC.runActionGroup('put_down')
                        AGC.runActionGroup(back, times=5, with_stand=True)
                        color_list.remove(object_color)
                        if color_list == []:
                            color_list = ['red', 'green', 'blue']
                        lock_servos = ''

5.14 Object Tracking

5.14.1 Program Logic

The robot recognizes a target color and moves its body to follow the target as it moves.

First, TonyPi recognizes colors in the Lab color space. The image is converted from RGB to Lab and binarized, and then erosion and dilation are applied to obtain a contour that contains only the target color. The contour is framed with a circle to complete object color recognition.

Next, the program traverses all recognized contours of the target color and selects the one with the largest area as the target.

Finally, the servos are driven to track the target in real time, while action groups move the body to follow it, completing the object tracking function.

5.14.2 Operation Steps

Note

Instructions must be entered with strict attention to case sensitivity and spacing.

(1) Turn on the robot and connect to the Raspberry Pi desktop via VNC. For how to install and connect VNC, refer to 3. Remote Desktop Tool Installation and Connection->3.1 Remote Tool Installation and Connection.

(2) Double-click the “Terminator” icon on the Raspberry Pi desktop to open the command line.

(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:

cd TonyPi/Functions/

(4) Input the command below, then press Enter to start the game.

python3 Follow.py

(5) To exit the game, press “Ctrl+C”. If the exit fails, press it a few more times.

5.14.3 Project Outcome

Note

The default recognition and tracking color is red. To change it to green or blue, refer to “5.14.4 Function Extension -> Modify Default Recognition Color”. When moving a handheld colored sponge block, do not move it too fast, and keep it within the camera’s field of view.

After the game starts, slowly move the red sponge block by hand or place it on a movable carrier. The TonyPi robot will follow the movement of the target color.

5.14.4 Function Extension

  • Modify Default Recognition Color

Red, green and blue are the built-in colors in the object tracking program, and red is the default color. The following steps change the tracking color to green.

(1) Enter the following command to navigate to the directory where the game program is located.

cd TonyPi/Functions/

(2) Enter the command below to open the game program in the vim editor.

vim Follow.py

(3) Find the code __target_color = ('red').

Note

In vim, type the line number of the code and then press “Shift+G” to jump directly to that line. This note only introduces a quick way to locate code, so any line number given is for reference only. Please rely on the actual position in your file.

(4) Press “i” to enter editing mode, then change red in __target_color = ('red') to green.

(5) Press “Esc” to enter last-line command mode, then input “:wq” to save the file and exit the editor.

  • Add Recognized Color

In addition to the built-in colors, you can add other colors for the program to recognize. Take orange as an example:

(1) Open VNC and enter the following command to open the Lab color configuration file.

vim /home/pi/TonyPi/lab_config.yaml

(2) Click the debugging tool icon on the system desktop, then choose “Execute” in the pop-up window.

(3) Click the “Connect” button in the lower left corner. When the interface displays the image returned by the camera, the connection is successful. Then select “red” in the box on the right.

(4) Drag the L, A, and B sliders until the color area to be recognized turns white in the left screen and all other areas turn black.

For example, to recognize orange, place an orange ball in the camera’s field of view. Adjust the L, A, and B sliders until the orange part of the left screen turns white and other colors turn black, then click the “Save” button to keep the modified data.

(5) After the modification is complete, check whether the modified data was written successfully. Enter the command again to view the color setting parameters.

vim /home/pi/TonyPi/lab_config.yaml

(6) Check the data in the red frame. If the edited values have been written to the file, press “Esc” and enter “:wq” to save and exit.

5.14.5 Programming Instruction

The source code of this program is located at /home/pi/TonyPi/Functions/Follow.py.

  • Import Parameter Module

import sys
import os
import cv2
import time
import math
import threading
import numpy as np
import pandas as pd

import hiwonder.PID as PID
import hiwonder.Camera as Camera
import hiwonder.Misc as Misc
import hiwonder.ros_robot_controller_sdk as rrc
from hiwonder.Controller import Controller
import hiwonder.ActionGroupControl as AGC
import hiwonder.yaml_handle as yaml_handle
from CameraCalibration.CalibrationConfig import *

(1) import sys: Imports Python’s sys module, used to access system-related functions and variables.

(2) import os: Imports Python’s os module, which provides functions and methods for interacting with the operating system.

(3) import cv2: Imports the OpenCV library, used for image processing and computer vision functions.

(4) import time: Imports Python’s time module, used for time-related functions such as delays.

(5) import math: Imports Python’s math module, which provides commonly used mathematical functions and constants.

(6) import threading: Imports Python’s threading module, which provides multi-threading support.

(7) import numpy as np: Imports the NumPy library, an open-source numerical computing extension for Python, used for array and matrix operations.

(8) import hiwonder.PID as PID: Imports the PID controller module.

(9) import hiwonder.Camera as Camera: Imports the camera library.

(10) import hiwonder.Misc as Misc: Imports the Misc module, used for processing recognized rectangular data.

(11) import hiwonder.ros_robot_controller_sdk as rrc: Imports the robot’s low-level control library, used to control servos, motors, RGB lights, and other hardware.

(12) from hiwonder.Controller import Controller: Imports the motion control library.

(13) import hiwonder.ActionGroupControl as AGC: Imports the action group execution library.

(14) import hiwonder.yaml_handle as yaml_handle: Imports helper functions for handling YAML files.

  • Color Detection Parameters

In the object tracking program, the detected object color is red.

    __target_color = ('red')

The main detection parameters involved in the detection process are as follows:

(1) Before the image is converted to the Lab color space, noise must be reduced. The GaussianBlur() function is used for Gaussian filtering as pictured:

The first parameter frame_resize is the input image.

The second parameter (3, 3) is the size of the Gaussian kernel. A larger kernel generally filters more strongly, making the output image blurrier, and it also increases the computational cost.

The third parameter “3” is the standard deviation of the Gaussian function in the X direction. It controls how widely the weights spread around the kernel center: a larger value gives neighboring pixels more weight and blurs the image more, while a smaller value keeps the weights concentrated near the center and blurs less.

(2) The inRange function is used to binarize the input image as pictured:

(3) To reduce interference and make the image smoother, it is necessary to perform erosion and dilation operations on the image as pictured:

In the processing, the getStructuringElement function is used to generate structuring elements of different shapes.

The first parameter cv2.MORPH_RECT is the shape of the kernel, which is a rectangle in this case.

The second parameter (3, 3) is the size of the rectangle, which is 3x3 in this case.

(4) Find the largest contour of the object as pictured:

To avoid interference, the condition if area_max_contour is not None and area_max > 100 ensures that the largest contour is treated as valid only when its area is greater than 100.

  • Color Recognition Parameters

The main control parameters involved in the color recognition process are as follows:

(1) When the robot detects a colored object, use the cv2.drawContours() function to draw the contour of the colored object as pictured:

The first parameter img is the input image.

The second parameter [box] is the contour itself, passed as a Python list.

The third parameter -1 is the index of the contour to draw; a negative value draws all contours in the list.

The fourth parameter (0, 255, 255) is the contour color, with the order being B, G, R, and in this case, it represents yellow.

The fifth parameter 2 is the contour width. If set to -1, it means to fill the contour with the specified color.

(2) After the robot detects a colored object, use the cv2.circle() function to draw the center point of the colored object on the feedback screen as pictured:

The first parameter img is the input image, which is the image of the detected colored object in this case.

The second parameter (centerX, centerY) is the coordinates of the center point of the circle to be drawn (determined based on the detected object).

The third parameter “5” is the radius of the circle to be drawn.

The fourth parameter (0, 255, 255) is the color of the circle to be drawn, with the order being B, G, R, and in this case, it represents yellow.

The fifth parameter “-1” indicates that the circle is filled with the color specified in the fourth parameter. A positive number instead sets the line width of the circle outline.

  • Motion Execution Parameters

(1) After a red object is detected, servo 1 and servo 2 of the robot are controlled so that the camera follows the movement of the red object.

Take the code ctl.set_pwm_servo_pulse(1, vertical_servo_position,use_time*1000) as an example:

The first parameter “1” represents controlling servo ID 1.

The second parameter vertical_servo_position represents the pulse width of servo ID 1.

The third parameter use_time*1000 represents the movement time of the servo, in milliseconds.
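The parameter order can be illustrated without hardware by using a stand-in controller. Everything in this sketch other than the set_pwm_servo_pulse argument order is an assumption: FakeController, move_head, the 500-2500 pulse limits, and the use of servo 2 for horizontal movement are hypothetical and are not taken from Follow.py.

```python
class FakeController:
    """Stand-in for the real controller so the example runs without a robot."""
    def __init__(self):
        self.calls = []

    def set_pwm_servo_pulse(self, servo_id, pulse, use_time_ms):
        # Record the request instead of moving a servo
        self.calls.append((servo_id, pulse, use_time_ms))

# Hypothetical pulse-width limits; check the real limits for your servos
PULSE_MIN, PULSE_MAX = 500, 2500

def clamp(value, low, high):
    return max(low, min(high, value))

def move_head(ctl, vertical_pos, horizontal_pos, use_time=0.02):
    """Send clamped pulse widths to servo 1 (vertical) and servo 2 (assumed horizontal)."""
    time_ms = int(round(use_time * 1000))  # seconds -> milliseconds
    ctl.set_pwm_servo_pulse(1, clamp(int(vertical_pos), PULSE_MIN, PULSE_MAX), time_ms)
    ctl.set_pwm_servo_pulse(2, clamp(int(horizontal_pos), PULSE_MIN, PULSE_MAX), time_ms)

ctl = FakeController()
move_head(ctl, 2600, 1500)
print(ctl.calls)
```

Clamping the pulse width before sending it keeps a tracking error from driving a servo past its mechanical range.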

(2) After detecting the red object, the robot calls the action group files in the “/home/pi/TonyPi/ActionGroups” directory to move along with the red object as pictured: