# 7. ROS2+Machine Learning Courses ## 7.1 MediaPipe Man-Robot Interaction ### 7.1.1 MediaPipe Introduction * **MediaPipe Description** MediaPipe is an open-source framework of multi-media machine learning models. Cross-platform MediaPipe can run on mobile devices, workspace and servers, as well as support mobile GPU acceleration. It is also compatible with TensorFlow and TF Lite Inference Engine, and all kinds of TensorFlow and TF Lite models can be applied on it. Besides, MediaPipe supports GPU acceleration of mobile and embedded platform.

* **MediaPipe Pros and Cons** (1) MediaPipe Pros ① MediaPipe supports various platforms and languages, including iOS, Android, C++, Python, JAVAScript, Coral, etc. ② Swift running. Models can run in real-time. ③ Models and codes are with high reuse rate. (2) MediaPipe Cons ① For mobile devices, MediaPipe will occupy 10M or above. ② As it greatly depends on Tensorflow, you need to alter large amount of codes if you want to change it to other machine learning frameworks, which is not friendly to machine learning developer. ③ It adopts static image which can improve efficiency, but make it difficult to find out the errors. * **How to use MediaPipe** The figure below shows how to use MediaPipe. The solid line represents the part to coded, and the dotted line indicates the part not to coded. MediaPipe can offer the result and the function realization framework quickly.

(1) Dependency MediaPipe utilizes OpenCV to process video, and uses [FFMPEG](https://www.ffmpeg.org/) to process audio data. Furthermore, it incorporates other essential dependencies, including OpenGL/Metal, Tensorflow, and Eigen. For seamless usage of MediaPipe, we suggest gaining a basic understanding of OpenCV. To delve into OpenCV, you can find detailed information in "[**OpenCV Basic Courses**]()". (2) MediaPipe Solutions Solutions is based on the open-source pre-constructed sample of TensorFlow or TFLite. MediaPipe Solutions is built upon a framework, which provides 16 Solutions, including face detection, Face Mesh, iris, hand, posture, human body and so on. * **MediaPipe Learning Resources** MediaPipe website： MediaPipe Wiki： MediaPipe github： Dlib website: dlib github: ### 7.1.2 Image Background Segmentation This lesson provides instructions on utilizing MediaPipe's selfie segmentation model to accurately segment trained models, such as human faces and hands, from their backgrounds. Once separated, you can easily add virtual backgrounds to these models. * **Program Logic** To begin, import the selfie segmentation model from MediaPipe and subscribe to the corresponding topic to access the live camera feed. Next, flip the image and apply the segmentation to the background image. For improved boundary segmentation, implement dual-border segmentation. Finally, complete the process by replacing the background with a virtual background. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Execute the command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Enter the command to start background segmentation ```bash cd ~/ros2_ws/src/example/example/mediapipe_example && python3 self_segmentation.py ``` (6) To close this feature, press the '**Esc**' key to exit the camera image interface. (7) Next, press "**Ctrl+C**" in the terminal. If it fails to close, please try again. * **Program Outcome** Once the feature is activated, the screen changes to a gray virtual background, and when a hand enters the frame, it is automatically separated from the background.

* **Program Analysis** The source code of this program locates in:[/ros2_ws/src/example/example/mediapipe_example/self_segmentation.py](../_static/source_code/self_segmentation.zip)

(1) Function Main: {lineno-start=79} ```python def main(): node = SegmentationNode('self_segmentation') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to start the background control node. (2) Class SegmentationNode： {lineno-start=13} ```python class SegmentationNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.mp_selfie_segmentation = mp.solutions.selfie_segmentation self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.BG_COLOR = (192, 192, 192) # gray self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Init： {lineno-start=14} ```python def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.mp_selfie_segmentation = mp.solutions.selfie_segmentation self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.BG_COLOR = (192, 192, 192) # gray self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize the parameters required for background segmentation, call the image callback function, and start the model recognition function. image_callback： {lineno-start=27} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image callback function, used to read data from the camera node and enqueue it. Main: {lineno-start=37} ```python def main(self): with self.mp_selfie_segmentation.SelfieSegmentation( model_selection=1) as selfie_segmentation: bg_image = None while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue # To improve performance, optionally mark the image as not writeable to # pass by reference. image.flags.writeable = False results = selfie_segmentation.process(image) image.flags.writeable = True image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) # Draw selfie segmentation on the background image. # To improve segmentation around boundaries, consider applying a joint # bilateral filter to "results.segmentation_mask" with "image". condition = np.stack( (results.segmentation_mask,) * 3, axis=-1) > 0.1 # The background can be customized. # a) Load an image (with the same width and height of the input image) to # be the background, e.g., bg_image = cv2.imread('/path/to/image/file') # b) Blur the input image by applying image filtering, e.g., # bg_image = cv2.GaussianBlur(image,(55,55),0) if bg_image is None: bg_image = np.zeros(image.shape, dtype=np.uint8) bg_image[:] = self.BG_COLOR output_image = np.where(condition, image, bg_image) self.fps.update() result_image = self.fps.show_fps(output_image) cv2.imshow('MediaPipe Selfie Segmentation', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Load the model from MediaPipe, input the image, and display the output image using OpenCV. ### 7.1.3 3D Object Detection * **Program Logic** To get started, import the 3D Objectron module from MediaPipe, and subscribe to the topic message to receive the real-time camera image. Next, flip the image to ensure proper alignment for 3D object detection. Finally, draw the 3D boundary frame on the image. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Execute the following command to enable the camera node: ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new command-line terminal and hit Enter key to initiate the game program. ```bash cd ~/ros2_ws/src/example/example/mediapipe_example && python3 objectron.py ``` (6) To close this game, press the "**Esc**" key in the image interface to exit the camera image interface. (7) Press "**Ctrl+C**" in the command line terminal interface. If the closure fails, please try again repeatedly. * **Program Outcome** Once the game starts, the 3D frame will be drawn around the boundary of the recognized object. The system can identify several objects, including a cup (with handle), shoe, chair, and camera.

* **Program Analysis** The program file corresponding to this section of the course documentation is located at: [/ros2_ws/src/example/example/mediapipe_example/objectron.py](../_static/source_code/objectron.zip)

(1) Function Main: {lineno-start=74} ```python def main(): node = ObjectronNode('objectron') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to initiate 3D detection node. (2) Class ObjectronNode： {lineno-start=13} ```python class ObjectronNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.mp_objectron = mp.solutions.objectron self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') ``` Init： {lineno-start=14} ```python def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.mp_objectron = mp.solutions.objectron self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize the parameters required for 3D recognition, call the image callback function, and start the model recognition function. image_callback： {lineno-start=26} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image callback function, used to read data from the camera node and enqueue it. Main: {lineno-start=35} ```python def main(self): with self.mp_objectron.Objectron(static_image_mode=False, max_num_objects=1, min_detection_confidence=0.4, min_tracking_confidence=0.5, model_name='Cup') as objectron: while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue # To improve performance, optionally mark the image as not writeable to # pass by reference. image.flags.writeable = False results = objectron.process(image) # Draw the box landmarks on the image. image.flags.writeable = True image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) if results.detected_objects: for detected_object in results.detected_objects: self.mp_drawing.draw_landmarks( image, detected_object.landmarks_2d, self.mp_objectron.BOX_CONNECTIONS) self.mp_drawing.draw_axis(image, detected_object.rotation, detected_object.translation) self.fps.update() result_image = self.fps.show_fps(cv2.flip(image, 1)) # Flip the image horizontally for a selfie-view display. cv2.imshow('MediaPipe Objectron', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Read the model inside MediaPipe, input the image, draw the edges of the objects after obtaining the output image, and display using OpenCV. ### 7.1.4 Face Detection In this lesson, we'll use MediaPipe's face detection model to detect faces in the frame. MediaPipe's face detection is an ultra-fast solution that supports multiple faces and identifies 6 key landmarks. It's built on BlazeFace, a lightweight and highly efficient face detector optimized for mobile GPU inference. * **Program Logic** To begin, import the human face detection model from MediaPipe and subscribe to the relevant topic message to obtain the live camera feed. Next, utilize OpenCV to flip the image and convert the color space for further processing. Then, using the face detection model's minimum confidence threshold, determine whether a human face has been successfully detected. If a human face is recognized, proceed to collect the necessary information about each detected face, including the bounding frame and the 6 key points (right eye, left eye, nose tip, right ear, and left ear). Finally, frame the human face and mark the 6 key points on each detected face for visual clarity and further analysis. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC (2) Click-on

to start the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Execute the command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new command-line terminal and execute the following command. ```bash cd ~/ros2_ws/src/example/example/mediapipe_example && python3 face_detect.py ``` (6) If you need to close this game, you need to press the "**Esc**" key in the image interface to exit the camera image interface. (7) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** After the game starts, depth camera will start detecting human face, and human face will framed on the live camera feed.

* **Program Analysis** The source code of this program is saved in [/ros2_ws/src/example/example/mediapipe_example/face_detect.py](../_static/source_code/face_detect.zip)

(1) Function Main： {lineno-start=65} ```python def main(): node = FaceDetectionNode('face_detection') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to initiate face detection node. (2) Class FaceDetectionNode： {lineno-start=17} ```python class FaceDetectionNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/detector.tflite') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceDetectorOptions(base_options=base_options) self.detector = vision.FaceDetector.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) ``` Init： {lineno-start=20} ```python self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/detector.tflite') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceDetectorOptions(base_options=base_options) self.detector = vision.FaceDetector.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize the parameters required for face recognition, call the image callback function, and start the model recognition function. image_callback： {lineno-start=27} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image callback function, used to read data from the camera node and enqueue it. Main: {lineno-start=41} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = visualize(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('face_detection', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Read the model from MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw the facial keypoints and display the feedback image. ### 7.1.5 3D Face Detection In this program, MediaPipe Face Mesh is utilized to detect human face within the camera image. MediaPipe Face Mesh is a powerful model capable of estimating 468 3D facial features, even when deployed on a mobile device. It employs machine learning to infer the 3D face contour accurately. Additionally, this model ensures real-time detection by utilizing a lightweight model architecture and GPU acceleration. Furthermore, this model is integrated with a face conversion module that compensates for any differences between face landmark estimation and AR (Augmented Reality) applications. It establishes a metric 3D space and utilizes the facial landmark screen positions to estimate facial conversion within this space. The facial conversion data consists of common 3D primitives, including facial gesture conversion matrices and triangle facial mesh information. * **Program Logic** Firstly, you need to learn that machine learning pipeline is composed of two real-time deep neural network models. The system consists of two components: a face detector that processes the entire image and calculates the locations of faces, and a 3D face landmark model that uses these locations to predict an approximate 3D surface through regression. To achieve 3D facial landmarks, we utilize transfer learning and train a network with multiple objectives: predicting 3D landmark coordinates on synthetic rendered data and 2D semantic contours on annotated real-world data simultaneously. This approach yields plausible 3D landmark predictions not only based on synthetic data but also on real-world data. The 3D landmark network takes cropped video frames as input without requiring additional depth input. The model outputs the location of a 3D point and the probability that a face appears in the input and is properly aligned. Once the face mesh model is imported, real-time images can be obtained from the camera by subscribing to topic messages. Next, image preprocessing techniques like flipping the image and converting the color space are applied. The face detection model's minimum confidence is then used to determine whether the face has been successfully detected. Finally, the detected face on the screen is projected into a 3D grid for visualization and display. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Run the following command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Execute the command to enable the function program: ```bash cd ~/ros2_ws/src/example/example/mediapipe_example && python3 face_mesh.py ``` (6) If you need to close this game, you need to press the "**Esc**" key in the image interface to exit the camera image interface. (7) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** After starting the game, when the depth camera detects a face, it will outline the face in the feedback image.

* **Program Analysis** The source code of this program is located in [/ros2_ws/src/example/example/mediapipe_example/face_mesh.py](../_static/source_code/face_mesh.zip)

(1) Function Main： {lineno-start=} ```python def main(): node = FaceMeshNode('face_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to initiate the 3D face detection node. (2) Class FaceMeshNode： {lineno-start=17} ```python class FaceMeshNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/face_landmarker_v2_with_blendshapes.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceLandmarkerOptions(base_options=base_options, output_face_blendshapes=True, ``` Init： {lineno-start=19} ```python rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/face_landmarker_v2_with_blendshapes.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceLandmarkerOptions(base_options=base_options, output_face_blendshapes=True, output_facial_transformation_matrixes=True, num_faces=1) self.detector = vision.FaceLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize the parameters required for 3D face detection, call the image callback function, and start the model recognition function. image_callback： {lineno-start=37} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` The image callback function is used to retrieve data from the camera node and encapsulate it into a queue. Main: {lineno-start=46} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = draw_face_landmarks_on_image(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('face_landmarker', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Reading the model inside MediaPipe, inputting the image, and then using OpenCV to draw facial keypoints on the output image, and display the feedback image. ### 7.1.6 Hand Key Point Detection MediaPipe's hand detection model is employed to showcase the key points of the hand and the connecting lines of these key points on the live camera feed. MediaPipe Hands is an advanced hand and finger detection model that delivers high-fidelity results. Through the power of machine learning (ML), it accurately infers 21 3D landmarks of the hand from a single frame. * **Program Logic** Firstly, it's important to understand that MediaPipe's palm detection model employs a machine learning pipeline consisting of multiple models (including a linear model). This model processes the entire image and returns an oriented hand bounding box. On the other hand, the hand landmark model operates on cropped image regions defined by the palm detectors and provides high-fidelity 3D hand keypoints. To begin, after importing the palm detection model, we subscribe to the topic message to obtain real-time camera images. Next, various image pre-processing steps, such as flipping the image and converting the color space, are applied. These steps significantly reduce the need for data augmentation for the hand landmark model. Furthermore, our pipeline allows for generating crops based on hand landmarks identified in the previous frame. The palm detection is invoked only when the landmark model is unable to recognize the presence of the hand accurately. Afterward, by comparing the minimum confidence level of the hand detection model, we determine whether the palm has been successfully detected. Lastly, the hand keypoints are detected and drawn on the camera image to visualize the detected hand in real-time. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

* **Program Analysis**

The program file corresponding to this section of the course documentation is located at: [/ros2_ws/src/example/example/mediapipe_example/hand.py](../_static/source_code/hand.zip) (1) Function **Main：** {lineno-start=64} ```python def main(): node = HandNode('hand_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to initiate the 3D face detection node. (2) Class HandNode： {lineno-start=17} ```python class HandNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/hand_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2) self.detector = vision.HandLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') ``` Init： {lineno-start=19} ```python rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/hand_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2) self.detector = vision.HandLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Initialize the parameters required for hand keypoint detection, call the image callback function, and start the model recognition function. image_callback： {lineno-start=33} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` The image callback function is used to read data from the camera node and enqueue it. Main: {lineno-start=42} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = draw_hand_landmarks_on_image(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('hand_landmarker', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Read the model from MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw the key points of the hand and display the feedback image. ### 7.1.7 Body Key Points Detection The MediaPipe body detection model is utilized to detect key points on the human body, which are then displayed on the live camera feed. This implementation incorporates MediaPipe Pose, a high-fidelity posture tracking model that leverages BlazePose to infer 33 3D key points. Additionally, this approach offers support for the ML Kit Pose Detection API.

* **Program Logic** Firstly, import body detection model. Subsequently, flip over the image and convert the color space of the image. Check whether the human body is successfully detected based on the minimum confidence of the body detection model. Next, define the tracked posture by comparing the minimum tracking confidence. If the confidence does not meet the minimum threshold, perform automatic human detection on the next input image. In the pipeline, a detector is employed to initially localize the region of interest (ROI) corresponding to a person's pose within a frame. Subsequently, a tracker utilizes the cropped ROI frame as input to predict pose landmarks and segmentation masks within the ROI. For video applications, the detector is invoked selectively, only when necessary. Specifically, it is used for the first frame or when the tracker fails to recognize the body pose in the preceding frame. In all other frames, the pipeline derives ROIs based on the pose landmarks detected in the previous frame. After MediaPipe body detection model is imported, access the live camera feed through subscribing the related topic message. Lastly, draw the key points representing the human body. * **Operation Steps** (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

* **Program Analysis** The program file is saved in:[/ros2_ws/src/example/example/mediapipe_example/pose.py](../_static/source_code/pose.zip)

(1) Function Main： {lineno-start=65} ```python def main(): node = PoseNode('pose_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` Used to initiate the 3D face detection node. (2) Class PoseNode： {lineno-start=17} ```python class PoseNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/pose_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.PoseLandmarkerOptions( base_options=base_options, output_segmentation_masks=True) self.detector = vision.PoseLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) ``` Init： {lineno-start=18} ```python def __init__(self, name): rclpy.init() super().__init__(name) self.running = True model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/pose_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.PoseLandmarkerOptions( base_options=base_options, output_segmentation_masks=True) self.detector = vision.PoseLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize the parameters required for limb detection, call the image callback function, and start the model recognition process. image_callback： {lineno-start=34} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image callback function, used to read data from the camera node and enqueue it. Main: {lineno-start=43} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = draw_pose_landmarks_on_image(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('pose_landmarker', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 break cv2.destroyAllWindows() rclpy.shutdown() ``` Read the model inside MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw facial keypoints and display the feedback image.

### 7.1.8 Fingertip Trajectory Recognition Identify hand joints using MediaPipe's hand detection model. Once a specific gesture is recognized, the robot will initiate fingertip locking on the screen, track the fingertips, and generate their movement trajectory. * **Program Logic** First, invoke the MediaPipe hand detection model to capture the camera image. Next, flip and process the image to extract hand information. Utilizing the connection lines between key points of the hand, calculate the finger angles to determine the gesture. Upon recognition of a specific gesture, the robot will proceed to identify and lock the fingertips on the screen, simultaneously tracing the movement trajectory of the fingertips on the display. * **Operation Steps** :::{Note} The input command should be case sensitive, and the keyword can be complemented by "**Tab**" key. ::: (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to start the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Execute the command to enable the camera node: ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) In a new command terminal, enter the below command and press Enter to run the program: ```bash cd ~/ros2_ws/src/example/example/mediapipe_example && python3 hand_gesture.py ``` (6) The program will launch the camera's image interface. For detailed recognition steps, please refer to [**7.1.8 Fingertip Trajectory Recognition->Program Outcome**](#anchor_7_1_8). (7) To close this function, press the "**Esc**" key in the image interface to exit the camera view. (8) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** Once the game starts, position your hand within the camera's field of view. Upon recognition, the hand keypoints will be highlighted on the camera feed. If the robot detects the **"1"** gesture, the trajectory of your fingertip motion will begin to be recorded on the camera feed. If it detects the **"5"** gesture, the recorded fingertip trajectory will be cleared.

* **Program Analysis** The program file is saved in [/ros2_ws/src/example/example/mediapipe_example/hand_gesture.py](../_static/source_code/hand_gesture.zip)

:::{Note} Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable! ::: Based on the game's impact, the process logic of this game is organized as depicted in the figure below:

As depicted in the image above, the purpose of this game is to capture an image using the camera, preprocess it by converting its color space for easier identification, extract feature points corresponding to hand gestures from the converted image, and determine different gestures (based on angles) through logical analysis of key feature points. Finally, the trajectory of the recognized gesture is drawn on the display screen. The program's logic flowchart extracted from the program files is illustrated in the figure below.

From the above diagram, it can be seen that the program's logical flow is mainly divided into initialization functions and recognition processing functions. The following document content will be written according to the program logic flow chart mentioned above. (1) Function Main: {lineno-start=209} ```python def main(): node = HandGestureNode('hand_gesture') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') ``` The main function is used to start the fingertip trajectory recognition node. get_hand_landmarks: {lineno-start=18} ```python def get_hand_landmarks(img, landmarks): """ 将landmarks从medipipe的归一化输出转为像素坐标 :param img: 像素坐标对应的图片 :param landmarks: 归一化的关键点 :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) ``` Convert the normalized data from madipipe into pixel coordinates. hand_angle： {lineno-start=29} ```python def hand_angle(landmarks): """ 计算各个手指的弯曲角度 :param landmarks: 手部关键点 :return: 各个手指的角度 """ angle_list = [] # thumb 大拇指 angle_ = vector_2d_angle(landmarks[3] - landmarks[4], landmarks[0] - landmarks[2]) angle_list.append(angle_) # index 食指 angle_ = vector_2d_angle(landmarks[0] - landmarks[6], landmarks[7] - landmarks[8]) angle_list.append(angle_) # middle 中指 angle_ = vector_2d_angle(landmarks[0] - landmarks[10], landmarks[11] - landmarks[12]) angle_list.append(angle_) # ring 无名指 angle_ = vector_2d_angle(landmarks[0] - landmarks[14], landmarks[15] - landmarks[16]) angle_list.append(angle_) # pink 小拇指 angle_ = vector_2d_angle(landmarks[0] - landmarks[18], landmarks[19] - landmarks[20]) angle_list.append(angle_) angle_list = [abs(a) for a in angle_list] return angle_list ``` After extracting the hand feature points into the 'results' variable, it is necessary to logically process these points. By evaluating the angular relationship between the feature points, specific finger types (thumb, index finger) can be identified. The 'hand_angle' function accepts the landmark feature point set (results) as input, and subsequently employs the 'vector_2d_angle' function to compute the angles between the corresponding feature points. The feature points corresponding to the elements of the landmark set are depicted in the figure below:

Taking the thumb's angle calculation as an example: the vector_2d_angle function is used to calculate the angle between joint points. landmarks\[3\], landmarks\[4\], landmarks\[0\], and landmarks\[2\] correspond to feature points 3, 4, 0, and 2 in the hand feature extraction diagram. By calculating the angles of these joint points, the thumb's posture characteristics can be determined. Similarly, the processing logic for the other finger joints is analogous. To ensure the accuracy of recognition, the parameters and basic logic (addition and subtraction of angle calculations) in the hand_angle function should remain at their default settings. h_gesture： {lineno-start=54} ```python def h_gesture(angle_list): """ 通过二维特征确定手指所摆出的手势 :param angle_list: 各个手指弯曲的角度 :return : 手势名称字符串 """ thr_angle = 65. thr_angle_thumb = 53. thr_angle_s = 49. gesture_str = "none" if (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "fist" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "hand_heart" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s): gesture_str = "nico-nico-ni" elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "hand_heart" elif (angle_list[0] > 5) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "one" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "two" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] > thr_angle): gesture_str = "three" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "OK" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "four" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "five" elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s): gesture_str = "six" else: "none" return gesture_str ``` After identifying the different finger types of the hand and determining their positions on the image, logical recognition processing of various gestures can be performed by implementing the 'h_gesture' function. In the `h_gesture` function depicted above, the parameters `thr_angle`, `thr_angle_thenum`, and `thr_angle_s` represent the angle threshold values for corresponding gesture logic points. These values have been empirically tested to ensure stable recognition effects. It is not recommended to alter them unless the logic processing effect is unsatisfactory, in which case adjustments within a range of ±5 values are sufficient. The `angle_list[0,1,2,3,4]` corresponds to the five finger types associated with the palm. Here's an example using the gesture "**one**":

The code presented represents the logical angle evaluation of the fingers for the "**one**" gesture. `angle_list[0]>5` checks whether the angle value of the thumb joint feature point in the image is greater than 5. `angle_list[1] 2: for i, p in enumerate(points): if i + 1 >= len(points): break cv2.line(img, p, points[i + 1], color, thickness) ``` Draw the currently recognized hand shape and each joint point. (2) Class State: {lineno-start=101} ```python class State(enum.Enum): NULL = 0 START = 1 TRACKING = 2 RUNNING = 3 ``` An enumeration class used to set the current state of the program. HandGestureNode： {lineno-start=107} ```python def draw_points(img, points, thickness=4, color=(255, 0, 0)): points = np.array(points).astype(dtype=np.int64) if len(points) > 2: for i, p in enumerate(points): if i + 1 >= len(points): break cv2.line(img, p, points[i + 1], color, thickness) class HandGestureNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.drawing = mp.solutions.drawing_utils self.hand_detector = mp.solutions.hands.Hands( static_image_mode=False, max_num_hands=1, min_tracking_confidence=0.05, min_detection_confidence=0.6 ) self.fps = fps.FPS() # fps计算器 self.state = State.NULL self.points = [] self.count = 0 self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` The HandGestureNode is a fingertip trajectory recognition node that contains three functions: an initialization function, a main function, and an image callback function. Init: {lineno-start=116} ```python def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.drawing = mp.solutions.drawing_utils self.hand_detector = mp.solutions.hands.Hands( static_image_mode=False, max_num_hands=1, min_tracking_confidence=0.05, min_detection_confidence=0.6 ) self.fps = fps.FPS() # fps计算器 self.state = State.NULL self.points = [] self.count = 0 self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() ``` Initialize each component needed and call the camera node.

### 7.1.9 Posture Control The human posture estimation model, trained using the MediaPipe machine learning framework, detects the human body feature in the captured image, identifies relevant joint positions, and subsequently recognizes a variety of sequential actions. This process enables direct control of the robot through somatosensory input. Viewed from the perspective of the robot, the following actions correspond to specific movements: ① If the user lifts their left arm, the robot will move a certain distance to the right. ② If the user lifts their right arm, the robot will move a certain distance to the left. ③ If the user lifts their left leg, the robot will move forward a certain distance. ④ If the user lifts their right leg, the robot will move backward a certain distance. * **Program Logic** First, import MediaPipe's human pose estimation model and subscribe to topic messages to obtain real-time footage from the camera. MediaPipe is an open-source multimedia machine learning model application framework that runs cross-platform on mobile devices, workstations, and servers. It supports mobile GPU acceleration and inference engines such as TensorFlow and TF Lite. Next, utilize the built model to detect key points of the human body in the screen. Connect these key points to display the human body and determine the human body posture. Finally, if a specific action is detected in the human body posture, the robot will respond accordingly. * **Operation Steps** :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to start the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Execute the command to run the game program: ```bash ros2 launch example body_control.launch.py ``` (5) If you need to close this gameplay, you need to press the "**Esc**" key in the image interface to exit the camera image interface. (6) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** Once the game is initiated, stand within the camera's field of view. When a person is detected, the screen will display key points of the body and lines connecting them. From the perspective of the robot, lifting the left arm will cause the robot to turn left; lifting the right arm will make the robot turn right; lifting the left leg will make the robot move forward a certain distance; lifting the right leg will make the robot move backward a certain distance.

* **Program Analysis** The program file is saved in [ros2_ws/src/example/example/body_control/include/body_control.py](../_static/source_code/body_control.zip)

① Capture an image through the camera. ② After performing a demonstration action, the car will execute the corresponding action. ③ From the car's perspective, lifting the left arm will cause the car to turn left; lifting the right arm will cause the car to turn right in a circle; lifting the left leg will make the car move forward a certain distance; lifting the right leg will make the car move backward a certain distance. The program's logic flowchart, obtained from the program files, is presented below:

① Initialization function (init(self.name)) defines relevant parameters, including: * Definition of the image tool (self.drawing) object. * Points used to draw recognized features. * Definition of the limb detection object (self.body_detector). ② Identified feature points' output results undergo logical processing for recognition. ③ Actions are determined and stored based on key point distance conditions. ④ Finally, the output results are generated, and the car executes corresponding actions. (1) Function Main： {lineno-start=307} ```python def main(): node = BodyControlNode('body_control') rclpy.spin(node) node.destroy_node() ``` Used to start the body sensation control node. get_joint_landmarks： {lineno-start=45} ```python def get_joint_landmarks(img, landmarks): """ 将landmarks从medipipe的归一化输出转为像素坐标(Convert landmarks from medipipe's normalized output to pixel coordinates) :param img: 像素坐标对应的图片(picture corresponding to pixel coordinate) :param landmarks: 归一化的关键点(normalized keypoint) :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) ``` Used to convert the recognized information into pixel coordinates. joint_distance： {lineno-start=56} ```python def joint_distance(landmarks): distance_list = [] d1 = landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER] d2 = landmarks[LEFT_HIP] - landmarks[LEFT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER] d2 = landmarks[RIGHT_HIP] - landmarks[RIGHT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[LEFT_HIP] - landmarks[LEFT_ANKLE] d2 = landmarks[LEFT_ANKLE] - landmarks[LEFT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_ANKLE] d2 = landmarks[RIGHT_ANKLE] - landmarks[RIGHT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) return distance_list ``` Used to calculate the distance between each joint point based on pixel coordinates. (2) Class {lineno-start=85} ```python class BodyControlNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.running = True self.fps = fps.FPS() # fps计算器 signal.signal(signal.SIGINT, self.shutdown) self.move_finish = True self.stop_flag = False self.left_hand_count = [] self.right_hand_count = [] self.left_leg_count = [] self.right_leg_count = [] self.detect_status = [0, 0, 0, 0] ``` This class is the body control node. Init: {lineno-start=86} ```python def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.running = True self.fps = fps.FPS() # fps计算器 signal.signal(signal.SIGINT, self.shutdown) ``` Initialize the parameters required for body control, read the image callback node from the camera, initialize nodes such as servos, chassis, buzzers, motors, etc., and finally start the main function within the class.4 get_node_state： {lineno-start=131} ```python def get_node_state(self, request, response): response.success = True return response ``` Set the initialization state of the current node. shutdown： {lineno-start=135} ```python def shutdown(self, signum, frame): self.running = False ``` Program exit callback function used to terminate recognition. image_callback： {lineno-start=138} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 将自定义图像消息转化为图像 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image node callback function used to process images and enqueue them. Move: {lineno-start=146} ```python def move(self, *args): if args[0].angular.z == 1: set_servos(self.joints_pub, 0.1, ((9, 650), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = -0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = 1 self.motor_pub.publish([motor1, motor2]) time.sleep(7.5) set_servos(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0 self.motor_pub.publish([motor1, motor2]) elif args[0].angular.z == -1: set_servos(self.joints_pub, 0.1, ((9, 350), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = -1 self.motor_pub.publish([motor1, motor2]) time.sleep(8) set_servos(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0 self.motor_pub.publish([motor1, motor2]) else: self.mecanum_pub.publish(args[0]) time.sleep(args[1]) self.mecanum_pub.publish(Twist()) time.sleep(0.1) self.stop_flag =True self.move_finish = True ``` Movement strategy function that moves the vehicle according to the recognized limb actions. buzzer_warn: {lineno-start=193} ```python def buzzer_warn(self): msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) ``` Buzzer control function used for buzzer alarms. image_proc: {lineno-start=201} ```python def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks is not None: if self.move_finish: twist = Twist() landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) distance_list = (joint_distance(landmarks)) if distance_list[0] < 1: self.detect_status[0] = 1 if distance_list[1] < 1: self.detect_status[1] = 1 ``` Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information, and then performs movements according to the recognized posture. Main: {lineno-start=283} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 self.mecanum_pub.publish(Twist()) self.running = False ``` The main function within the BodyControlNode class, used to input image information into the recognition function and display the returned image. ### 7.1.10 Integration of Body Posture and RGB Control The depth camera combines RGB capabilities, enabling both color recognition and somatosensory control. This lesson will leverage color recognition, as discussed in "[7.1.9 Posture Control](#anchor_7_1_9)", to identify individuals wearing clothing of a predefined color (which can be calibrated through operations). The robot's movements are then controlled based on different body gestures. If an individual wearing the specified color is not recognized, the robot remains unresponsive, ensuring accurate identification and control over the individual operating the robot. * **Program Logic** First, import MediaPipe's human pose estimation model and subscribe to topic messages to obtain real-time camera footage. Next, utilize the built model to detect key points of the human torso on the screen. Connect these key points to display the human torso and determine the human posture. Calibrate the center point of the human body based on all key points. Finally, if it is detected that the human body is standing with hands on hips, calibrate the color of the clothes to identify the control object. The robot enters control mode, and when the human body performs specific actions, the robot responds with corresponding actions. * **Operation Steps** :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to start the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Run the following command and hit Enter key to initiate the game: ```bash ros2 launch example body_and_rgb_control.launch.py ``` (5) If you need to close this game you need to press the "**Esc**" key in the image interface to exit the camera image interface. (6) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** After starting the game, stand within the camera's field of view. When a person is detected, the screen will display key points of the human torso, lines connecting these points, and the center point of the human body. **First Step:** Slightly adjust the camera to maintain a certain distance, ensuring it can detect the entire human body. **Second Step:** When the person to be controlled appears on the camera screen, they can assume the posture of hands on hips. If the buzzer emits a short beep, the robot completes calibration of the human body's center point and clothing color, entering control mode.

**Third Step:** At this point, with the robot as the primary viewpoint, raising the left arm causes the robot to move to the right; raising the right arm causes the robot to move to the left; raising the left leg causes the robot to move forward; raising the right leg causes the robot to move backward.

**Fourth Step:** If a person wearing a different clothing color enters the camera's field of view, they will not control the robot.

* **Program Analysis** The program file is saved in: [ros2_ws/src/example/example/body_control/include/body_and_rgb_control.py](../_static/source_code/body_and_rgb_control.zip)

:::{Note} Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable! ::: Based on the game's effectiveness, the procedural logic is delineated as follows:

Retrieve the captured image from the camera, analyze the key characteristics of the human body, initially detect and assess the "**akimbo**" posture using the designated function. Subsequently, determine if it's the same individual in the picture based on clothing color. If confirmed, identify specific body movements (raising the left arm, right arm, left leg, and right leg) and instruct the car to execute corresponding actions. Otherwise, reanalyze the key feature points. The program's logic flowchart, derived from the program files, is illustrated below:

As shown in the above diagram, the program first uses the initialization function of the BodyControlNode class to set the default values for the relevant parameters. This primarily includes defining the drawing tool object, the limb detection object, and initializing the posture state and count parameters. After this setup, the program can logically process the incoming images. It begins by identifying and outputting key body feature points, calculating joint angles from the obtained (arm) parameters to mark the hands-on-hips posture. Then, it matches colors based on the recognized key feature points to determine if the person has been previously marked. Finally, the program controls the movement of the vehicle by recognizing demonstration actions (raising an arm, lifting a leg). * **Function** Main： {lineno-start=400} ```python def main(): node = BodyControlNode('body_control') rclpy.spin(node) node.destroy_node() ``` Used to start the RGB body sensation control node. get_body_center： {lineno-start=47} ```python def get_body_center(h, w, landmarks): landmarks = np.array([(lm.x * w, lm.y * h) for lm in landmarks]) center = ((landmarks[LEFT_HIP] + landmarks[LEFT_SHOULDER] + landmarks[RIGHT_HIP] + landmarks[RIGHT_SHOULDER])/4).astype(int) return center.tolist() ``` Used to obtain the currently recognized body contours. get_joint_landmarks： {lineno-start=52} ```python def get_joint_landmarks(img, landmarks): """ 将landmarks从medipipe的归一化输出转为像素坐标(Convert landmarks from medipipe's normalized output to pixel coordinates) :param img: 像素坐标对应的图片(picture corresponding to pixel coordinate) :param landmarks: 归一化的关键点(normalized keypoint) """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) ``` Used to convert the recognized information into pixel coordinates. get_dif： {lineno-start=62} ```python def get_dif(list1, list2): if len(list1) != len(list2): return 255*3 else: d = np.absolute(np.array(list1) - np.array(list2)) return sum(d) ``` Used to compare the color of clothes on the body contours. joint_angle: {lineno-start=69} ```python def joint_angle(landmarks): """ 计算各个关节弯曲角度(calculate flex angle of each joint) :param landmarks: 手部关键点(hand keypoints) :return: 关节角度(joint angle) """ angle_list = [] left_hand_angle1 = vector_2d_angle(landmarks[LEFT_SHOULDER] - landmarks[LEFT_ELBOW], landmarks[LEFT_WRIST] - landmarks[LEFT_ELBOW]) angle_list.append(int(left_hand_angle1)) left_hand_angle2 = vector_2d_angle(landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER], landmarks[LEFT_WRIST] - landmarks[LEFT_SHOULDER]) angle_list.append(int(left_hand_angle2)) right_hand_angle1 = vector_2d_angle(landmarks[RIGHT_SHOULDER] - landmarks[RIGHT_ELBOW], landmarks[RIGHT_WRIST] - landmarks[RIGHT_ELBOW]) angle_list.append(int(right_hand_angle1)) right_hand_angle2 = vector_2d_angle(landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER], landmarks[RIGHT_WRIST] - landmarks[RIGHT_SHOULDER]) angle_list.append(int(right_hand_angle2)) return angle_list ``` This function is used to calculate the recognition angles of various joints between the body parts. joint_distance: {lineno-start=90} ```python def joint_distance(landmarks): distance_list = [] d1 = landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER] d2 = landmarks[LEFT_HIP] - landmarks[LEFT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER] d2 = landmarks[RIGHT_HIP] - landmarks[RIGHT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[LEFT_HIP] - landmarks[LEFT_ANKLE] d2 = landmarks[LEFT_ANKLE] - landmarks[LEFT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_ANKLE] d2 = landmarks[RIGHT_ANKLE] - landmarks[RIGHT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) return distance_list ``` This function is used to calculate the distance between each joint point based on pixel coordinates. * **Class** {lineno-start=119} ```python class BodyControlNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.color_picker = ColorPicker(Point(), 2) signal.signal(signal.SIGINT, self.shutdown) self.fps = fps.FPS() # fps计算器 self.running = True self.current_color = None self.lock_color = None self.calibrating = False self.move_finish = True self.stop_flag = False self.count_akimbo = 0 self.count_no_akimbo = 0 self.can_control = False ``` This class is the body control node. Init: {lineno-start=120} ```python def __init__(self, name): rclpy.init() super().__init__(name) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.color_picker = ColorPicker(Point(), 2) signal.signal(signal.SIGINT, self.shutdown) self.fps = fps.FPS() # fps计算器 ``` Initialize the parameters required for body control, read the camera's image callback node, initialize nodes such as servos, chassis, buzzers, motors, etc., and finally start the main function within the class. get_node_state: {lineno-start=173} ```python def get_node_state(self, request, response): response.success = True return response ``` Set the initialization state of the current node. shutdown: {lineno-start=177} ```python def shutdown(self, signum, frame): self.running = False ``` Program exit callback function used to terminate recognition. image_callback: {lineno-start=180} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 将自定义图像消息转化为图像 if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image node callback function used to process images and enqueue them. Move: {lineno-start=} ```python def move(self, *args): if args[0].angular.z == 1: set_servos(self.joints_pub, 0.1, ((9, 650), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = -0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = 1 self.motor_pub.publish([motor1, motor2]) time.sleep(7.5) set_servos(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0 self.motor_pub.publish([motor1, motor2]) elif args[0].angular.z == -1: set_servos(self.joints_pub, 0.1, ((9, 350), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = -1 self.motor_pub.publish([motor1, motor2]) time.sleep(8) set_servos(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0 self.motor_pub.publish([motor1, motor2]) else: self.mecanum_pub.publish(args[0]) time.sleep(args[1]) self.mecanum_pub.publish(Twist()) time.sleep(0.1) self.stop_flag =True self.move_finish = True ``` Movement strategy function that moves the vehicle according to the recognized limb actions. buzzer_warn: {lineno-start=235} ```python def buzzer_warn(self): msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) ``` Buzzer control function used for buzzer alarms. image_proc: {lineno-start=243} ```python def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks is not None: twist = Twist() landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) # 叉腰标定 angle_list = joint_angle(landmarks) #print(angle_list) if -150 < angle_list[0] < -90 and -30 < angle_list[1] < -10 and 90 < angle_list[2] < 150 and 10 < angle_list[3] < 30: self.count_akimbo += 1 # 叉腰检测+1(hands-on-hips detection+1) self.count_no_akimbo = 0 # 没有叉腰检测归零(clear no hands-on-hips detection) else: self.count_akimbo = 0 # 叉腰检测归零(clear hands-on-hips detection) ``` Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information. Then, it performs color recognition based on the position of each limb's different contours, finally determining and moving according to the recognized posture. Main: {lineno-start=377} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 self.mecanum_pub.publish(Twist()) self.running = False rclpy.shutdown() ``` The main function within the BodyControlNode class, used to input image information into the recognition function and display the returned image. ### 7.1.11 Pose Detection Through the human pose estimation model in the MediaPipe machine learning framework, the human body posture is detected. When the robot detects a person falling, it will sound an alarm and sway from side to side. * **Program Logic** First, import the human pose estimation model from MediaPipe and subscribe to topic messages to obtain real-time footage from the camera. Then, process the image, such as flipping, to detect human body information in the image. Based on the lines connecting the key points of the human body, calculate the limb height to determine the body movement. Finally, if "**falling**" is detected, the robot will sound an alarm and move forwards and backwards. * **Operation Steps** :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: (1) Start the robot, and enter the robot system desktop using VNC. (2) Click-on

to start the command-line terminal. (3) Run the command to disable app auto-start app service. ```bash ~/.stop_ros.sh ``` (4) Run the following command and hit Enter key to initiate the game: ```bash ros2 launch example fall_down_detect.launch.py ``` (5) If you need to close this game, you need to press the "**Esc**" key in the image interface to exit the camera image interface. (6) Then press "**Ctrl+C**" in the command line terminal interface. If closing fails, please try again. * **Program Outcome** Once the game starts, ensure the human body remains as fully within the camera's field of view as possible. Upon recognizing the human body, the key points will be highlighted in the returned image. At this point, the individual can sit down briefly. Upon detecting the "**falling**" posture, the robot will continuously sound an alarm and make repeated forward and backward movements as a reminder.

* **Program Analysis** The program file is saved in [ros2_ws/src/example/example/body_control/include/fall_down_detect.py](../_static/source_code/fall_down_detect.zip)

:::{Note} Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable! ::: Based on the game's effectiveness, the procedural logic is delineated as follows:

The car captures images via the camera, identifies the key feature points of the human body, and assesses whether the current posture indicates a "**fall**". If a fall is detected, the car's buzzer will emit a continuous "**beep**" sound while the car moves backward. Otherwise, the buzzer will only emit a single "**beep**" sound. The program logic flow chart obtained from the program files is depicted in the figure below.

(1) Function Main： {lineno-start=202} ```python def main(): node = FallDownDetectNode('fall_down_detect') rclpy.spin(node) node.destroy_node() ``` Used to start the body sensation control node. get_joint_landmarks: {lineno-start=44} ```python def get_joint_landmarks(img, landmarks): """ 将landmarks从medipipe的归一化输出转为像素坐标(Convert landmarks from medipipe's normalized output to pixel coordinates) :param img: 像素坐标对应的图片(picture corresponding to pixel coordinate) :param landmarks: 归一化的关键点(normalized keypoint) :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) ``` Used to convert the recognized information into pixel coordinates. height_cal: {lineno-start=55} ```python def height_cal(landmarks): y = [] for i in landmarks: y.append(i[1]) height = sum(y)/len(y) return height ``` Calculates the height of the limbs based on the recognized information. (2) Class {lineno-start=63} ```python class FallDownDetectNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.running = True self.fps = fps.FPS() # fps计算器 self.fall_down_count = [] self.move_finish = True self.stop_flag = False signal.signal(signal.SIGINT, self.shutdown) self.image_queue = queue.Queue(maxsize=2) ``` This class is the fall detection node. Init: {lineno-start=63} ```python class FallDownDetectNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.5, min_detection_confidence=0.5) self.running = True self.fps = fps.FPS() # fps计算器 self.fall_down_count = [] ``` Initialize the parameters required for body control, read the camera's image callback node, initialize nodes such as chassis, buzzers, and others, and finally start the main function within the class. get_node_state: {lineno-start=97} ```python def get_node_state(self, request, response): response.success = True return response ``` Set the initialization state of the current node. shutdown: {lineno-start=101} ```python def shutdown(self, signum, frame): self.running = False ``` Program exit callback function used to terminate recognition. image_callback: {lineno-start=104} ```python def image_callback(self, ros_image): rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 将自定义图像消息转化为图像(convert the custom image message into image)age)) if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image node callback function used to process images and enqueue them. Move: {lineno-start=112} ```python def move(self): for i in range(5): twist = Twist() twist.linear.x = 0.2 self.mecanum_pub.publish(twist) time.sleep(0.2) twist = Twist() twist.linear.x = -0.2 self.mecanum_pub.publish(twist) time.sleep(0.2) self.mecanum_pub.publish(Twist()) self.stop_flag =True self.move_finish = True ``` Movement strategy function that moves the vehicle according to the recognized limb height. buzzer_warn: {lineno-start=126} ```python def buzzer_warn(self): if not self.stop_flag: while not self.stop_flag: msg = BuzzerState() msg.freq = 1000 msg.on_time = 0.1 msg.off_time = 0.1 msg.repeat = 1 self.buzzer_pub.publish(msg) time.sleep(0.2) else: msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) ``` Buzzer control function used for buzzer alarms. image_proc: {lineno-start=144} ```python def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks: if self.move_finish: landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) h = height_cal(landmarks) if h > 240: self.fall_down_count.append(1) else: self.fall_down_count.append(0) if len(self.fall_down_count) == 3: count = sum(self.fall_down_count) ``` Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information, and moves according to the recognized height. Main: {lineno-start=178} ```python def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 self.mecanum_pub.publish(Twist()) self.running = False ``` The main function within the FallDownDetectNode class, used to input image information into the recognition function and display the returned image.

## 7.2 Machine Learning ### 7.2.1 Machine Learning Introduction * **What "Machine Learning" is** Machine Learning forms the cornerstone of artificial intelligence, serving as the fundamental approach to endow machines with intelligence. It spans multiple interdisciplinary fields such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory.

In essence, machine learning explores how computers can acquire new knowledge or skills by mimicking human learning behaviors and continuously enhancing their performance by reorganizing existing knowledge structures. Practically, it entails utilizing data to train models and leveraging these models for predictions. For instance, consider AlphaGo, the pioneering artificial intelligence system that triumphed over human professional Go players and even world champions. AlphaGo operates on the principles of deep learning, wherein it discerns the intrinsic laws and representation layers within sample data to extract meaningful insights. * **Types of Machine Learning** Machine learning can be broadly categorized into two types: supervised learning and unsupervised learning. The key distinction between these two types lies in whether the machine learning algorithm has prior knowledge of the classification and structure of the dataset. (1) Supervised Learning Supervised learning involves providing a labeled dataset to the algorithm, where the correct answers are known. The machine learning algorithm uses this dataset to learn how to compute the correct answers. It is the most commonly used type of machine learning. For instance, in image recognition, a large dataset of dog pictures can be provided, with each picture labeled as "**dog**". This labeled dataset serves as the "**correct answer**". By learning from this dataset, the machine can develop the ability to recognize dogs in new images.

Model Selection: In supervised learning, selecting the right model to represent the data relationship is crucial. Common supervised learning models encompass linear regression, logistic regression, decision trees, support vector machines (SVM), and deep neural networks. The choice of model hinges on the data's characteristics and the problem's nature. Feature Engineering: Feature engineering involves preprocessing and transforming raw data to extract valuable features. This encompasses tasks like data cleaning, handling missing values, normalization or standardization, feature selection, and feature transformation. Effective feature engineering can significantly enhance model performance and generalization capabilities. Training and Optimization: Leveraging labeled training data, we can train the model to capture the data relationship. Training typically involves defining a loss function, selecting an appropriate optimization algorithm, and iteratively adjusting model parameters to minimize the loss function. Common optimization algorithms include gradient descent and stochastic gradient descent. Model Evaluation: Upon completing training, evaluating the model's performance on new data is essential. Standard evaluation metrics include accuracy, precision, recall, F1 score, and ROC curve. Assessing a model's performance enables us to gauge its suitability for practical applications. In summary, supervised learning entails utilizing labeled training data to train a model for predicting or classifying new unlabeled data. Key steps encompass selecting an appropriate model, conducting feature engineering, training and optimizing the model, and evaluating its performance. Together, these components constitute the foundational elements of supervised learning. (2) Unsupervised Learning Unsupervised learning involves providing an unlabeled dataset to the algorithm, where the correct answers are unknown. In this type of machine learning, the machine must mine potential structural relationships within the dataset. For instance, in image classification, a large dataset of cat and dog pictures can be provided without any labels. Through unsupervised learning, the machine can learn to divide the pictures into two categories: cat pictures and dog pictures.

### 7.2.2 Common Type of Machine Learning Framework There are a large variety of machine learning frameworks. Among them, PyTorch, Tensorflow, MXNet and paddlepaddle are common. * **PyTorch** PyTorch is a powerful open-source machine learning framework, originally based on the BSD License Torch framework. It supports advanced multidimensional array operations and is widely used in the field of machine learning. PyTorch, built on top of Torch, offers even greater flexibility and functionality. One of its most distinguishing features is its support for dynamic computational graphs and its Python interface. In contrast to TensorFlow's static computation graph, PyTorch's computation graph is dynamic. This allows for real-time modifications to the graph as computational needs change. Additionally, PyTorch enables developers to accelerate tensor calculations using GPUs, create dynamic computational graphs, and automatically calculate gradients. This makes PyTorch an ideal choice for machine learning tasks that require flexibility, speed, and powerful computing capabilities.

* **Tensorflow** TensorFlow is a powerful open-source machine learning framework that allows users to quickly construct neural networks and train, evaluate, and save them. It provides an easy and efficient way to implement machine learning and deep learning concepts. TensorFlow combines computational algebra with optimization techniques to make the calculation of many mathematical expressions easier. One of TensorFlow's key strengths is its ability to run on machines of varying sizes and types, including supercomputers, embedded systems, and everything in between. TensorFlow can also utilize both CPU and GPU computing resources, making it an extremely versatile platform. When it comes to industrial deployment, TensorFlow is often the most suitable machine learning framework due to its robustness and reliability. In other words, TensorFlow is an excellent choice for deploying machine learning applications in a production environment. * **PaddlePaddle** PaddlePaddle is a cutting-edge deep learning framework developed by Baidu, which integrates years of research and practical experience in deep learning. PaddlePaddle offers a comprehensive set of features, including training and inference frameworks, model libraries, end-to-end development kits, and a variety of useful tool components. It is the first open-source, industry-level deep learning platform to be developed in China, offering rich and powerful features to developers worldwide. Deep learning has proven to be a powerful tool in many machine learning applications in recent years. From image recognition and speech recognition to natural language processing, robotics, online advertising, automatic medical diagnosis, and finance, deep learning has revolutionized the way we approach these fields. With PaddlePaddle, developers can harness the power of deep learning to create innovative and cutting-edge applications that meet the needs of users and businesses alike.

* **MXNet** MXNet is a top-tier deep learning framework that supports multiple programming languages, including Python, C++, Scala, R, and more. It features a dataflow graph similar to other leading frameworks like TensorFlow and Theano, as well as advanced features such as robust multi-GPU support and high-level model building blocks comparable to Lasagne and Blocks. MXNet can run on virtually any hardware, including mobile phones, making it a versatile choice for developers.

MXNet is specifically designed for efficiency and flexibility, with accelerated libraries that enable developers to leverage the full power of GPUs and cloud computing. It also supports distributed computing across dynamic cloud architectures via distributed parameter servers, achieving near-linear scaling with multiple GPUs/CPUs. Whether you're working on a small-scale project or a large-scale deep learning application, MXNet provides the tools and support you need to succeed. ### 7.2.3 Yolov5 Model * **Yolo Model Series Introduction** (1) YOLO Series YOLO (You Only Look Once) is an one-stage regression algorithm based on deep learning. R-CNN series algorithm dominates target detection domain before YOLOv1 is released. It has higher detection accuracy, but cannot achieve real-time detection due to its limited detection speed engendered by its two-stage network structure. To tackle this problem, YOLO is released. Its core idea is to redefine target detection as a regression problem, use the entire image as network input, and directly return position and category of Bounding Box at output layer. Compared with traditional methods for target detection, it distinguishes itself in high detection speed and high average accuracy. (2) YOLOv5 YOLOv5 is an optimized version based on previous YOLO models, whose detection speed and accuracy is greatly improved. In general, a target detection algorithm is divided into 4 modules, namely input end, reference network, Neck network and Head output end. The following analysis of improvements in YOLOv5 rests on these four modules. ① Input end: YOLOv5 employs Mosaic data enhancement method to increase model training speed and network accuracy at the stage of model training. Meanwhile, adaptive anchor box calculation and adaptive image scaling methods are proposed. ② Reference network: Focus structure and CPS structure are introduced in YOLOv5. ③ Neck network: same as YOLOv4, Neck network of YOLOv5 adopts FPN+PAN structure, but they differ in implementation details. ④ Head output layer: YOLOv5 inherits anchor box mechanism of output layer from YOLOv4. The main improvement is that loss function GIOU_Loss, and DIOU_nms for prediction box screening are adopted. * **YOLOv5 Model Structure** (1) Component ① Convolution layer: extract features of the image Convolution refers to the effect of a phenomenon, action or process that occurs repeatedly over time, impacting the current state of things. Convolution can be divided into two components: "**volume**" and "**accumulation**". "**Volume**" involves data flipping, while "**accumulation**" refers to the accumulation of the influence of past data on current data. Flipping the data helps to establish the relationships between data points, providing a reference for calculating the influence of past data on the current data.

In YOLOv5, the data being processed is typically an image, which is two-dimensional in computer vision. Therefore, the convolution applied is also a two-dimensional convolution, with the aim of extracting features from the image. The convolution kernel is an unit area used for each calculation, typically in pixels. The kernel slides over the image, with the size of the kernel being manually set. During convolution, the periphery of the image may remain unchanged or be expanded as needed, and the convolution result is then placed back into the corresponding position in the image. For instance, if an image has a resolution of 6×6, it may be first expanded to a 7×7 image, and then substituted into the convolution kernel for calculation. The resulting data is then refilled into a blank image with a resolution of 6×6.

② Pooling layer: enlarge the features of image The pooling layer is an essential part of a convolutional neural network and is commonly used for downsampling image features. It is typically used in combination with the convolutional layer. The purpose of the pooling layer is to reduce the spatial dimension of the feature map and extract the most important features. There are different types of pooling techniques available, including global pooling, average pooling, maximum pooling, and more. Each technique has its unique effect on the features extracted from the image.

Maximum pooling can extract the most distinctive features from an image, while discarding the remaining ones. For example, if we take an image with a resolution of 6×6 pixels, we can use a 2×2 filter to downsample the image and obtain a new image with reduced dimensions.

③ Upsampling layer: restore the size of an image This process is sometimes referred to as "**anti-pooling**". While upsampling restores the size of the image, it does not fully recover the features that were lost during pooling. Instead, it tries to interpolate the missing information based on the available information. For example, let's consider an image with a resolution of 6×6 pixels. Before upsampling, use 3X3 filter to calculate the original image so as to get the new image.

④ Batch normalization layer: organize data It aims to reduce the computational complexity of the model and to ensure that the data is better mapped to the activation function. Batch normalization works by standardizing the data within each mini-batch, which reduces the loss of information during the calculation process. By retaining more features in each calculation, batch normalization can improve the sensitivity of the model to the data.

⑤ RELU layer: activate function The activation function is a crucial component in the process of building a neural network, as it helps to increase the nonlinearity of the model. Without an activation function, each layer of the network would be equivalent to a matrix multiplication, and the output of each layer would be a linear function of the input from the layer above. This would result in a neural network that is unable to learn complex relationships between the input and output. There are many different types of activation functions. Some of the most common activation functions include the ReLU, Tanh, and Sigmoid. For example, ReLU is a piecewise function that replaces all values less than zero with zero, while leaving positive values unchanged. 7-20091G45H0607

⑥ ADD layer: add tensor In a typical neural network, the features can be divided into two categories: salient features and inconspicuous features.

⑦ Concat layer: splice tensor It is used to splice together tensors of features, allowing for the combination of features that have been extracted in different ways. This can help to increase the richness and complexity of the feature set.

(2) Compound Element When building a model, using only the layers mentioned above to construct functions can lead to lengthy, disorganized, and poorly structured code. By assembling basic elements into various units and calling them accordingly, the efficiency of writing the model can be effectively improved. ① Convolutional unit: A convolutional unit consists of a convolutional layer, a batch normalization layer, and an activation function. The convolution is performed first, followed by batch normalization, and finally activated using an activation function.

② Focus module The Focus module for interleaved sampling and concatenation first divides the input image into multiple large regions and then concatenates the small images at the same position within each region to break down the input image into several smaller images. Finally, the images are preliminarily sampled using convolutional units. As shown in the figure below, taking an image with a resolution of 6×6 as an example, if we set a large region as 2×2, then the image can be divided into 9 large regions, each containing 4 small images. By concatenating the small images at position 1 in each large region, a 3×3 image can be obtained. The small images at other positions are similarly concatenated, and the original 6×6 image will be broken down into four 3×3 images.

③ Residual unit The function of the residual unit is to enable the model to learn small changes in the image. Its structure is relatively simple and is achieved by combining data from two paths. The first path uses two convolutional units to sample the image, while the second path does not use convolutional units for sampling but directly uses the original image. Finally, the data from the first path is added to the second path.

④ Composite Convolution Unit In YOLOv5, the composite convolution unit is characterized by the ability to customize the convolution unit according to requirements. The composite convolution unit is also realized by superimposing data obtained from two paths. The first path only has one convolutional layer for sampling, while the second path has 2x+1 convolutional units and one convolutional layer for sampling. After sampling and splicing, the data is organized through batch normalization and then activated by an activation function. Finally, a convolutional layer is used for sampling.'

⑤ Compound Residual Convolutional Unit The compound residual convolutional unit replaces the 2x convolutional layers in the compound convolutional unit with x residual units. In YOLOv5, the feature of the compound residual unit is mainly that the residual units can be customized according to the needs.

⑥ Composite Pooling Unit The output data of the convolutional unit is fed into three max pooling layers and an additional copy is kept without processing. Then, the data from the four paths are concatenated and input into a convolutional unit. Using the composite pooling unit to process the data can significantly enhance the features of the original data.

(3) Structure Composed of three parts, YOLOv5 can output three sizes of data. Data of each size is processed in different way. The below picture is the output structure of YOLOv5.

Below is the output structures of data of three sizes.

### 7.2.4 YOLOv5 Running Procedure In this section, we provide an explanation of the model workflow using the anchor boxes, prediction boxes, and prior boxes employed in YOLOv5. * **Prior Bounding Box** When an image is input into model, object detection area requires us to offer, while prior bounding box is that box used to mark the object detection area on image before detection.

* **Prediction Box** The prediction box is not required to set manually, which is the output result of the model. When the first batch of training data is input into model, the prediction box will be automatically generated with it. The position in which the object of same type appear more frequently are set as the center of the prediction box.

* **Anchor Box** After the prediction box is generated, deviation may occur in its size and position. At this time, the anchor box serves to calibrate the size and position of the prediction box. The generation position of anchor box is determined by prediction box. In order to influence the position of the next generation of the prediction box, the anchor box is generated at the relative center of the existing prediction box.

* **Realization Process** After the data is calibrated, a prior bounding box appears on image. Then, the image data is input to the model, the model generates a prediction box based on the position of the prior bounding box. Having generated the prediction box, an anchor box will appear automatically. Lastly, the weights from this training are updated into model. Each newly generated prediction will be influenced by the last generated anchor box. Repeating the operations above continuously, the deviation of the size and position of the prediction box will be gradually erased until it coincides with the priori box.

### 7.2.5 Image Collecting & Labeling Given that training the Yolov5 model necessitates a substantial volume of data, our initial step involves collecting and labeling the requisite data in preparation for subsequent model training. To begin, you can assemble the images that require collection. As an illustration, let's consider the collection of traffic signs. * **Image Collecting** (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command below and hit enter to terminate the app auto-start service. ```bash ~/.stop_ros.sh ``` (4) Run the following command to initiate the depth camera service. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new command-line terminal, enter the command, and press Enter to create a new directory for storing the dataset: ```bash mkdir -p ~/my_data ``` (6) Execute the following command to open the image collecting tool. ```bash cd ~/software/collect_picture && python3 main.py ```

The "**save number**" displayed in the upper left corner represents the image ID, indicating the order in which images are saved. The term "**existing**" denotes the total number of images already saved. (7) Change the storage path as pictured.

(8) After selecting the corresponding directory, click-on '**Choose**' button.

(9) Place the object you want to recognize within the camera's field of view. Click the '**Save (space)**' button or press the Spacebar to capture and save the current image from the camera.

After clicking '**Save (space)**' or pressing the Spacebar, a folder named JPEGImages will be automatically created in the **/home/ubuntu/my_data** directory to store the images. :::{Note} To enhance model reliability, capture target recognition content from various distances, rotation angles, and tilt angles. ::: (10) Once you have finished collecting the images, click-on '**Quit**' button to close the software.

(11) Finally, press **Ctrl+C** in all the open command line terminals to exit. This completes the image collection process. * **Image Labeling** Labeling images is crucial for feature datasets as it enables the trained model to understand the categories of significant elements within the image. This understanding empowers the model to recognize these categories in new, previously unseen images. :::{Note} the input command should be case sensitive, and keywords can be complemented using Tab key. ::: (1) Open the command-line terminal, and execute the below command to open the image labeling software. ```bash python3 ./software/labelImg/labelImg.py ``` The table below outlines the function of each icon: | **Icon** | **Shortcut Key** | **Function** | | :----------------------------------------------------------: | :--------------: | :-------------------------------------------------------: | |

| Ctrl+U | Select the directory where the picture is saved. | |

| Ctrl+R | Select the directory where the calibration data is saved. | |

| W | Create annotation box | |

| Ctrl+S | Save annotation | |

| A | Switch to the previous image | |

| D | Switch to the next image | (2) Click

to open the directory where the images are stored. Select the location as shown in the diagram below:

(3) After clicking the "**Open**"

button, the interface shown in the next diagram will appear.

(4) Click the "**Change Save Dir**" button

and choose the "**Annotations**" directory within the same path as the image collection for storing calibration data.

(5) Click the **"Open"** button

to return to the image annotation interface. (6) Press the "**W**" key to begin creating annotation boxes. (7) Move the mouse to the desired location, hold down the left mouse button, and drag to create a box that covers the entire target area. Release the mouse button to finalize the selection.

(8) In the pop-up window, enter a name for the category of the target object (e.g., "**left**"). Click "**OK**" or press the "**Enter**" key to save the category.

(9) Press **Ctrl+S** to save the annotation data for the current image. (10) Then press the 'd' key to move to the next image. Repeat steps 7) to 9) to annotate all images. When finished, click the close button at the top right of the software to exit

. (11) Open a new command-line terminal, and run the following command to access the annotation file. ```bash cd my_data/Annotations && ls ```

### 7.2.6 Data Format Conversion * **Preparation** Before proceeding with this section, ensure that image collection and annotation have been completed. For detailed operational steps, please refer to the documents located in the "[**7.2.5 Image Collecting and Labeling**](#anchor_7_2_5)" directory. Prior to feeding the data into the YOLOv5 model for training, it is essential to assign categories to the images and convert the annotated data into the required format. * **Format Conversion** Before starting this section, you need to complete the image collection and annotation process. :::{Note} When entering commands, be sure to strictly differentiate between uppercase and lowercase letters. You can use the 'Tab' key to auto-complete keywords. ::: (1) Open a new terminal and enter the command to view the file. Tip: If the file does not exist, you can create it with the following command: ```bash vim ~/my_data/classes.names ``` (2) Press the "**i**" key to enter insert mode, and then type the class name **"left"** (if you have multiple classes, enter each on a new line). (3) When finished, press the **"ESC"** key, type `:wq`, and press Enter to save and exit. :::{Note} The class names added here must match those used in the image annotation software "**labelImg**". ::: (4) Next, return to the terminal and enter the following command to convert the data format, then press Enter: ```bash python3 ~/software/xml2yolo.py --data ~/my_data --yaml ~/my_data/data.yaml ```

The paths for `~/software/xml2yolo.py` and `~/my_data` should be adjusted based on the actual file locations! In the above command, three parameters are involved: `xml2yolo.py`: Converts the labeled data into a format compatible with YOLOv5 for model training. Ensure the path is correct. `my_data`: Refers to the folder containing your labeled data. Ensure the path is correct. `data.yaml`: Specifies the format for the entire folder after conversion. The command indicates that the output will be saved in the `my_data` folder. Below is an example of the generated `data.yaml` file:

The text following "**names**" specifies the types of labels, while "**nc**" denotes the number of label categories. "**train**" refers to the training set (used for training in deep learning), with the subsequent parameter indicating the path. "**val**" represents the validation set (used to validate the results during training), and its parameter also indicates the path. Make sure to adjust these paths based on their actual locations. If you later move the dataset from the robot to a local PC or cloud server to speed up the training process, you'll need to update the paths for "**train**" and "**val**" accordingly. Finally, an XML file will be generated in the ~/my_data folder, which identifies the paths of the split dataset. You can modify the save location by changing the last parameter in step 4's command, "**~/my_data/data.yaml**". Remember this file's path, as it will be needed for model training later on. ### 7.2.7 Model Training :::{Note} the input command should be case sensitive, and the keywords can be complemented using Tab key. ::: * **Preparation** After converting the model format, the next step is to proceed with model training. However, before training, ensure that the dataset has been prepared and converted into the required format. For detailed instructions, please refer to "[**7.2.6 Data Format Conversion**](#anchor_7_2_6)". * **Model Training** (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command below and hit Enter key to navigate to the designated directory. ```bash cd ~/third_party_ros2/yolov5/ ``` (4) Run the following command to train model. ```bash python3 train.py --img 640 --batch 64 --epochs 300 --data ~/my_data/data.yaml --weights yolov5n.pt ``` In the command: **--img** specifies the image size. **--batch** denotes the number of images processed in each batch. **--epochs** indicates the number of training iterations. **--data** is the path to the dataset. **--weights** is the path to the training files. You can adjust these parameters according to your needs. Increasing the number of training epochs can improve model accuracy, but it will also lengthen the training time. If you see the output shown in the image below, it indicates that the training process is in progress.

Upon completion of the model training, the path to the generated file will be displayed in the terminal and recorded. This path will be utilized in the subsequent course.

:::{Note} The path to the generated file may vary and can be located in the corresponding folder within runs/train. ::: ### 7.2.8 Traffic Sign Model Training For product names and reference paths mentioned in the document, please rely on the actual products and paths. When working with large datasets, it's not recommended to use the robot's onboard computer for training, as the limited I/O speed and memory can significantly slow down the process. Instead, it's better to use a computer with a dedicated graphics card. The training process is the same; you just need to set up the appropriate software environment. If the traffic sign recognition in autonomous driving mode isn't performing well, you may need to train your own model. This section provides guidance on how to train a traffic sign model. In the following content, you may see screenshots showing different robot host names (the environment configuration is generally similar across different robots). Simply follow the instructions as provided in the document; the differences in host names won't affect the execution. * **Preparation** ① Prepare a laptop, or a PC with wireless network card and mouse. ② Access the robot system desktop using VNC. * **Training Instructions** (1) Image Collecting (2) Start the robot, and access the robot system desktop using VNC. (3) Click-on

to open the command-line terminal. (4) Execute the below command to terminate the app service. ```bash ~/.stop_ros.sh ``` (5) Run the command to initiate the camera service. ```bash ros2 launch peripherals depth_camera.launch.py ``` (6) Open a new terminal window. Navigate to the directory where the collection tool is stored, then enter the following command to launch the image collection tool. ```bash cd software/collect_picture && python3 main.py ```

**save number** represents the picture ID, indicating which picture has been saved. "**existing**" denotes the total number of saved images. (7) Change the storage path to '**/home/ubuntu/my_data**'.

(8) Position the target recognition content within the camera's field of view, then click the '**Save (space)**' button or press the space bar to capture and save the current camera image. Upon saving, both the 'save number' and 'existing' counts will increment by 1. These two parameters allow for monitoring of the saved picture names displayed on the current camera screen and the total number of pictures stored in the folder.

Upon selecting the "**Save (space)**" option, a JPEGImages folder will automatically be created within the directory "**/home/ubuntu/my_data**" to store the images.

:::{Notice} * For enhanced model reliability, capture target recognition content from various distances, rotation angles, and tilt angles. * To guarantee recognition stability, it's advisable to increase the number of training images. It is recommended that each image category comprises at least 200 images for effective model training. ::: (9) After image collecting, click-on '**Quit**' to close this software.

* **Image Labeling** :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command to terminate the app service. ```bash ~/.stop_ros.sh ``` (4) Create a new terminal, and execute the following command. ```bash python3 software/labelImg/labelImg.py ``` (5) Upon launching the image annotation tool, the table below outlines the key functions | **Icon** | **Shortcut Key** | **Function** | | :----------------------------------------------------------: | :--------------: | :-------------------------------------------------------: | |

| Ctrl+U | Select the directory where the picture is saved. | |

| Ctrl+R | Select the directory where the calibration data is saved. | |

| W | Create annotation box | |

| Ctrl+S | Save annotation | |

| A | Switch to the previous image | |

| D | Switch to the next image | (6) Press "**Ctrl+U**" to designate the image storage directory as "**/home/ubuntu/my_data/JPEGImages/**", then click the "**Open**" button.

(7) Press "**Ctrl+R**" to specify the calibration data storage directory as "**/home/ubuntu/my_data/Annotations/**". The "**Annotations**" folder is automatically created during image collection. Finally, click the "**Open**" button.

(8) Press the "**W**" key to initiate label box creation. Position the mouse cursor appropriately, then press and hold the left mouse button while dragging to encompass the entire target recognition content within the label box. Release the left mouse button to finalize the selection of the target recognition content.

(9) In the pop-up window, assign a name to the category of the target recognition content, such as "**right**". Once the naming is complete, either click the "**OK**" button or press the "**Enter**" key to save this category.

(10) Use short-cut '**Ctrl+S**' to save the labeled data of the current pictures. (11) Label the remaining pictures in the same manner as step 9). (12) Click

in the system status bar to open the file manager. Navigate to the directory "**/home/ubuntu/my_data/Annotations/**" (where the image is saved in 9.2.2 Data Acquisition) to view the picture annotation file. * **Generate Related Files** (1) Click-on

to open the command-line terminal. (2) Execute the command, and hit Enter key. ```bash vim ~/my_data/classes.names ``` (3) Press the **"i"** key to enter the editing mode and input the class name of the target recognition content. If multiple class names are required, enter them one per line.

:::{Note} The class name entered here must match the naming convention used in the image annotation software "**labelImg**" when applicable. ::: (4) Having finish input, press '**Esc**' key, and input '**:wq**' to save the change and close the file.

(5) Execute the command to convert the data format by entering the command: ```bash python3 software/xml2yolo.py --data ~/my_date --yaml ~/my_data/data.yaml ``` The xml2yolo.py script converts annotated files into XML format, organizes the dataset into training and validation sets. If you encounter the following prompt, the conversion is successful.

The output paths may vary on different robots based on their actual storage locations. However, the generated data.yaml file corresponds to the calibrated dataset. * **Model Training** (1) Click-on

to open the command-line terminal. (2) Execute the command, and hit Enter to navigate to the designated directory. ```bash cd ~/third_party_ros2/yolov5/ ``` (3) Run the command, and hit Enter to train the model. ```bash python3 train.py --img 640 --batch 8 --epochs 300 --data ~/my_data/data.yaml --weights yolov5n.pt ``` In the command, "**--img**" specifies the image size, "**--batch**" determines the number of images processed in each iteration, and "**--epochs**" denotes the number of training iterations, which impacts the quality of the final model. For quick testing, we've set the number of training iterations to 8, but for optimal results, this value should be adjusted based on your specific requirements and the capabilities of your computer system. Furthermore, "**--data**" indicates the path to the calibrated dataset, while "**--weights**" refers to the path of the pretrained model. It's crucial to remember whether you're using "**yolov5n.pt**" or "**yolov5s.pt**" as your pretrained model. Users can customize these parameters according to their specific needs. If model reliability needs to be improved, consider increasing the number of training iterations, keeping in mind that this will also extend the training time. (4) When the option displayed in the figure below appears, input "**3**" and press Enter.

If the content displayed in the image below appears, it indicates that the training is ongoing.

After the model has been trained successfully, the terminal will print the path containing the file, and you need to record the path which will be required for '**Generate TensorRt Model Engine**'.

:::{Note} If multiple trainings are conducted, the naming of the "**exp5**" folder mentioned here will vary. For instance, it might be changed to "**exp2**", "**exp3**", and so on. The following steps will be associated with the folder naming used in this step. ::: * **Model Usage** (1) Enter the following command in the terminal and press Enter to stop the APP service: ```bash ~/.stop_ros.sh ``` (2) Use the following command to navigate to the training results folder: ```bash cd third_party_ros2/yolov5/runs/train/exp5/weights ``` (3) Use the following command to copy the best model weights to the configuration folder: ```bash cp -r best.pt ~/ros2_ws/src/yolov5_ros2/config ``` (4) Open a new terminal and use this command to start the camera node: ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new terminal and use this command to start the camera node: ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="best" ``` ### 7.2.9 Waste Card Model Training For product names and reference paths mentioned in the document, please use the actual products and paths. When handling large datasets, it is not advisable to use the robot's controller for training due to slower speeds from limited I/O and memory. Instead, use a computer with a dedicated graphics card. The training process is the same; just ensure the proper software environment is set up. If the object classification performance is not satisfactory and you need to train your own model, refer to this section for instructions on training a physical model. * **Preparation** ① Prepare a laptop or a PC equipped with a wireless network card and a mouse. ② Install and open the remote connection tool 'VNC'. * **Training Instructions** (1) Create Data Set :::{Note} Photo materials can be obtained from the internet. To minimize the performance requirements for annotation and training, resize the images to 640x480 pixels. Images captured with the 'Capture' tool are set to this resolution by default. ::: For this example, we use selfie images for training. If you wish to use images from the internet, please refer to the 'Mask Image Training' guide for detailed instructions. (2) Image Collection ① Start the robot, and access the robot system desktop using VNC. ② Click-on

to open the command-line terminal. ③ Execute the command to disable app service. ```bash ~/.stop_ros.sh ``` ④ Run the below command to enable the camera service. ```bash ros2 launch peripherals depth_camera.launch.py ``` ⑤ Open a new terminal. Navigate to the directory where the collection tool is stored and enter the below command to launch the image collection tool. ```bash cd software/collect_picture && python3 main.py ``` ⑥ Position the card that requires training within the camera's field of view, and click **"Save"** to capture the current image. Ensure that the number of images for each type is consistent. For instance, if you capture 50 images of the banana peel card, you should also capture 50 images of the toothbrush card.

Upon clicking the "**Save (space)**" button, a folder named JPEGImages will be automatically created to store the images in the directory "**/home/ubuntu/my_data**".

:::{Note} * For enhanced model reliability, it's important to capture the target recognition content from various distances, rotation angles, and tilt angles. * To ensure stable recognition, increase the number of images used for model training. It's recommended to have at least 200 pictures for each type during training. ::: ⑦ After image collecting, click-on 'Quit' button to close this software.

(3) Label Pictures :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: ① Start the robot, and access the robot system using the remote control software VNC. ② Click-on

to open the command-line terminal. ③ Execute the command to disable the app service. ```bash ~/.stop_ros.sh ``` ④ Run the command to launch the image labeling tool. ```bash python3 ./software/labelImg/labelImg.py ``` ⑤ Upon launching the image annotation tool, the table below outlines the key functions. | **Icon** | **Shortcut Key** | **Function** | | :----------------------------------------------------------: | :--------------: | :-------------------------------------------------------: | |

| Ctrl+U | Select the directory where the picture is saved. | |

| Ctrl+R | Select the directory where the calibration data is saved. | |

| W | Create annotation box | |

| Ctrl+S | Save annotation | |

| A | Switch to the previous image | |

| D | Switch to the next image | Press "**Ctrl+U**" to select the image storage directory as "**/home/ubuntu/my_data/JPEGImages/**", then click the "**Open**" button.

⑥ Begin by clicking "**Open Dir**" to access the folder where the pictures are stored. Next, select "**/home/ubuntu/My_Data/JPEGImage**", then click "**Open**" to proceed.

⑦ Click-on '**Create RectBox**' button to create a annotation box.

⑧ Position the mouse cursor appropriately, then press and hold the left mouse button while dragging out a rectangular frame to select the training content in the photo. Let's use bananas as an example.

⑨ After releasing the mouse, enter the category name of this card in the pop-up dialog box, then click "**OK**". For instance, you can enter "**apple**" for apples and "**potato**" for potatoes. (Items of the same type must have the same name)

⑩ After labeling a picture, click-on '**Save**' button, then click-on '**Next Image**' to proceed with the image labeling.

:::{Note} * While annotating, you can use shortcut keys to expedite the process. For instance, press "**D**" to switch to the next picture and press "**W**" to create a label box. * You can also use **"Ctrl+V"** to paste the annotation box from the previous picture here. However, note that this method is only applicable to annotations of the same type of image. This is because when pasting the annotation box, it also pastes the name information from the previous picture. ::: (4) Generate Related Files ① Click-on

to open the command-line terminal. ② Execute the command, and hit Enter. ```bash vim ~/my_data/classes.names ``` ③ Press the "**i**" key to enter editing mode and include the class name for the target recognition content. When adding multiple class names, each class name should be on a separate line. If you need to enter multiple classes, press enter for a new line and then specify each class accordingly based on the different types.

:::{Note} The class name added here must match the naming convention used in the image annotation software "**labelImg**". ::: ④ Having finished the input, press '**Esc**' key and type '**:wq**' to save the change and close the file.

⑤ Enter the command to convert the data format and press Enter. ```bash python3 software/xml2yolo.py --data ~/my_data --yaml ~/my_data/data.yaml ``` This output should reflect the actual storage location of the folder within the robot system. The output paths may vary between different robots, but the data.yaml file generated ultimately corresponds to the calibrated dataset. (5) Start Training ① Click-on

to open the command-line terminal. ② Run the command to navigate to the specific directory. ```bash cd ~/third_party_ros2/yolov5/ ``` ③ Enter the command and press Enter to train the model. ```bash python3 train.py --img 640 --batch 8 --epochs 300 --data ~/my_data/data.yaml --weights yolov5n.pt ``` In the command, "**--img**" denotes the image size; "**--batch**" indicates the number of individual image inputs processed together; "**--epochs**" represents the number of training iterations, reflecting the frequency of machine learning cycles. This value should be determined based on the final desired model performance. For expedited testing purposes, the number of training iterations is initially set to 8. However, on more capable computer host systems, increasing this value can lead to improved training outcomes. "**--data**" specifies the path to the dataset folder that has been calibrated. "**--weights**" denotes the path to the pre-trained model file, which serves as the starting point for your own training process. It's important to note whether you're using "**yolov5n.pt**" or "**yolov5s.pt**" as input parameters. Users have the flexibility to adjust these parameters according to their specific requirements. To enhance the model's reliability, users may consider increasing the number of training iterations, although it's essential to be aware that this will also extend the training duration accordingly. ④ When the terminal prints that Enter your choice, enter '3', and hit Enter.

If the content displayed in the image below appears, it indicates that the training process is currently underway.

Once the model training is completed, the path of the generated file will be printed on the terminal and recorded. This path will be utilized in the subsequent step of "**Generating TensorRT Model Engine**"

:::{Note} If multiple trainings are conducted, the naming of the **"exp5"** folder mentioned here will vary. For instance, it might be adjusted to **"exp2"**, **"exp3"**, and so forth. The subsequent steps will depend on the folder naming used in this step. ::: * **Model Usage** (1) Run the following command and hit Enter key to terminate the app service. ```bash ~/.stop_ros.sh ``` (2) Execute the below command to navigate to the folder storing the training result. ```bash cd third_party_ros2/yolov5/runs/train/exp5/weights ``` (3) Enter the command to navigate to the training results folder. ```bash cp -r best.pt ~/ros2_ws/src/yolov5_ros2/config ``` (4) Input the below command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new command-line terminal, and run the following command to initiate the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="best" ``` ### 7.2.10 Physical Model Training :::{Note} This section provides a general overview. Please refer to the actual product names and reference paths mentioned in the document. ::: Training large datasets on the robot's controller is not recommended due to its slow training speed caused by limitations in I/O port speed and memory. It is advised to utilize a computer equipped with a dedicated graphics card for faster training. The training process remains the same, and only the relevant program running environment needs to be configured. If the recognition accuracy of the physical sorting gameplay is inadequate and you require training your own model, you can refer to this section for guidance on physical model training. * **Preparation** (1) Prepare a laptop, or a PC with wireless network card and mouse. (2) Access the robot system desktop using VNC. * **Training Instructions** The training steps are akin to those followed for road sign model. (1) Image Collecting ① Start the robot, and access the robot system desktop using VNC. ② Click-on

to open the command-line terminal. ③ Execute the following command to terminate the app service. ```bash ~/.stop_ros.sh ``` ④ Run the command to initiate the camera service. ```bash ros2 launch peripherals depth_camera.launch.py ``` ⑤ Open a new terminal window. Navigate to the directory where the collection tool is stored, then enter the following command to launch the image collection tool. ```bash cd ~/software/collect_picture && python3 main.py ```

'"**save number**" represents the picture ID, indicating which picture has been saved. "**existing**" denotes the total number of saved images. ⑥ Change the storage path to '**/home/ubuntu/my_data**'.

⑦ Position the target recognition content within the camera's field of view, then click the '**Save (space)**' button or press the space bar to capture and save the current camera image. Upon saving, both the '**save number**' and 'existing' counts will increment by 1. These two parameters allow for monitoring of the saved picture names displayed on the current camera screen and the total number of pictures stored in the folder.

Upon selecting the "**Save (space)**" option, a JPEGImages folder will automatically be created within the directory "**/home/ubuntu/my_data**" to store the images. :::{Note} * For enhanced model reliability, capture target recognition content from various distances, rotation angles, and tilt angles. * To guarantee recognition stability, it's advisable to increase the number of training images. It is recommended that each image category comprises at least 200 images for effective model training. ::: ⑧ After image collecting, click-on '**Quit**' to close this software.

(2) Image Labeling :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: ① Start the robot, and access the robot system desktop using VNC. ② Click-on

to open the command-line terminal. ③ Execute the below command to terminate the app service. ```bash ~/.stop_ros.sh ``` ④ Execute the below command to enable the image labeling tool. ```bash python3 ./software/labelImg/labelImg.py ``` ⑤ Upon launching the image annotation tool, the table below outlines the key functions | **Icon** | **Shortcut Key** | **Function** | | :----------------------------------------------------------: | :--------------: | :-------------------------------------------------------: | |

| Ctrl+U | Select the directory where the picture is saved. | |

| Ctrl+R | Select the directory where the calibration data is saved. | |

| W | Create annotation box | |

| Ctrl+S | Save annotation | |

| A | Switch to the previous image | |

| D | Switch to the next image | Press "**Ctrl+U**" to designate the image storage directory as "**/home/ubuntu/my_data/JPEGImages/**", then click the "**Open**" button.

⑥ First, click "**Open Dir**" to access the folder where the pictures are stored. Then, select "**/home/ubuntu/My_Data/JPEGImage**" and click "**Open**" to proceed.

⑦ Click-on '**Create RectBox**' button to create a annotation box.

⑧ Position the mouse cursor appropriately, then hold down the left mouse button and drag out a rectangular frame to select the training content in the photo. For instance, let's use an apple as an example.

⑨ After releasing the mouse, enter the category name of this object in the pop-up dialog box, and then click "**OK**". For instance, if it's an apple, enter "**apple**"; if it's a potato, enter "**potato**". (Items of the same type must have the same name.)

⑩ Once you've finished annotating one image, click "**Save**" to save your annotations. Then, click "**Next Image**" to proceed to annotate the next image.

:::{Note} During annotation, you can utilize shortcut keys to expedite the process. For example, press **"D"** to switch to the next picture and press "**W**" to create a label box. You can also press "**Ctrl+v**" to paste the annotation box from the previous picture. However, please note that this method is only applicable to annotations of the same type of image. This is because when the annotation box is pasted, it will include the name information from the previous picture. ::: (3) Generate Related Files ① Click-on

to open the command-line terminal. ② Execute the command, and hit Enter. ```bash vim ~/my_data/classes.names ``` ③ Press the "**i**" key to enter editing mode and insert the class name of the target recognition content. If you need to add multiple class names, ensure each class name is placed on a separate line. :::{Note} The class name entered here must match the naming convention used in the image annotation software "**labelImg**". ::: ④ Having finished input, press '**Esc'** key, then input '**:wq**' to save the change and close the file. ⑤ Run the command to convert the data format. ```bash python3 software/xml2yolo.py --data ~/my_data --yaml ~/my_data/data.yaml ``` This output should be based on the actual storage location of the folder in the robot system. The output paths may vary between different robots, but the final data.yaml file generated corresponds to the calibrated dataset. (4) Start Training ① Click-on

If the content displayed in the image below appears, it indicates that the training process is currently underway.

:::{Note} If multiple trainings are conducted, the naming of the "**exp5**" folder mentioned here will vary. For instance, it might be adjusted to "**exp2**", "**exp3**", and so forth. The subsequent steps will depend on the folder naming used in this step. ::: * **Model Usage** (1) Enter the following command and press Enter to stop the APP service: ```bash ~/.stop_ros.sh ``` (2) Navigate to the training results folder with this command: ```bash cd third_party_ros2/yolov5/runs/train/exp5/weights ``` (3) Copy the best model weights to the configuration folder using this command: ```bash cp -r best.pt ~/ros2_ws/src/yolov5_ros2/config ``` (4) Start the camera node by entering this command: ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new terminal and launch the YOLOv5 node with the best model by entering this command: ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="best" ```

## 7.3 Map and Props Setup ### 7.3.1 Preface: This document pertains to the autonomous driving capabilities of our company's vehicle. Before engaging the autonomous driving feature, please verify that the hardware components (such as the depth camera and car chassis motors) installed in the vehicle meet the specified requirements and are operating correctly. Additionally, ensure that the battery is adequately charged. ### 7.3.2 Props Setup & Notices Prior to commencing autonomous driving, please follow the steps below to set up the map. :::{Note} Tools required for autonomous driving are available for separate purchase. If you are interested, kindly reach out to us at support@hiwonder.com. ::: * **Setup Instructions** (1) Map Setup To begin, ensure that the site is situated on a flat and spacious ground, and verify that there is ample lighting surrounding the area, as illustrated in the image below:

Place the robot at the start point as pictured.

As the robot advances, it will navigate along the yellow line bordering the map, and it can dynamically adjust its posture based on the lane lines in real-time. (2) Road Signs Setup In the game of autonomous driving, a total of 4 waypoints need to be placed: 2 for straight driving, 1 for right turn, and 1 for parking. Please refer to the following image for the specific placement positions: (Note: Ensure accurate placement!!)

Go straight: instruct the robot car to go forward. IMG_20231010_110334(1)

Turn right: Instruct the robot car to turn right. IMG_20231010_110620(1)

Park: Command the robot car to park. IMG_20231010_110354(1)

The purpose of the road signs is to guide the car during its journey. Once the car recognizes a road sign, it will perform the appropriate actions. Additionally, the car will slow down as it approaches a pedestrian crosswalk. (3) Traffic Light Setup In the game of the autonomous driving function, only one traffic light needs to be placed. The exact placement location is illustrated in the figure below: IMG_20231010_110556(1)

Traffic light props are utilized to replicate real-world traffic lights on roads. When the car detects the traffic light, it will adhere to the "**stop on red light, go on green light**" rule. Once all venue props are arranged, refer to the following picture for setting up the venue environment: 1696918732257

:::{Notice} * Ensure that the site has adequate lighting, preferably natural ambient light. Avoid strong direct light sources and colored lights, as they may hinder overall recognition. Pay attention to this aspect when setting up the site. * While laying out and using the equipment, ensure proper care and protection of the props on the site. In case of incomplete maps or signs, or if the traffic light is damaged and cannot function properly, promptly contact our after-sales personnel to purchase replacements to prevent any disruptions to the recognition process ::: ## 7.4 Autonomous Driving ### 7.4.1 Lane Keeping This lesson focuses on controlling the car to move forward and maintain lane alignment through instructions.

* **Preparation** (1) Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to "[**7.3 Map and Props Setup**](#anchor_7_3)" in the same directory as this section for guidance. (In this lesson, we are only experiencing the road driving-line patrol function, so there is no need to place props such as traffic lights and signboards.) (2) When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition. (3) It is essential to adjust the color threshold beforehand and set the color threshold of the yellow line to avoid misidentification during subsequent recognition. For specific color threshold adjustment, please refer to the "[**6. ROS2+OpenCV Course**](https://docs.hiwonder.com/projects/JetAutoPi/en/latest/docs/6.ROS2%2BOpenCV.html)" for reference. (4) It is recommended to position the car in the middle of the road for easy identification! * **Program Logic** Lane keeping involves three main steps: capturing real-time images, processing the images, and comparing the results. First, we use the camera to capture real-time images of the scene. Next, we process these images through color recognition, converting them to different color spaces, and applying techniques such as erosion, dilation, and binarization. In the result comparison phase, we define a Region of Interest (ROI), outline the contours of the processed image, and perform calculations to compare the results. Finally, based on these comparisons, the robot adjusts its direction to stay centered within the lane. You can find the source code for this program at: [/home/ubuntu/ros2_ws/src/example/example/self_driving/lane_detect.py](../_static/source_code/lane_detect.zip) * **Operation Steps** :::{Note} The input command should be case sensitive, and keywords can be complemented using Tab key. ::: Start the robot, and access the robot system desktop using VNC. * **Enable the model:** (1) Click-on

to start the command-line terminal. (2) Execute the command and hit Enter to disable the app auto-start service. ```bash ~/.stop_ros.sh ``` (3) Run the following command and hit Enter key. ```bash ros2 launch example self_driving.launch.py only_line_follow:=true ``` (4) To close the program, click on the terminal window where it's running and press '**Ctrl+C**'.

* **Program Outcome** After starting the game, place the robot on the road, and it will automatically detect the yellow line at the edge of the road. The robot will then adjust its position based on the detection results.

* **Program Analysis** The source code of this program is saved in [ros2_ws/src/example/example/self_driving/lane_detect.py](../_static/source_code/lane_detect.zip)

**Function:** image_callback： {lineno-start=164} ```python def image_callback(ros_image): global image rgb_image = np.ndarray(shape=(ros_image.height, ros_image.width, 3), dtype=np.uint8, buffer=ros_image.data) # 原始 RGB 画面 image = rgb_image ``` Image callback function is used to read the camera node. **Class:** **LaneDetector:** {lineno-start=13} ```python class LaneDetector(object): def __init__(self, color): # 车道线颜色 self.target_color = color # 车道线识别的区域 self.rois = ((450, 480, 0, 320, 0.7), (390, 420, 0, 320, 0.2), (330, 360, 0, 320, 0.1)) self.weight_sum = 1.0 def set_roi(self, roi): self.rois = roi ``` **Init:** {lineno-start=14} ```python def __init__(self, color): # 车道线颜色 self.target_color = color # 车道线识别的区域 self.rois = ((450, 480, 0, 320, 0.7), (390, 420, 0, 320, 0.2), (330, 360, 0, 320, 0.1)) self.weight_sum = 1.0 ``` Initialize the required parameters and set the ROI to lock the recognition range. **set_roi:** {lineno-start=21} ```python def set_roi(self, roi): self.rois = roi ``` Used to set the Region of Interest (ROI) for recognition. **get_area_max_contour:** {lineno-start=24} ```python @staticmethod def get_area_max_contour(contours, threshold=100): ''' 获取最大面积对应的轮廓 :param contours: :param threshold: :return: ''' contour_area = zip(contours, tuple(map(lambda c: math.fabs(cv2.contourArea(c)), contours))) contour_area = tuple(filter(lambda c_a: c_a[1] > threshold, contour_area)) if len(contour_area) > 0: max_c_a = max(contour_area, key=lambda c_a: c_a[1]) return max_c_a return None ``` Obtains the contour with the maximum area from the list of contours obtained through OpenCV. **add_horizontal_line:** {lineno-start=39} ```python def add_horizontal_line(self, image): # |____ ---> |———— ---> —— h, w = image.shape[:2] roi_w_min = int(w/2) roi_w_max = w roi_h_min = 0 roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] # 截取右半边 flip_binary = cv2.flip(roi, 0) # 上下翻转 max_y = cv2.minMaxLoc(flip_binary)[-1][1] # 提取最上，最左数值为255的点坐标 return h - max_y ``` Adds a horizontal recognition line based on the width and height of the frame and the ROI settings. **add_vertical_line_far:** {lineno-start=52} ```python def add_vertical_line_far(self, image): h, w = image.shape[:2] roi_w_min = int(w/8) roi_w_max = int(w/2) roi_h_min = 0 roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] flip_binary = cv2.flip(roi, -1) # 图像左右上下翻转 #cv2.imshow('1', flip_binary) # min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(ret) # minVal：最小值 # maxVal：最大值 # minLoc：最小值的位置 # maxLoc：最大值的位置 # 遍历的顺序，先行再列，行从左到右，列从上到下 (x_0, y_0) = cv2.minMaxLoc(flip_binary)[-1] # 提取最上，最左数值为255的点坐标 y_center = y_0 + 55 roi = flip_binary[y_center:, :] (x_1, y_1) = cv2.minMaxLoc(roi)[-1] down_p = (roi_w_max - x_1, roi_h_max - (y_1 + y_center)) y_center = y_0 + 65 roi = flip_binary[y_center:, :] (x_2, y_2) = cv2.minMaxLoc(roi)[-1] up_p = (roi_w_max - x_2, roi_h_max - (y_2 + y_center)) up_point = (0, 0) down_point = (0, 0) if up_p[1] - down_p[1] != 0 and up_p[0] - down_p[0] != 0: up_point = (int(-down_p[1]/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), 0) down_point = (int((h - down_p[1])/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), h) return up_point, down_point ``` Adds a recognition vertical line for the part of the frame farther from the robot based on the ROI settings. **get_binary** {lineno-start=114} ```python def get_binary(self, image): # 通过lab空间识别颜色 img_lab = cv2.cvtColor(image, cv2.COLOR_RGB2LAB) # rgb转lab img_blur = cv2.GaussianBlur(img_lab, (3, 3), 3) # 高斯模糊去噪 mask = cv2.inRange(img_blur, tuple(lab_data['lab']['Stereo'][self.target_color]['min']), tuple(lab_data['lab']['Stereo'][self.target_color]['max'])) # 二值化 eroded = cv2.erode(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 腐蚀 dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 膨胀 return dilated ``` Performs color recognition based on the color space and processes the binarized image. **add_vertical_line_near:** {lineno-start=86} ```python def add_vertical_line_near(self, image): # ——| |—— | # | ---> | ---> h, w = image.shape[:2] roi_w_min = 0 roi_w_max = int(w/2) roi_h_min = int(h/2) roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] flip_binary = cv2.flip(roi, -1) # 图像左右上下翻转 #cv2.imshow('1', flip_binary) (x_0, y_0) = cv2.minMaxLoc(flip_binary)[-1] # 提取最上，最左数值为255的点坐标 down_p = (roi_w_max - x_0, roi_h_max - y_0) (x_1, y_1) = cv2.minMaxLoc(roi)[-1] y_center = int((roi_h_max - roi_h_min - y_1 + y_0)/2) roi = flip_binary[y_center:, :] (x, y) = cv2.minMaxLoc(roi)[-1] up_p = (roi_w_max - x, roi_h_max - (y + y_center)) up_point = (0, 0) down_point = (0, 0) if up_p[1] - down_p[1] != 0 and up_p[0] - down_p[0] != 0: up_point = (int(-down_p[1]/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), 0) down_point = down_p return up_point, down_point, y_center def get_binary(self, image): # 通过lab空间识别颜色 img_lab = cv2.cvtColor(image, cv2.COLOR_RGB2LAB) # rgb转lab img_blur = cv2.GaussianBlur(img_lab, (3, 3), 3) # 高斯模糊去噪 mask = cv2.inRange(img_blur, tuple(lab_data['lab']['Stereo'][self.target_color]['min']), tuple(lab_data['lab']['Stereo'][self.target_color]['max'])) # 二值化 eroded = cv2.erode(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 腐蚀 dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 膨胀 return dilated ``` Adds a recognition vertical line for the part of the frame closer to the robot based on the ROI and frame width and height settings. **\_\_call\_\_:** {lineno-start=124} ```python def __call__(self, image, result_image): # 按比重提取线中心 centroid_sum = 0 h, w = image.shape[:2] max_center_x = -1 center_x = [] for roi in self.rois: blob = image[roi[0]:roi[1], roi[2]:roi[3]] # 截取roi contours = cv2.findContours(blob, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)[-2] # 找轮廓 max_contour_area = self.get_area_max_contour(contours, 30) # 获取最大面积对应轮廓 if max_contour_area is not None: rect = cv2.minAreaRect(max_contour_area[0]) # 最小外接矩形 box = np.intp(cv2.boxPoints(rect)) # 四个角 for j in range(4): box[j, 1] = box[j, 1] + roi[0] cv2.drawContours(result_image, [box], -1, (255, 255, 0), 2) # 画出四个点组成的矩形 # 获取矩形对角点 pt1_x, pt1_y = box[0, 0], box[0, 1] pt3_x, pt3_y = box[2, 0], box[2, 1] # 线的中心点 line_center_x, line_center_y = (pt1_x + pt3_x) / 2, (pt1_y + pt3_y) / 2 cv2.circle(result_image, (int(line_center_x), int(line_center_y)), 5, (0, 0, 255), -1) # 画出中心点 center_x.append(line_center_x) else: center_x.append(-1) for i in range(len(center_x)): if center_x[i] != -1: if center_x[i] > max_center_x: max_center_x = center_x[i] centroid_sum += center_x[i] * self.rois[i][-1] if centroid_sum == 0: return result_image, None, max_center_x center_pos = centroid_sum / self.weight_sum # 按比重计算中心点 angle = math.degrees(-math.atan((center_pos - (w / 2.0)) / (h / 2.0))) return result_image, angle, max_center_x ``` Callback function for the entire class. Performs color recognition here, draws the recognized yellow lines using OpenCV, and then outputs the image, angle, and pixel coordinates X of each ROI recognition contour. ### 7.4.2 Road Sign Detection * **Preparation** ① Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to "**Map and Props Setup**" in the same directory as this section for guidance. ② The road sign model used in this section is trained with YOLOv5. For more details on YOLOv5, please refer to '**7 ROS+Machine Learning Course\2. Machine Learning**.' ③ When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition. * **Program Logic** Firstly, acquire the real-time image from the camera and apply operations such as erosion and dilation. Next, invoke the YOLOv5 model and compare it with the target screen image. Finally, execute the appropriate landmark action based on the comparison results. You can find the source code for this program at: [/home/ubuntu/ros2_ws/src/yolov5_ros2/yolov5_ros2/yolo_detect.py](../_static/source_code/yolo_detect.zip) * **Operation Steps** :::{Note} The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to "**Integrated Application**" within the same file. Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the "**Tab**" key to autocomplete keywords. ::: (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to start the command-line terminal. (3) Execute the command, and hit Enter to disable the app service. ```bash ~/.stop_ros.sh ``` (4) Run the command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Type the command to initiate the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="traffic_signs_640s_7_1" ``` (6) Place the road signs in front of the camera, and the system will recognize and display them on the screen. To stop this function, press "**Ctrl+C**" in the terminal interface. If it does not close successfully, please try again. (7) Open a new command-line terminal, and execute the following command to start the rqt tool to acquire the live camera feed. ```bash rqt ``` (8) Choose the topic as /result_img.

:::{Note} If the model fails to recognize traffic-related signs, you should reduce the confidence threshold. Conversely, if the model incorrectly identifies traffic-related signs, you should increase the confidence threshold. ::: (9) Enter the following command to navigate to the directory containing the game programs. ```bash cd /home/ubuntu/ros2_ws/src/example/example/self_driving ``` (10) Run the command below to initiate the game program. ```bash vim self_driving.launch.py ``` The value inside the red box represents the confidence level and can be adjusted to improve the target detection performance.

* **Program Outcome** After initiating the game, place the robot on the road within the map. Once the robot identifies landmarks, it will highlight the detected landmarks and annotate them based on the highest confidence level learned from the model.

### 7.4.3 Traffic Light Recognition * **Preparation** ① Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to "[**7.3 Map and Props Setup**](#anchor_7_3)" in the same directory as this section for guidance. ② The road sign model used in this section is trained with YOLOv5. For more details on YOLOv5, please refer to '[**7.2 Machine Learning**](#anchor_7_2)' ③ When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition. * **Program Logic** Firstly, capture a real-time image from the camera and apply operations such as erosion and dilation. Next, invoke the YOLOv5 model to compare it with the target screen image. Finally, execute corresponding landmark actions based on the comparison results. The source code for this program can be found at : [/home/ubuntu/ros2_ws/src/yolov5_ros2/yolov5_ros2/yolo_detect.py](../_static/source_code/yolo_detect.zip) * **Operation Steps** :::{Note} The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to "**Integrated Application**" within the same file. Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the "**Tab**" key to autocomplete keywords. ::: (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command, and hit Enter to disable the app service. ```bash ~/.stop_ros.sh ``` (4) Run the command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Type the command to initiate the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="traffic_signs_640s_7_1" ``` (6) Place the traffic sign in front of the camera, and it will be recognized and marked in the image. To close this mode, press "**Ctrl+C**" in the terminal. If it fails to close, please try again. :::{Note} If the model fails to recognize traffic signs, you should reduce the confidence threshold. If the model incorrectly identifies signs, you should increase the confidence threshold. ::: (7) Run the following command to navigate to the folder containing the game programs. ```bash cd /home/ubuntu/ros2_ws/src/example/example/self_driving ``` (8) Type the following command to access the game program. ```bash vim self_driving.launch.py ``` The value in the red box represents the confidence level, which can be adjusted to modify the object detection results.

* **Program Outcome** After initiating the game, position the robot on the road depicted on the map. Upon recognizing the traffic signal, the robot will assess the color of the signal light and identify frames corresponding to red and green signal lights accordingly.

### 7.4.4 Turning Decision Making * **Preparation** ① Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to "[**7.3 Map and Props Setup**](#anchor_7_3)" in the same directory as this section for guidance. ② The roadmap model discussed in this section is trained using YOLOv5. For further information on YOLOv5 and related content, please refer to "[**7.2 Machine Learning**](#anchor_7_2)". ③ When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition. * **Program Logic** Firstly, capture the real-time image from the camera and apply operations such as erosion and dilation. Next, invoke the YOLOv5 model to compare the obtained image with the target screen image. Finally, based on the comparison outcomes, recognize the steering sign and direct the car accordingly. The source code of this program is saved in [/home/ubuntu/ros2_ws/src/yolov5_ros2/yolov5_ros2/yolo_detect.py](../_static/source_code/yolo_detect.zip) * **Operation Steps** :::{Note} The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to "**Integrated Application**" within the same file. ::: Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the "**Tab**" key to autocomplete keywords. (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command, and hit Enter to disable the app service. ```bash ~/.stop_ros.sh ``` (4) Run the command to enable the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Open a new command-line terminal, and run the following command to initiate the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="traffic_signs_640s_7_1" ``` (6) Place the road signs in front of the camera, and they will be recognized and highlighted on the screen. To stop this function, press "**Ctrl+C**" in the terminal. If the function does not stop, please try again. :::{Note} In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable. ::: (7) Type the command to initiate the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="traffic_signs_640s_7_1" ``` (8) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press '**Ctrl+C**'. :::{Note} In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable. ::: (9) Enter the command to navigate to the program path: ```bash cd /home/ubuntu/ros2_ws/src/example/example/self_driving ``` (10) Run the following command to start the game program. ```bash vim self_driving.launch.py ``` * **Program Outcome** Once the game begins, position the robot onto the road within the map. As the robot approaches a turning road sign, it will adjust its direction in accordance with the instructions provided by the sign.

### 7.4.5 Autonomous Parking * **Preparation** ① Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to "[**7.3 Map and Props Setup**](#anchor_7_3)" in the same directory as this section for guidance. ② The roadmap model discussed in this section is trained using YOLOv5. For further information on YOLOv5 and related content, please refer to "[**7. ROS+Machine Learning Course\7.2 Machine Learning**](#anchor_7_2)". ③ When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition. * **Program Logic** Begin by capturing the real-time image from the camera and applying operations such as erosion and dilation. Next, invoke the YOLOv5 model to compare the obtained image with the target screen image. Finally, based on the comparison results, identify the parking road sign and autonomously guide the car to park in the designated parking space. The source code of this program is saved in [/home/ubuntu/ros2_ws/src/example/example/yolov5_detect/yolov5_trt.py]()

* **Operation Steps** :::{Note} * The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to "**Integrated Application**" within the same file. * Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the **"Tab"** key to autocomplete keywords. ::: (1) Start the robot, and access the robot system desktop using VNC. (2) Click-on

to open the command-line terminal. (3) Execute the command, and hit Enter to disable the app service. ```bash ~/.stop_ros.sh ``` (4) Run the command to initiate the camera node. ```bash ros2 launch peripherals depth_camera.launch.py ``` (5) Type the command to enable the camera node. ```bash ros2 launch yolov5_ros2 yolov5_ros2.launch.py model:="traffic_signs_640s_7_1" ``` (6) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press '**Ctrl+C**'. :::{Note} In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable. ::: (7) Run the command to navigate to the directory containing programs. ```bash cd /home/ubuntu/ros2_ws/src/example/example/self_driving ``` (8) Enter the command to access the game program. ```bash vim self_driving.launch.py ``` The red box represents the confidence value, which can be adjusted to modify the effectiveness of target detection.

* **Program Outcome** After initiating the game, position the robot on the road within the map. As the robot progresses towards the parking sign, it will automatically park in the designated parking space based on the instructions provided by the road sign.

* **Parameter Adjustment** If the robot stops upon recognizing the parking sign and the parking position is not optimal, adjustments to the parameters in the program source code can be made. (1) Click-on

to start the command-line terminal. (2) Execute the command to navigate to the directory containing game programs. ```bash cd ros2_ws/src/example/example/self_driving/ ``` (3) Run the command to access the source code. ```bash vim self_driving.py ``` (4) Press the "**i**" key to enter insert mode and locate the code within the red box. Adjusting the parameters within the red box allows you to control the starting position for the robot to initiate the parking operation. Decreasing the parameters will result in the robot stopping closer to the zebra crossing, while increasing them will cause the robot to stop further away. Once adjustments are made, press the "**ESC**" key, type "**:wq**", and press Enter to save and exit.

You can adjust the parking processing function to alter the parking position of the robot. Initially, the parking action sets the linear speed in the negative direction of the Y-axis (right of the robot) to 0.2 meters per second, with a forward movement time of (0.38/2) seconds. To position the robot in the ideal location on the left side of the parking space, modify the speed and time accordingly.

### 7.4.6 Integrated Application This lesson provides instructions for implementing comprehensive driverless game on the robot, covering lane keeping, road sign detection, traffic light recognition, steering decision-making, and self-parking. * **Preparation** (1) Map Setup To ensure accurate navigation, place the map on a flat, smooth surface, free of wrinkles and obstacles. Position all road signs and traffic lights at designated locations on the map, facing clockwise. The starting point and locations of road signs are indicated below: :::{Note} Tools required for autonomous driving are available for separate purchase. If you are interested, kindly reach out to us at support@hiwonder.com. :::

* **Color Threshold Adjustment** Due to variations in light sources, it's essential to adjust the color thresholds for 'black, white, red, green, blue, and yellow' based on the guidelines provided in the '[**6. ROS2+OpenCV Course**](https://docs.hiwonder.com/projects/JetAutoPi/en/latest/docs/6.ROS2%2BOpenCV.html)' prior to starting. If the robot encounter inaccurate recognition while moving forward, readjust the color threshold specifically in the map area where recognition fails. * **Program Logic**

Actions implemented so far include: (1) Following the yellow line in the outermost circle of the patrol map. (2) Slowing down and passing if a zebra crossing is detected. (3) Making a turn upon detection of a turn sign. (4) Parking the vehicle and entering the parking lot upon detection of a stop sign. (5) Halting when a red light is detected. (6) Slowing down when passing a detected street light. First, load the model file trained by YOLOv5 and the required library files, obtain real-time camera images, and perform operations such as erosion and dilation on the images. Next, identify the target color line segment in the image and gather information such as size and center point of the target image. Then, invoke the model through YOLOv5 and compare it with the target screen image. Finally, adjust the forward direction based on the offset comparison of the target image's center point to keep the robot in the middle of the road. Additionally, perform corresponding actions based on different recognized landmark information during map traversal. The source code for this program can be found at:[/home/ubuntu/ros2_ws/src/example/example/self_driving/self_driving.py](../_static/source_code/self_driving.zip) * **Operation Steps** :::{Note} The input command should be case sensitive, and keywords can be implemented using Tab key. ::: Start the robot, and access the robot system desktop using VNC. (1) Click-on

to start the command-line terminal. (2) Execute the command, and hit Enter to disable the app service. ```bash ~/.stop_ros.sh ``` (3) Run the following command and hit Enter key. ```bash ros2 launch example self_driving.launch.py ``` (4) To close the program, click on the terminal window where it's running and press '**Ctrl+C**'.

* **Program Outcome** (1) Lane Keeping Upon initiating the game, the car will track the line and identify the yellow line at the road's edge. It will execute forward and turning actions based on the straightness or curvature of the yellow line to maintain lane position. (2) Traffic Light Recognition When the car encounters a traffic light, it will halt if the light is red and proceed if it's green. Upon approaching a zebra crossing, the car will automatically decelerate and proceed cautiously. (3) Turn and Parking Signs Upon detecting traffic signs while moving forward, the car will respond accordingly. If it encounters a right turn sign, it will execute a right turn and continue forward. In the case of a parking sign, it will execute a parking maneuver. Following these rules, the robot will continuously progress forward within the map. * **Program Analysis** The source code of this program is saved in:[ros2_ws/src/example/example/self_driving/self_driving.py](../_static/source_code/self_driving.zip)

(1) Function: {lineno-start=360} ```python def main(): node = SelfDrivingNode('self_driving') executor = MultiThreadedExecutor() executor.add_node(node) executor.spin() node.destroy_node() ``` Initiate autonomous driving class. (2) Class: {lineno-start=30} ```python class SelfDrivingNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.is_running = True self.pid = pid.PID(0.01, 0.0, 0.0) self.param_init() ``` **init:** {lineno-start=33} ```python super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.is_running = True self.pid = pid.PID(0.01, 0.0, 0.0) self.param_init() self.image_queue = queue.Queue(maxsize=2) self.classes = ['go', 'right', 'park', 'red', 'green', 'crosswalk'] self.dispaly =False self.bridge = CvBridge() self.lock = threading.RLock() self.colors = common.Colors() self.image_sub = None signal.signal(signal.SIGINT, self.shutdown) self.machine_type = os.environ.get('MACHINE_TYPE') self.lane_detect = lane_detect.LaneDetector("yellow") self.mecanum_pub = self.create_publisher(Twist, '/controller/cmd_vel', 1) self.joints_pub = self.create_publisher(ServosPosition, '/servo_controller', 1) # 舵机控制 self.result_publisher = self.create_publisher(Image, '~/image_result', 1) self.create_service(Trigger, '~/enter', self.enter_srv_callback) # 进入玩法 self.create_service(Trigger, '~/exit', self.exit_srv_callback) # 退出玩法 self.create_service(SetBool, '~/set_running', self.set_running_srv_callback) # self.heart = Heart(self.name + '/heartbeat', 5, lambda _: self.exit_srv_callback(None)) timer_cb_group = ReentrantCallbackGroup() self.client = self.create_client(Trigger, '/controller_manager/init_finish') self.client.wait_for_service() self.client = self.create_client(Trigger, '/yolov5_ros2/init_finish') self.client.wait_for_service() self.start_yolov5_client = self.create_client(Trigger, '/yolov5/start', callback_group=timer_cb_group) self.start_yolov5_client.wait_for_service() self.stop_yolov5_client = self.create_client(Trigger, '/yolov5/stop', callback_group=timer_cb_group) self.stop_yolov5_client.wait_for_service() self.timer = self.create_timer(0.0, self.init_process, callback_group=timer_cb_group) ``` Initialize the required parameters, obtain the current robot category, set the line-following color to yellow, start the chassis control, servo control, and image reading. Set up three types of services: enter, exit, and start, and read the YOLOv5 node. **init_process:** {lineno-start=70} ```python def init_process(self): self.timer.cancel() self.mecanum_pub.publish(Twist()) if not self.get_parameter('only_line_follow').value: self.send_request(self.start_yolov5_client, Trigger.Request()) set_servo_position(self.joints_pub, 1, ((10, 500), (5, 500), (4, 250), (3, 0), (2, 750), (1, 500))) # 初始姿态 time.sleep(1) if self.get_parameter('start').value: self.dispaly = True self.enter_srv_callback(Trigger.Request(), Trigger.Response()) request = SetBool.Request() request.data = True self.set_running_srv_callback(request, SetBool.Response()) #self.park_action() threading.Thread(target=self.main, daemon=True).start() self.create_service(Trigger, '~/init_finish', self.get_node_state) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') ``` Initialize the current robotic arm and start the main function. **param_init:** {lineno-start=91} ```python def param_init(self): self.start = False self.enter = False self.have_turn_right = False self.detect_turn_right = False self.detect_far_lane = False self.park_x = -1 # 停车标识的x像素坐标 self.start_turn_time_stamp = 0 self.count_turn = 0 self.start_turn = False # 开始转弯 ``` Initialize parameters required for position recognition or usage. **get_node_state:** {lineno-start=128} ```python def get_node_state(self, request, response): response.success = True return response ``` Obtain the current state of the node. **send_request:** {lineno-start=132} ```python def send_request(self, client, msg): future = client.call_async(msg) while rclpy.ok(): if future.done() and future.result(): return future.result() ``` Used to publish service requests. **enter_srv_callback:** {lineno-start=138} ```python def enter_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "self driving enter") with self.lock: self.start = False if self.image_sub is not None: self.destroy_subscription(self.image_sub) if self.machine_type == 'JetAuto': self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) # 摄像头订阅(subscribe to the camera) else: self.image_sub = self.create_subscription(Image, '/usb_cam/image_raw', self.image_callback, 1) # 摄像头订阅(subscribe to the camera) self.create_subscription(ObjectsInfo, '/yolov5_ros2/object_detect', self.get_object_callback, 1) self.mecanum_pub.publish(Twist()) self.enter = True response.success = True response.message = "enter" return response ``` Service for entering autonomous driving gameplay, start reading images and YOLOv5 recognition content, initialize speed. **exit_srv_callback:** {lineno-start=155} ```python def exit_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "self driving exit") with self.lock: try: if self.image_sub is not None: self.image_sub.unregister() if self.object_sub is not None: self.object_sub.unregister() except Exception as e: self.get_logger().info('\033[1;32m%s\033[0m' % str(e)) self.mecanum_pub.publish(Twist()) self.param_init() response.success = True response.message = "exit" return response ``` Service for exiting autonomous driving gameplay, stop reading images and YOLOv5 recognition content, initialize speed, reset parameters. **set_running_srv_callback:** {lineno-start=171} ```python def set_running_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "set_running") with self.lock: self.start = request.data if not self.start: self.mecanum_pub.publish(Twist()) response.success = True response.message = "set_running" return response ``` Start autonomous driving game, set the start parameter to True. **Shutdown:** {lineno-start=181} ```python def shutdown(self, signum, frame): # ctrl+c关闭处理 self.is_running = False ``` Callback function after closing the program, used to stop the currently running program. **image_callback:** {lineno-start=184} ```python def image_callback(self, ros_image): # 目标检查回调 cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满，丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) ``` Image callback function, enqueues images and discards expired ones. **park_action:** {lineno-start=193} ```python # 泊车处理 def park_action(self): twist = Twist() twist.linear.y = -0.2 self.mecanum_pub.publish(twist) time.sleep(0.38/0.2) self.mecanum_pub.publish(Twist()) ``` Parking logic, runs three different parking strategies according to three different chassis types. **get_object_callback:** {lineno-start=332} ```python # 获取目标检测结果 def get_object_callback(self, msg): self.objects_info = msg.objects if self.objects_info == []: # 没有识别到时重置变量 self.traffic_signs_status = None self.crosswalk_distance = 0 else: min_distance = 0 for i in self.objects_info: class_name = i.class_name center = (int((i.box[0] + i.box[2])/2), int((i.box[1] + i.box[3])/2)) if class_name == 'crosswalk': if center[1] > min_distance: # 获取最近的人行道y轴像素坐标 min_distance = center[1] elif class_name == 'right': # 获取右转标识 self.count_right += 1 self.count_right_miss = 0 if self.count_right >= 2: # 检测到多次就将右转标志至真 self.turn_right = True self.count_right = 0 elif class_name == 'park': # 获取停车标识中心坐标 self.park_x = center[0] elif class_name == 'red' or class_name == 'green': # 获取红绿灯状态 self.traffic_signs_status = i self.crosswalk_distance = min_distance ``` Callback function for reading YOLOv5, obtains the categories currently recognized by YOLOv5. **Main:** {lineno-start=201} ```python def main(self): while self.is_running: time_start = time.time() try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.is_running: break else: continue result_image = image.copy() if self.start: h, w = image.shape[:2] # 获取车道线的二值化图 binary_image = self.lane_detect.get_binary(image) # 检测到斑马线,开启减速标志 if 450 < self.crosswalk_distance and not self.start_slow_down: # 只有足够近时才开始减速 self.count_crosswalk += 1 if self.count_crosswalk == 3: # 多次判断，防止误检测 self.count_crosswalk = 0 self.start_slow_down = True # 减速标识 self.count_slow_down = time.time() # 减速固定时间 else: # 需要连续检测，否则重置 self.count_crosswalk = 0 twist = Twist() # 减速行驶处理 if self.start_slow_down: if self.traffic_signs_status is not None: # 通过面积判断离灯灯远近，如果太近那么即使是红灯也不会停 area = abs(self.traffic_signs_status.box[0] - self.traffic_signs_status.box[2])*abs(self.traffic_signs_status.box[1] - self.traffic_signs_status.box[3]) if self.traffic_signs_status.class_name == 'red' and area < 1000: # 如果遇到红灯就停车 self.mecanum_pub.publish(Twist()) self.stop = True elif self.traffic_signs_status.class_name == 'green': # 遇到绿灯，速度放缓 twist.linear.x = self.slow_down_speed self.stop = False if not self.stop: # 其他非停止的情况速度放缓，同时计时，时间=斑马线的长度/行驶速度 twist.linear.x = self.slow_down_speed if time.time() - self.count_slow_down > self.crosswalk_length/twist.linear.x: self.start_slow_down = False else: twist.linear.x = self.normal_speed # 直走正常速度 # 检测到停车标识+斑马线就减速, 让识别稳定 if 0 < self.park_x and 180 < self.crosswalk_distance: twist.linear.x = self.slow_down_speed if not self.start_park and 340 < self.crosswalk_distance: # 离斑马线足够近时就开启停车 self.mecanum_pub.publish(Twist()) self.start_park = True self.stop = True threading.Thread(target=self.park_action).start() # 右转及停车补线策略 if self.turn_right: y = self.lane_detect.add_horizontal_line(binary_image) if 0 < y < 400: roi = [(0, y), (w, y), (w, 0), (0, 0)] cv2.fillPoly(binary_image, [np.array(roi)], [0, 0, 0]) # 将上面填充为黑色，防干扰 min_x = cv2.minMaxLoc(binary_image)[-1][0] cv2.line(binary_image, (min_x, y), (w, y), (255, 255, 255), 40) # 画虚拟线来驱使转弯 elif 0 < self.park_x and not self.start_turn: # 检测到停车标识需要填补线，使其保持直走 if not self.detect_far_lane: up, down, center = self.lane_detect.add_vertical_line_near(binary_image) binary_image[:, :] = 0 # 全置黑，防止干扰 if 50 < center < 80: # 当将要看不到车道线时切换到识别较远车道线 self.detect_far_lane = True else: up, down = self.lane_detect.add_vertical_line_far(binary_image) binary_image[:, :] = 0 if up != down: cv2.line(binary_image, up, down, (255, 255, 255), 20) # 手动画车道线 result_image, lane_angle, lane_x = self.lane_detect(binary_image, image.copy()) # 在处理后的图上提取车道线中心 # 巡线处理 if lane_x >= 0 and not self.stop: if lane_x > 150: # 转弯 if self.turn_right: # 如果是检测到右转标识的转弯 self.count_right_miss += 1 if self.count_right_miss >= 15: self.count_right_miss = 0 self.turn_right = False self.count_turn += 1 if self.count_turn > 5 and not self.start_turn: # 稳定转弯 self.start_turn = True self.count_turn = 0 self.start_turn_time_stamp = time.time() twist.angular.z = -0.45 # 转弯速度 else: # 直道由pid计算转弯修正 self.count_turn = 0 if time.time() - self.start_turn_time_stamp > 3 and self.start_turn: self.start_turn = False if not self.start_turn: self.pid.SetPoint = 100 # 在车道中间时线的坐标 self.pid.update(lane_x) twist.angular.z = common.set_range(self.pid.output, -0.8, 0.8) self.mecanum_pub.publish(twist) else: self.pid.clear() # 绘制识别的物体，由于物体检测的速度小于线检测的速度，所以绘制的框会有所偏离 if self.objects_info != []: for i in self.objects_info: box = i.box class_name = i.class_name cls_conf = i.score cls_id = self.classes.index(class_name) color = colors(cls_id, True) plot_one_box( box, result_image, color=color, label="{}:{:.2f}".format(class_name, cls_conf), ) else: time.sleep(0.01) bgr_image = cv2.cvtColor(result_image, cv2.COLOR_RGB2BGR) if self.dispaly: cv2.imshow('result', bgr_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出 self.is_running = False self.result_publisher.publish(self.bridge.cv2_to_imgmsg(bgr_image, "bgr8")) time_d = 0.03 - (time.time() - time_start) if time_d > 0: time.sleep(time_d) self.mecanum_pub.publish(Twist()) rclpy.shutdown() # 获取目标检测结果 def get_object_callback(self, msg): self.objects_info = msg.objects if self.objects_info == []: # 没有识别到时重置变量 self.traffic_signs_status = None self.crosswalk_distance = 0 else: min_distance = 0 for i in self.objects_info: class_name = i.class_name center = (int((i.box[0] + i.box[2])/2), int((i.box[1] + i.box[3])/2)) if class_name == 'crosswalk': if center[1] > min_distance: # 获取最近的人行道y轴像素坐标 min_distance = center[1] elif class_name == 'right': # 获取右转标识 self.count_right += 1 self.count_right_miss = 0 if self.count_right >= 2: # 检测到多次就将右转标志至真 self.turn_right = True self.count_right = 0 elif class_name == 'park': # 获取停车标识中心坐标 self.park_x = center[0] elif class_name == 'red' or class_name == 'green': # 获取红绿灯状态 self.traffic_signs_status = i self.crosswalk_distance = min_distance ``` The main function within the class, runs different line-following strategies according to different chassis types. ### 7.4.7 FAQ (1) The robot exhibits inconsistent performance during line patrolling, often veering off course. Adjust the color threshold to better suit the lighting conditions of the actual scene. For precise instructions on color threshold adjustment, please consult "[**6. ROS+OpenCV Lesson**](https://docs.hiwonder.com/projects/JetAutoPi/en/latest/docs/6.ROS2%2BOpenCV.html)" for detailed guidance. (2) The robot's turning radius appears to be either too large or too small. (3) Ensure correct adjustment of the robot arm deviation. For detailed instructions on robot arm deviation adjustment, please refer to "**[8. ROS2-Robot Arm Control Course](https://docs.hiwonder.com/projects/JetAutoPi/en/latest/docs/8.robot_arm_control.html)**" for comprehensive learning. (4) Modify the line patrol processing code Navigate to the game program path by entering the command: ```bash cd ~/ros2_ws/src/example/example/self_driving/ ``` (5) Open the game program by entering the command: ```bash vim self_driving.py ``` (6) The red box denotes the lane's center point, which can be adjusted to fine-tune the turning effect. Decreasing the value will result in earlier turns, while increasing it will cause later turns.

(7) The parking location is suboptimal. You can adjust the parking processing function or modify the starting position of the parking operation. For detailed instructions, please consult "[**7.4.5 Autonomous Parking->Parameter Adjustment**](#anchor_7_4_5_5)" for reference and learning. (8) Inaccurate traffic sign recognition. Adjust the target detection confidence. For detailed instructions, please refer to "[**7.4.5 Autonomous Parking->Operation Steps**](#anchor_7_4_5_3) " for comprehensive learning.