6. ROS+Machine Learning Course
6.1 Machine Learning Introduction
6.1.1 What “Machine Learning” is
Machine Learning forms the cornerstone of artificial intelligence, serving as the fundamental approach to endow machines with intelligence. It spans multiple interdisciplinary fields such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory.
In essence, machine learning explores how computers can acquire new knowledge or skills by mimicking human learning behaviors and continuously enhancing their performance by reorganizing existing knowledge structures. Practically, it entails utilizing data to train models and leveraging these models for predictions.
For instance, consider AlphaGo, the pioneering artificial intelligence system that triumphed over human professional Go players and even world champions. AlphaGo operates on the principles of deep learning, wherein it discerns the intrinsic laws and representation layers within sample data to extract meaningful insights.
6.1.2 Types of Machine Learning
Machine learning can be broadly categorized into two types: supervised learning and unsupervised learning. The key distinction between these two types lies in whether the machine learning algorithm has prior knowledge of the classification and structure of the dataset.
Supervised Learning
Supervised learning involves providing a labeled dataset to the algorithm, where the correct answers are known. The machine learning algorithm uses this dataset to learn how to compute the correct answers. It is the most commonly used type of machine learning.
For instance, in image recognition, a large dataset of dog pictures can be provided, with each picture labeled as “dog”. This labeled dataset serves as the “correct answer”. By learning from this dataset, the machine can develop the ability to recognize dogs in new images.
Model Selection: In supervised learning, selecting the right model to represent the data relationship is crucial. Common supervised learning models encompass linear regression, logistic regression, decision trees, support vector machines (SVM), and deep neural networks. The choice of model hinges on the data’s characteristics and the problem’s nature.
Feature Engineering: Feature engineering involves preprocessing and transforming raw data to extract valuable features. This encompasses tasks like data cleaning, handling missing values, normalization or standardization, feature selection, and feature transformation. Effective feature engineering can significantly enhance model performance and generalization capabilities.
Training and Optimization: Leveraging labeled training data, we can train the model to capture the data relationship. Training typically involves defining a loss function, selecting an appropriate optimization algorithm, and iteratively adjusting model parameters to minimize the loss function. Common optimization algorithms include gradient descent and stochastic gradient descent.
Model Evaluation: Upon completing training, evaluating the model’s performance on new data is essential. Standard evaluation metrics include accuracy, precision, recall, F1 score, and ROC curve. Assessing a model’s performance enables us to gauge its suitability for practical applications.
In summary, supervised learning entails utilizing labeled training data to train a model for predicting or classifying new unlabeled data. Key steps encompass selecting an appropriate model, conducting feature engineering, training and optimizing the model, and evaluating its performance. Together, these components constitute the foundational elements of supervised learning.
Unsupervised Learning
Unsupervised learning involves providing an unlabeled dataset to the algorithm, where the correct answers are unknown. In this type of machine learning, the machine must mine potential structural relationships within the dataset.
For instance, in image classification, a large dataset of cat and dog pictures can be provided without any labels. Through unsupervised learning, the machine can learn to divide the pictures into two categories: cat pictures and dog pictures.
6.1.3 Common Type of Machine Learning Framework
There are a large variety of machine learning frameworks. Among them, PyTorch, Tensorflow, MXNet and paddlepaddle are common.
PyTorch
PyTorch is a powerful open-source machine learning framework, originally based on the BSD License Torch framework. It supports advanced multidimensional array operations and is widely used in the field of machine learning. PyTorch, built on top of Torch, offers even greater flexibility and functionality. One of its most distinguishing features is its support for dynamic computational graphs and its Python interface.
In contrast to TensorFlow’s static computation graph, PyTorch’s computation graph is dynamic. This allows for real-time modifications to the graph as computational needs change. Additionally, PyTorch enables developers to accelerate tensor calculations using GPUs, create dynamic computational graphs, and automatically calculate gradients. This makes PyTorch an ideal choice for machine learning tasks that require flexibility, speed, and powerful computing capabilities.
Tensorflow
TensorFlow is a powerful open-source machine learning framework that allows users to quickly construct neural networks and train, evaluate, and save them. It provides an easy and efficient way to implement machine learning and deep learning concepts. TensorFlow combines computational algebra with optimization techniques to make the calculation of many mathematical expressions easier.
One of TensorFlow’s key strengths is its ability to run on machines of varying sizes and types, including supercomputers, embedded systems, and everything in between. TensorFlow can also utilize both CPU and GPU computing resources, making it an extremely versatile platform. When it comes to industrial deployment, TensorFlow is often the most suitable machine learning framework due to its robustness and reliability. In other words, TensorFlow is an excellent choice for deploying machine learning applications in a production environment.
PaddlePaddle
PaddlePaddle is a cutting-edge deep learning framework developed by Baidu, which integrates years of research and practical experience in deep learning. PaddlePaddle offers a comprehensive set of features, including training and inference frameworks, model libraries, end-to-end development kits, and a variety of useful tool components. It is the first open-source, industry-level deep learning platform to be developed in China, offering rich and powerful features to developers worldwide.
Deep learning has proven to be a powerful tool in many machine learning applications in recent years. From image recognition and speech recognition to natural language processing, robotics, online advertising, automatic medical diagnosis, and finance, deep learning has revolutionized the way we approach these fields. With PaddlePaddle, developers can harness the power of deep learning to create innovative and cutting-edge applications that meet the needs of users and businesses alike.
MXNet
MXNet is a top-tier deep learning framework that supports multiple programming languages, including Python, C++, Scala, R, and more. It features a dataflow graph similar to other leading frameworks like TensorFlow and Theano, as well as advanced features such as robust multi-GPU support and high-level model building blocks comparable to Lasagne and Blocks. MXNet can run on virtually any hardware, including mobile phones, making it a versatile choice for developers.
MXNet is specifically designed for efficiency and flexibility, with accelerated libraries that enable developers to leverage the full power of GPUs and cloud computing. It also supports distributed computing across dynamic cloud architectures via distributed parameter servers, achieving near-linear scaling with multiple GPUs/CPUs. Whether you’re working on a small-scale project or a large-scale deep learning application, MXNet provides the tools and support you need to succeed.
6.2 Machine Learning Application
6.2.1 GPU Acceleration
GPU Accelerated Computing
A graphics processing unit (GPU) is a specialized micro processor used to process image in personal computers, workstations, game consoles and mobile devices (phone and tablet). Similar to CPU, but CPU is designed to implement complex mathematical and geometric calculations which are essential for graphics rendering.
GPU-accelerated computing is the employment of a graphics processing unit (GPU) along with a computer processing unit (CPU) in order to accelerate science, analytics, engineering, consumer and cooperation applications. Moreover, GPU can facilitate the applications on various platforms, including vehicles, phones, tablets, drones and robots.
Comparison between GPU and CPU
The main difference between CPU and GPU is how they handle the tasks. CPU consists of several cores optimized for sequential processing. While GPU owns a large parallel computing architecture composed of thousands of smaller and more effective cores tailored for multitasking simultaneously.
GPU stands out for thousands of cores and large amount of high-speed memory, and is initially intended for processing game and computer image. It is adept at parallel computing which is ideal for image processing, because the pixels are relatively independent. And the GPU provides a large number of cores to perform parallel processing on multiple pixels at the same time, but it only improves throughput without alleviating the delay. For example, when receives one message, it will use only one core to tackle this message although it has thousands of cores. GPU cores are usually employed to complete operations related to image processing, which is not universal as CPU.
Advantage of GPU
GPU is excellent in massive parallel operations, hence it has an important role in deep learning. Deep learning relies on neural network that is utilized to analyze massive data at high speed.
For example, if you want to let this network recognize the cat, you need to show it lots of the pictures of cats. And that is the forte of GPU. Besides, GPU consumes less resources than CPU.
6.2.2 TensorRT Acceleration
TensorRT Introduction
TensorRT is a high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications. It is deployed to hyperscale data centers, embedded platforms, or automotive product platforms to accelerate the inference.
TensoRT supports almost all deep learning frameworks, such as TensorFlow, Caffe, Mxnet and Pytorch. Combing with new NVIDIA GPU, TensorRT can realize swift and effective deployment and inference on almost all frameworks.
To accelerate deployment inference, multiple methods to optimize the models are proposed, such as model compression, pruning, quantization and knowledge distillation. And we can use the above methods to optimize the models during training, however TensorRT optimize the trained models. It improves the model efficiency through optimizing the network computation graph.
After the network is trained, you can directly put the model training file into tensorRT without relying on deep learning framework.
Optimization Methods
TensorRT has the following optimization strategies:
(1) Precision Calibration
(2) Layer & Tensor Fusion
(3) Kernel Auto-Tuning
(4) Dynamic Tenser Memory
(5) Multi-Stream Execution
Precision Calibration
In the training phase of neural networks across most deep learning frameworks, network tensors commonly employ 32-bit floating-point precision (FP32). Following training, since backward propagation is unnecessary during deployment inference, there is an opportunity to judiciously decrease data precision, for instance, by transitioning to FP16 or INT8. This reduction in data precision has the potential to diminish memory usage and latency, leading to a more compact model size.
The table below provides an overview of the dynamic range for different precision:
| Precision | Dynamic Range |
|---|---|
| FP32 | −3.4×1038 ~ +3.4×1038 |
| FP16 | −65504 ~- +65504 |
| INT8 | −128 ~ +127 |
INT8 is limited to 256 distinct numerical values. When INT8 is employed to represent values with FP32 precision, information loss is certain, resulting in a decline in performance. Nevertheless, TensorRT provides a fully automated calibration process to optimally align performance by converting FP32 precision data to INT8 precision, thereby minimizing performance loss.
Layer & Tensor Fusion
While CUDA cores efficiently compute tensor operations, a significant amount of time is still spent on the initialization of CUDA cores and read/write operations for each layer’s input/output tensors. This results in GPU resource wastage and creates a bottleneck in memory bandwidth.
TensorRT optimizes the model structure by horizontally or vertically merging layers, reducing the number of layers and consequently decreasing the required CUDA core count, achieving structural optimization.
Horizontal merging combines convolution, bias, and activation layers into a unified CBR structure, utilizing only one CUDA core. Vertical merging consolidates layers with identical structures but different weights into a broader layer, also using only one CUDA core.
Moreover, in cases of multi-branch merging, TensorRT can eliminate concat layers by directing layer outputs to the correct memory address without copying, thereby reducing memory access frequency.
Kernel Auto-Tuning
During the inference calculation process, the neural network model utilizes the GPU’s CUDA cores for computation. TensorRT can adjust the CUDA cores based on different algorithms, network models, and GPU platforms, ensuring that the current model can perform computational operations with optimal performance on specific platforms.
Dynamic Tenser Memory
During the utilization of each Tensor, TensorRT allocates dedicated GPU memory to prevent redundant memory requests, thereby reducing memory consumption and enhancing the efficiency of memory reuse.
Multi-Stream Execution
By leveraging CUDA Streams, parallel computation is achieved for multiple branches of the same input, maximizing the potential for parallel operations.
6.2.3 Yolov5 Model
Yolo Model Series Introduction
(1) YOLO Series
YOLO (You Only Look Once) is an one-stage regression algorithm based on deep learning.
R-CNN series algorithm dominates target detection domain before YOLOv1 is released. It has higher detection accuracy, but cannot achieve real-time detection due to its limited detection speed engendered by its two-stage network structure.
To tackle this problem, YOLO is released. Its core idea is to redefine target detection as a regression problem, use the entire image as network input, and directly return position and category of Bounding Box at output layer. Compared with traditional methods for target detection, it distinguishes itself in high detection speed and high average accuracy.
(2) YOLOv5
YOLOv5 is an optimized version based on previous YOLO models, whose detection speed and accuracy is greatly improved.
In general, a target detection algorithm is divided into 4 modules, namely input end, reference network, Neck network and Head output end. The following analysis of improvements in YOLOv5 rests on these four modules.
① Input end: YOLOv5 employs Mosaic data enhancement method to increase model training speed and network accuracy at the stage of model training. Meanwhile, adaptive anchor box calculation and adaptive image scaling methods are proposed.
② Reference network: Focus structure and CPS structure are introduced in YOLOv5.
③ Neck network: same as YOLOv4, Neck network of YOLOv5 adopts FPN+PAN structure, but they differ in implementation details.
④ Head output layer: YOLOv5 inherits anchor box mechanism of output layer from YOLOv4. The main improvement is that loss function GIOU_Loss, and DIOU_nms for prediction box screening are adopted.
YOLOv5 Model Structure
(1) Component
① Convolution layer: extract features of the image
Convolution refers to the effect of a phenomenon, action or process that occurs repeatedly over time, impacting the current state of things. Convolution can be divided into two components: “volume” and “accumulation”. “Volume” involves data flipping, while “accumulation” refers to the accumulation of the influence of past data on current data. Flipping the data helps to establish the relationships between data points, providing a reference for calculating the influence of past data on the current data.
In YOLOv5, the data being processed is typically an image, which is two-dimensional in computer vision. Therefore, the convolution applied is also a two-dimensional convolution, with the aim of extracting features from the image. The convolution kernel is an unit area used for each calculation, typically in pixels. The kernel slides over the image, with the size of the kernel being manually set.
During convolution, the periphery of the image may remain unchanged or be expanded as needed, and the convolution result is then placed back into the corresponding position in the image. For instance, if an image has a resolution of 6×6, it may be first expanded to a 7×7 image, and then substituted into the convolution kernel for calculation. The resulting data is then refilled into a blank image with a resolution of 6×6.
② Pooling layer: enlarge the features of image
The pooling layer is an essential part of a convolutional neural network and is commonly used for downsampling image features. It is typically used in combination with the convolutional layer. The purpose of the pooling layer is to reduce the spatial dimension of the feature map and extract the most important features.
There are different types of pooling techniques available, including global pooling, average pooling, maximum pooling, and more. Each technique has its unique effect on the features extracted from the image.
Maximum pooling can extract the most distinctive features from an image, while discarding the remaining ones. For example, if we take an image with a resolution of 6×6 pixels, we can use a 2×2 filter to downsample the image and obtain a new image with reduced dimensions.
③ Upsampling layer: restore the size of an image
This process is sometimes referred to as “anti-pooling”. While upsampling restores the size of the image, it does not fully recover the features that were lost during pooling. Instead, it tries to interpolate the missing information based on the available information.
For example, let’s consider an image with a resolution of 6×6 pixels. Before upsampling, use 3X3 filter to calculate the original image so as to get the new image.
④ Batch normalization layer: organize data
It aims to reduce the computational complexity of the model and to ensure that the data is better mapped to the activation function.
Batch normalization works by standardizing the data within each mini-batch, which reduces the loss of information during the calculation process. By retaining more features in each calculation, batch normalization can improve the sensitivity of the model to the data.
⑤ RELU layer: activate function
The activation function is a crucial component in the process of building a neural network, as it helps to increase the nonlinearity of the model. Without an activation function, each layer of the network would be equivalent to a matrix multiplication, and the output of each layer would be a linear function of the input from the layer above. This would result in a neural network that is unable to learn complex relationships between the input and output.
There are many different types of activation functions. Some of the most common activation functions include the ReLU, Tanh, and Sigmoid. For example, ReLU is a piecewise function that replaces all values less than zero with zero, while leaving positive values unchanged.
⑥ ADD layer: add tensor
In a typical neural network, the features can be divided into two categories: salient features and inconspicuous features.
⑦ Concat layer: splice tensor
It is used to splice together tensors of features, allowing for the combination of features that have been extracted in different ways. This can help to increase the richness and complexity of the feature set.
Compound Element
When building a model, using only the layers mentioned above to construct functions can lead to lengthy, disorganized, and poorly structured code. By assembling basic elements into various units and calling them accordingly, the efficiency of writing the model can be effectively improved.
(1) Convolutional unit:
A convolutional unit consists of a convolutional layer, a batch normalization layer, and an activation function. The convolution is performed first, followed by batch normalization, and finally activated using an activation function.
(2) Focus module
The Focus module for interleaved sampling and concatenation first divides the input image into multiple large regions and then concatenates the small images at the same position within each region to break down the input image into several smaller images. Finally, the images are preliminarily sampled using convolutional units.
As shown in the figure below, taking an image with a resolution of 6×6 as an example, if we set a large region as 2×2, then the image can be divided into 9 large regions, each containing 4 small images.
By concatenating the small images at position 1 in each large region, a 3×3 image can be obtained. The small images at other positions are similarly concatenated, and the original 6×6 image will be broken down into four 3×3 images.
(3) Residual unit
The function of the residual unit is to enable the model to learn small changes in the image. Its structure is relatively simple and is achieved by combining data from two paths.
The first path uses two convolutional units to sample the image, while the second path does not use convolutional units for sampling but directly uses the original image. Finally, the data from the first path is added to the second path.
(4) Composite Convolution Unit
In YOLOv5, the composite convolution unit is characterized by the ability to customize the convolution unit according to requirements. The composite convolution unit is also realized by superimposing data obtained from two paths.
The first path only has one convolutional layer for sampling, while the second path has 2x+1 convolutional units and one convolutional layer for sampling. After sampling and splicing, the data is organized through batch normalization and then activated by an activation function. Finally, a convolutional layer is used for sampling.
(5) Compound Residual Convolutional Unit
The compound residual convolutional unit replaces the 2x convolutional layers in the compound convolutional unit with x residual units. In YOLOv5, the feature of the compound residual unit is mainly that the residual units can be customized according to the needs.
(6) Composite Pooling Unit
The output data of the convolutional unit is fed into three max pooling layers and an additional copy is kept without processing. Then, the data from the four paths are concatenated and input into a convolutional unit. Using the composite pooling unit to process the data can significantly enhance the features of the original data.
Structure
Composed of three parts, YOLOv5 can output three sizes of data. Data of each size is processed in different way. The below picture is the output structure of YOLOv5.
Below is the output structures of data of three sizes.
6.2.4 YOLOv5 Running Procedure
In this section, we provide an explanation of the model workflow using the anchor boxes, prediction boxes, and prior boxes employed in YOLOv5.
Prior Bounding Box
When an image is input into model, object detection area requires us to offer, while prior bounding box is that box used to mark the object detection area on image before detection.
Prediction Box
The prediction box is not required to set manually, which is the output result of the model. When the first batch of training data is input into model, the prediction box will be automatically generated with it. The position in which the object of same type appear more frequently are set as the center of the prediction box.
Anchor Box
After the prediction box is generated, deviation may occur in its size and position. At this time, the anchor box serves to calibrate the size and position of the prediction box.
The generation position of anchor box is determined by prediction box. In order to influence the position of the next generation of the prediction box, the anchor box is generated at the relative center of the existing prediction box.
Realization Process
After the data is calibrated, a prior bounding box appears on image. Then, the image data is input to the model, the model generates a prediction box based on the position of the prior bounding box. Having generated the prediction box, an anchor box will appear automatically. Lastly, the weights from this training are updated into model.
Each newly generated prediction will be influenced by the last generated anchor box. Repeating the operations above continuously, the deviation of the size and position of the prediction box will be gradually erased until it coincides with the priori box.
6.3 MediaPipe Man-Robot Interaction
6.3.1 MediaPipe Introduction
MediaPipe Description
MediaPipe is an open-source framework of multi-media machine learning models. Cross-platform MediaPipe can run on mobile devices, workspace and servers, as well as support mobile GPU acceleration. It is also compatible with TensorFlow and TF Lite Inference Engine, and all kinds of TensorFlow and TF Lite models can be applied on it. Besides, MediaPipe supports GPU acceleration of mobile and embedded platform.
MediaPipe Pros and Cons
(1) MediaPipe Pros
① MediaPipe supports various platforms and languages, including iOS, Android, C++, Python, JAVAScript, Coral, etc.
② Swift running. Models can run in real-time.
③ Models and codes are with high reuse rate.
(2) MediaPipe Cons
① For mobile devices, MediaPipe will occupy 10M or above.
② As it greatly depends on Tensorflow, you need to alter large amount of codes if you want to change it to other machine learning frameworks, which is not friendly to machine learning developer.
③ It adopts static image which can improve efficiency, but make it difficult to find out the errors.
How to use MediaPipe
The figure below shows how to use MediaPipe. The solid line represents the part to coded, and the dotted line indicates the part not to coded. MediaPipe can offer the result and the function realization framework quickly.
Dependency
MediaPipe utilizes OpenCV to process video, and uses FFMPEG to process audio data. Furthermore, it incorporates other essential dependencies, including OpenGL/Metal, Tensorflow, and Eigen.
For seamless usage of MediaPipe, we suggest gaining a basic understanding of OpenCV.
MediaPipe Solutions
Solutions is based on the open-source pre-constructed sample of TensorFlow or TFLite. MediaPipe Solutions is built upon a framework, which provides 16 Solutions, including face detection, Face Mesh, iris, hand, posture, human body and so on.
MediaPipe Learning Resources
MediaPipe website:https://developers.google.com/mediapipe
MediaPipe Wiki:http://i.bnu.edu.cn/wiki/index.php?title=Mediapipe
MediaPipe github:https://github.com/google/mediapipe
Dlib website: http://dlib.net/
dlib github: https://github.com/davisking/dlib
6.3.2 Image Background Segmentation
This lesson provides instructions on utilizing MediaPipe’s selfie segmentation model to accurately segment trained models, such as human faces and hands, from their backgrounds. Once separated, you can easily add virtual backgrounds to these models.
Program Logic
To begin, import the selfie segmentation model from MediaPipe and subscribe to the corresponding topic to access the live camera feed.
Next, flip the image and apply the segmentation to the background image. For improved boundary segmentation, implement dual-border segmentation.
Finally, complete the process by replacing the background with a virtual background.
Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node.
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) Enter the command and hit Enter key to run the game program.
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 self_segmentation.py
(6) To close this feature, press the ‘Esc’ key to exit the camera image interface.
(7) Next, press “Ctrl+C” in the terminal. If it fails to close, please try again.
Program Outcome
Once the feature is activated, the screen changes to a gray virtual background, and when a hand enters the frame, it is automatically separated from the background.
Program Analysis
The source code of this program locates in:
/ros2_ws/src/example/example/mediapipe_example/self_segmentation.py
Function
Main:
81 82 83 84 85 86 87 88 89 90 | def main(): node = SegmentationNode('self_segmentation') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to start the background control node.
Class
SegmentationNode:
14 15 16 17 18 19 20 21 22 23 24 25 26 27 | class SegmentationNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() self.mp_selfie_segmentation = mp.solutions.selfie_segmentation self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.BG_COLOR = (192, 192, 192) # gray self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Init:
15 16 17 18 19 20 21 22 23 24 25 26 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() self.mp_selfie_segmentation = mp.solutions.selfie_segmentation self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.BG_COLOR = (192, 192, 192) # gray self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') |
Initialize the parameters required for background segmentation, call the image callback function, and start the model recognition function.
image_callback:
29 30 31 32 33 34 35 36 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image into the queue) self.image_queue.put(rgb_image) |
Image callback function, used to read data from the camera node and enqueue it.
Main:
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 | def main(self): with self.mp_selfie_segmentation.SelfieSegmentation( model_selection=1) as selfie_segmentation: bg_image = None while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue # To improve performance, optionally mark the image as not writeable to # pass by reference. image.flags.writeable = False results = selfie_segmentation.process(image) image.flags.writeable = True image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) # Draw selfie segmentation on the background image. # To improve segmentation around boundaries, consider applying a joint # bilateral filter to "results.segmentation_mask" with "image". condition = np.stack( (results.segmentation_mask,) * 3, axis=-1) > 0.1 # The background can be customized. # a) Load an image (with the same width and height of the input image) to # be the background, e.g., bg_image = cv2.imread('/path/to/image/file') # b) Blur the input image by applying image filtering, e.g., # bg_image = cv2.GaussianBlur(image,(55,55),0) if bg_image is None: bg_image = np.zeros(image.shape, dtype=np.uint8) bg_image[:] = self.BG_COLOR output_image = np.where(condition, image, bg_image) self.fps.update() result_image = self.fps.show_fps(output_image) cv2.imshow('MediaPipe Selfie Segmentation', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) break cv2.destroyAllWindows() rclpy.shutdown() |
Load the model from MediaPipe, input the image, and display the output image using OpenCV.
6.3.3 3D Object Detection
Program Logic
To get started, import the 3D Objectron module from MediaPipe, and subscribe to the topic message to receive the real-time camera image.
Next, flip the image to ensure proper alignment for 3D object detection.
Finally, draw the 3D boundary frame on the image.
Operation Steps
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the following command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) In a new command line terminal, enter the command and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 objectron.py
(6) To close this game, press the “Esc” key in the image interface to exit the camera image interface.
(7) Press “Ctrl+C” in the command line terminal interface. If the closure fails, please try again repeatedly.
Program Outcome
Once the game starts, the 3D frame will be drawn around the boundary of the recognized object. The system can identify several objects, including a cup (with handle), shoe, chair, and camera.
Program Analysis
The program file corresponding to this section of the course documentation is located at: /ros2_ws/src/example/example/mediapipe_example/objectron.py
Function
Main:
76 77 78 79 80 81 82 83 84 85 | def main(): node = ObjectronNode('objectron') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to initiate 3D detection node.
Class
ObjectronNode:
14 | class ObjectronNode(Node): |
Init:
15 16 17 18 19 20 21 22 23 24 25 26 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() self.mp_objectron = mp.solutions.objectron self.mp_drawing = mp.solutions.drawing_utils self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize the parameters required for 3D recognition, call the image callback function, and start the model recognition function.
image_callback:
28 29 30 31 32 33 34 35 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像 self.image_queue.get() # 将图像放入队列 self.image_queue.put(rgb_image) |
Image callback function, used to read data from the camera node and enqueue it.
Main :
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | def main(self): with self.mp_objectron.Objectron(static_image_mode=False, max_num_objects=1, min_detection_confidence=0.4, min_tracking_confidence=0.5, model_name='Cup') as objectron: while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue # To improve performance, optionally mark the image as not writeable to # pass by reference. image.flags.writeable = False results = objectron.process(image) # Draw the box landmarks on the image. image.flags.writeable = True image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) if results.detected_objects: for detected_object in results.detected_objects: self.mp_drawing.draw_landmarks( image, detected_object.landmarks_2d, self.mp_objectron.BOX_CONNECTIONS) self.mp_drawing.draw_axis(image, detected_object.rotation, detected_object.translation) self.fps.update() result_image = self.fps.show_fps(cv2.flip(image, 1)) # Flip the image horizontally for a selfie-view display. cv2.imshow('MediaPipe Objectron', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) break cv2.destroyAllWindows() rclpy.shutdown() |
Read the model inside MediaPipe, input the image, draw the edges of the objects after obtaining the output image, and display using OpenCV.
6.3.4 Face Detection
In this lesson, we’ll use MediaPipe’s face detection model to detect faces in the frame. MediaPipe’s face detection is an ultra-fast solution that supports multiple faces and identifies 6 key landmarks. It’s built on BlazeFace, a lightweight and highly efficient face detector optimized for mobile GPU inference.
Program Logic
To begin, import the human face detection model from MediaPipe and subscribe to the relevant topic message to obtain the live camera feed.
Next, utilize OpenCV to flip the image and convert the color space for further processing.
Then, using the face detection model’s minimum confidence threshold, determine whether a human face has been successfully detected. If a human face is recognized, proceed to collect the necessary information about each detected face, including the bounding frame and the 6 key points (right eye, left eye, nose tip, right ear, and left ear).
Finally, frame the human face and mark the 6 key points on each detected face for visual clarity and further analysis.
Operation Steps
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) In a new command line terminal, enter the command and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 face_detect.py
(6) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(7) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
After the game starts, depth camera will start detecting human face, and human face will framed on the live camera feed.
Program Analysis
The source code of this program is saved in :/ros2_ws/src/example/example/mediapipe_example/face_detect.py
Function
Main:
67 68 69 70 71 72 73 74 75 76 | def main(): node = FaceDetectionNode('face_detection') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to initiate face detection node.
Class
FaceDetectionNode:
18 | class FaceDetectionNode(Node): |
Init:
19 20 21 22 23 24 25 26 27 28 29 30 31 32 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/detector.tflite') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceDetectorOptions(base_options=base_options) self.detector = vision.FaceDetector.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize the parameters required for face recognition, call the image callback function, and start the model recognition function.
image_callback:
34 35 36 37 38 39 40 41 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image into the queue) self.image_queue.put(rgb_image) |
Image callback function, used to read data from the camera node and enqueue it.
Main:
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = visualize(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('face_detection', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) break cv2.destroyAllWindows() rclpy.shutdown() |
Read the model from MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw the facial keypoints and display the feedback image.
6.3.5 3D Face Detection
In this program, MediaPipe Face Mesh is utilized to detect human face within the camera image.
MediaPipe Face Mesh is a powerful model capable of estimating 468 3D facial features, even when deployed on a mobile device. It employs machine learning to infer the 3D face contour accurately. Additionally, this model ensures real-time detection by utilizing a lightweight model architecture and GPU acceleration.
Furthermore, this model is integrated with a face conversion module that compensates for any differences between face landmark estimation and AR (Augmented Reality) applications. It establishes a metric 3D space and utilizes the facial landmark screen positions to estimate facial conversion within this space. The facial conversion data consists of common 3D primitives, including facial gesture conversion matrices and triangle facial mesh information.
Program Logic
Firstly, you need to learn that machine learning pipeline is composed of two real-time deep neural network models. The system consists of two components: a face detector that processes the entire image and calculates the locations of faces, and a 3D face landmark model that uses these locations to predict an approximate 3D surface through regression.
To achieve 3D facial landmarks, we utilize transfer learning and train a network with multiple objectives: predicting 3D landmark coordinates on synthetic rendered data and 2D semantic contours on annotated real-world data simultaneously. This approach yields plausible 3D landmark predictions not only based on synthetic data but also on real-world data.
The 3D landmark network takes cropped video frames as input without requiring additional depth input. The model outputs the location of a 3D point and the probability that a face appears in the input and is properly aligned.
Once the face mesh model is imported, real-time images can be obtained from the camera by subscribing to topic messages.
Next, image preprocessing techniques like flipping the image and converting the color space are applied. The face detection model’s minimum confidence is then used to determine whether the face has been successfully detected.
Finally, the detected face on the screen is projected into a 3D grid for visualization and display.
Operation Steps
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) Enter the command in a new command-line terminal and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 face_mesh.py
(6) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(7) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
After starting the game, when the depth camera detects a face, it will outline the face in the feedback image.
Program Analysis
The source code of this program is located in
/ros2_ws/src/example/example/mediapipe_example/face_mesh.py
Function
Main:
70 71 72 73 74 75 76 77 78 79 | def main(): node = FaceMeshNode('face_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to initiate the 3D face detection node.
Class
FaceMeshNode:
18 | class FaceMeshNode(Node): |
Init:
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/face_landmarker_v2_with_blendshapes.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.FaceLandmarkerOptions(base_options=base_options, output_face_blendshapes=True, output_facial_transformation_matrixes=True, num_faces=1) self.detector = vision.FaceLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize the parameters required for 3D face detection, call the image callback function, and start the model recognition function.
image_callback:
39 40 41 42 43 44 45 46 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image into the queue) self.image_queue.put(rgb_image) |
The image callback function is used to retrieve data from the camera node and encapsulate it into a queue.
Main:
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = draw_face_landmarks_on_image(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('face_landmarker', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) break cv2.destroyAllWindows() rclpy.shutdown() |
Reading the model inside MediaPipe, inputting the image, and then using OpenCV to draw facial keypoints on the output image, and display the feedback image.
6.3.6 Hand Key Point Detection
MediaPipe’s hand detection model is employed to showcase the key points of the hand and the connecting lines of these key points on the live camera feed.
MediaPipe Hands is an advanced hand and finger detection model that delivers high-fidelity results. Through the power of machine learning (ML), it accurately infers 21 3D landmarks of the hand from a single frame.
Program Logic
Firstly, it’s important to understand that MediaPipe’s palm detection model employs a machine learning pipeline consisting of multiple models (including a linear model). This model processes the entire image and returns an oriented hand bounding box. On the other hand, the hand landmark model operates on cropped image regions defined by the palm detectors and provides high-fidelity 3D hand keypoints.
To begin, after importing the palm detection model, we subscribe to the topic message to obtain real-time camera images.
Next, various image pre-processing steps, such as flipping the image and converting the color space, are applied. These steps significantly reduce the need for data augmentation for the hand landmark model.
Furthermore, our pipeline allows for generating crops based on hand landmarks identified in the previous frame. The palm detection is invoked only when the landmark model is unable to recognize the presence of the hand accurately.
Afterward, by comparing the minimum confidence level of the hand detection model, we determine whether the palm has been successfully detected.
Lastly, the hand keypoints are detected and drawn on the camera image to visualize the detected hand in real-time.
Operation Steps
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) Enter the command in a new command-line terminal and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 hand.py
(6) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(7) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
Once the game starts, the depth camera will begin detecting the hand and display the hand key points on the camera image, with the key points connected.
Program Analysis
The program file corresponding to this section of the course documentation is located at:
/ros2_ws/src/example/example/mediapipe_example/hand.py
Function
Main:
65 66 67 68 69 70 71 72 73 74 | def main(): node = HandNode('hand_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to initiate the 3D face detection node.
Class
HandNode:
18 | class HandNode(Node): |
Init:
19 20 21 22 23 24 25 26 27 28 29 30 31 32 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/hand_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2) self.detector = vision.HandLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize the parameters required for hand keypoint detection, call the image callback function, and start the model recognition function.
image_callback:
34 35 36 37 38 39 40 41 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image into the queue) self.image_queue.put(rgb_image) |
The image callback function is used to read data from the camera node and enqueue it.
Main:
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue image = cv2.flip(image, 1) mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) detection_result = self.detector.detect(mp_image) annotated_image = draw_hand_landmarks_on_image(image, detection_result) self.fps.update() result_image = self.fps.show_fps(cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)) cv2.imshow('hand_landmarker', result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) break cv2.destroyAllWindows() rclpy.shutdown() |
Read the model from MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw the key points of the hand and display the feedback image.
6.3.7 Body Key Points Detection
The MediaPipe body detection model is utilized to detect key points on the human body, which are then displayed on the live camera feed. This implementation incorporates MediaPipe Pose, a high-fidelity posture tracking model that leverages BlazePose to infer 33 3D key points. Additionally, this approach offers support for the ML Kit Pose Detection API.
Program Logic
Firstly, import body detection model.
Subsequently, flip over the image and convert the color space of the image. Check whether the human body is successfully detected based on the minimum confidence of the body detection model.
Next, define the tracked posture by comparing the minimum tracking confidence. If the confidence does not meet the minimum threshold, perform automatic human detection on the next input image.
In the pipeline, a detector is employed to initially localize the region of interest (ROI) corresponding to a person’s pose within a frame. Subsequently, a tracker utilizes the cropped ROI frame as input to predict pose landmarks and segmentation masks within the ROI.
For video applications, the detector is invoked selectively, only when necessary. Specifically, it is used for the first frame or when the tracker fails to recognize the body pose in the preceding frame. In all other frames, the pipeline derives ROIs based on the pose landmarks detected in the previous frame.
After MediaPipe body detection model is imported, access the live camera feed through subscribing the related topic message.
Lastly, draw the key points representing the human body.
Operation Steps
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) Enter the command in a new command-line terminal and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 pose.py
(6) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(7) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
After the game starts, depth camera will begin detecting human pose, and body key points can be displayed and connected on the live camera feed.
Program Analysis
The program file is saved in:/ros2_ws/src/example/example/mediapipe_example/pose.py
Function
Main:
67 68 69 70 71 72 73 74 75 76 | def main(): node = PoseNode('pose_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Used to initiate the 3D face detection node.
Class
PoseNode:
18 | class PoseNode(Node): |
Init:
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.bridge = CvBridge() model_path = os.path.join(os.path.abspath(os.path.split(os.path.realpath(__file__))[0]), 'model/pose_landmarker.task') base_options = python.BaseOptions(model_asset_path=model_path) options = vision.PoseLandmarkerOptions( base_options=base_options, output_segmentation_masks=True) self.detector = vision.PoseLandmarker.create_from_options(options) self.fps = fps.FPS() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize the parameters required for limb detection, call the image callback function, and start the model recognition process.
image_callback:
36 37 38 39 40 41 42 43 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image into the queue) self.image_queue.put(rgb_image) |
Image callback function, used to read data from the camera node and enqueue it.
Main:
67 68 69 70 71 72 73 74 75 76 | def main(): node = PoseNode('pose_landmarker') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
Read the model inside MediaPipe, input the image, and after obtaining the output image, use OpenCV to draw facial keypoints and display the feedback image.
6.3.8 Fingertip Trajectory Recognition
Identify hand joints using MediaPipe’s hand detection model. Once a specific gesture is recognized, the robot will initiate fingertip locking on the screen, track the fingertips, and generate their movement trajectory.
Program Logic
First, invoke the MediaPipe hand detection model to capture the camera image. Next, flip and process the image to extract hand information. Utilizing the connection lines between key points of the hand, calculate the finger angles to determine the gesture. Upon recognition of a specific gesture, the robot will proceed to identify and lock the fingertips on the screen, simultaneously tracing the movement trajectory of the fingertips on the display.
Operation Steps
Note
The input command should be case sensitive, and the keyword can be complemented by “Tab” key.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to enable the camera node:
Depth camera:
ros2 launch peripherals depth_camera.launch.py
Monocular camera:
ros2 launch peripherals usb_cam.launch.py
(5) Enter the command in a new ROS2 command-line terminal and press Enter to run the game program:
cd ~/ros2_ws/src/example/example/mediapipe_example && python3 hand_gesture.py
(6) The program will enable the camera automatically. The detailed recognition process can be found in ‘6.3.8 Program Outcome’.
(7) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(8) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
Once the game starts, position your hand within the camera’s field of view. Upon recognition, the hand keypoints will be highlighted on the camera feed.
If the robot detects the “1” gesture, the trajectory of your fingertip motion will begin to be recorded on the camera feed. If it detects the “5” gesture, the recorded fingertip trajectory will be cleared.
Program Analysis
The program file is saved in
/ros2_ws/src/example/example/mediapipe_example/hand_gesture.py
Note
Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable!
Based on the game’s impact, the process logic of this game is organized as depicted in the figure below:
As depicted in the image above, the purpose of this game is to capture an image using the camera, preprocess it by converting its color space for easier identification, extract feature points corresponding to hand gestures from the converted image, and determine different gestures (based on angles) through logical analysis of key feature points. Finally, the trajectory of the recognized gesture is drawn on the display screen.
The program’s logic flowchart extracted from the program files is illustrated in the figure below.
From the above diagram, it can be seen that the program’s logical flow is mainly divided into initialization functions and recognition processing functions. The following document content will be written according to the program logic flow chart mentioned above.
Function
Main:
212 213 214 215 216 217 218 219 220 221 | def main(): node = HandGestureNode('hand_gesture') try: rclpy.spin(node) except KeyboardInterrupt: node.destroy_node() rclpy.shutdown() print('shutdown') finally: print('shutdown finish') |
The main function is used to start the fingertip trajectory recognition node.
get_hand_landmarks:
19 20 21 22 23 24 25 26 27 28 | def get_hand_landmarks(img, landmarks): """ Convert landmarks from normalized output of Mediapipe to pixel coordinates(将landmarks从medipipe的归一化输出转为像素坐标) :param img: The image corresponding to pixel coordinates(像素坐标对应的图片) :param landmarks: The normalized key points(归一化的关键点) :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) |
Convert the normalized data from madipipe into pixel coordinates.
hand_angle:
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | def hand_angle(landmarks): """ Calculate the bending angle of each finger(计算各个手指的弯曲角度) :param landmarks: Hand key point(手部关键点) :return: The angle of each finger(各个手指的角度) """ angle_list = [] # thumb 大拇指 angle_ = vector_2d_angle(landmarks[3] - landmarks[4], landmarks[0] - landmarks[2]) angle_list.append(angle_) # index finger(食指) angle_ = vector_2d_angle(landmarks[0] - landmarks[6], landmarks[7] - landmarks[8]) angle_list.append(angle_) # middle finger(中指) angle_ = vector_2d_angle(landmarks[0] - landmarks[10], landmarks[11] - landmarks[12]) angle_list.append(angle_) # ring finger(无名指) angle_ = vector_2d_angle(landmarks[0] - landmarks[14], landmarks[15] - landmarks[16]) angle_list.append(angle_) # pinky (小拇指) angle_ = vector_2d_angle(landmarks[0] - landmarks[18], landmarks[19] - landmarks[20]) angle_list.append(angle_) angle_list = [abs(a) for a in angle_list] return angle_list |
After extracting the hand feature points into the results variable, it is necessary to logically process these points. By evaluating the angular relationship between the feature points, specific finger types (thumb, index finger) can be identified. The hand_angle function accepts the landmark feature point set (results) as input, and subsequently employs the vector_2d_angle function to compute the angles between the corresponding feature points. The feature points corresponding to the elements of the landmark set are depicted in the figure below:
Taking the thumb’s angle calculation as an example: the vector_2d_angle function is used to calculate the angle between joint points. landmarks[3], landmarks[4], landmarks[0], and landmarks[2] correspond to feature points 3, 4, 0, and 2 in the hand feature extraction diagram. By calculating the angles of these joint points, the thumb’s posture characteristics can be determined. Similarly, the processing logic for the other finger joints is analogous.
To ensure the accuracy of recognition, the parameters and basic logic (addition and subtraction of angle calculations) in the hand_angle function should remain at their default settings.
h_gesture:
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 | def h_gesture(angle_list): """ Determining the hand gesture displayed by the fingers through two-dimensional features(通过二维特征确定手指所摆出的手势) :param angle_list: Calculate the bending angle of each finger(各个手指弯曲的角度) :return : Gesture name string(手势名称字符串) """ thr_angle = 65. thr_angle_thumb = 53. thr_angle_s = 49. gesture_str = "none" if (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "fist" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "hand_heart" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s): gesture_str = "nico-nico-ni" elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "hand_heart" elif (angle_list[0] > 5) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "one" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "two" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] > thr_angle): gesture_str = "three" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "OK" elif (angle_list[0] > thr_angle_thumb) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "four" elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and ( angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s): gesture_str = "five" elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] < thr_angle_s): gesture_str = "six" else: "none" return gesture_str |
After identifying the different finger types of the hand and determining their positions on the image, logical recognition processing of various gestures can be performed by implementing the h_gesture function.
In the h_gesture function depicted above, the parameters thr_angle, thr_angle_thenum, and thr_angle_s represent the angle threshold values for corresponding gesture logic points. These values have been empirically tested to ensure stable recognition effects. It is not recommended to alter them unless the logic processing effect is unsatisfactory, in which case adjustments within a range of ±5 values are sufficient. The angle_list[0,1,2,3,4] corresponds to the five finger types associated with the palm.
Here’s an example using the gesture “one”:
77 78 79 | elif (angle_list[0] > 5) and (angle_list[1] < thr_angle_s) and (angle_list[2] > thr_angle) and ( angle_list[3] > thr_angle) and (angle_list[4] > thr_angle): gesture_str = "one" |
The code presented represents the logical angle evaluation of the fingers for the “one” gesture. angle_list[0]>5 checks whether the angle value of the thumb joint feature point in the image is greater than 5. angle_list[1]<thr_angle_s checks if the angle feature of the index finger joint feature point is less than the predetermined value thr_angle_s. Similarly, angle_list[2]<thr_angle verifies if the angle feature of the middle finger feature point is less than the predetermined value thr_angle. The logical processing for the other two fingers, angle_list[3] and angle_list[4], follows a similar method. When the above conditions are met, the current gesture feature is recognized as “one”, and the same principle applies to recognizing other gesture features.
Different gesture recognitions involve distinct logical processing, but the overall logical framework remains similar. For recognizing other gesture features, refer to the previous paragraph.
draw_points:
108 109 110 111 112 113 114 | def draw_points(img, points, thickness=4, color=(255, 0, 0)): points = np.array(points).astype(dtype=np.int64) if len(points) > 2: for i, p in enumerate(points): if i + 1 >= len(points): break cv2.line(img, p, points[i + 1], color, thickness) |
Draw the currently recognized hand shape and each joint point.
Class
State:
102 103 104 105 106 | class State(enum.Enum): NULL = 0 START = 1 TRACKING = 2 RUNNING = 3 |
An enumeration class used to set the current state of the program.
HandGestureNode:
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | class HandGestureNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.drawing = mp.solutions.drawing_utils self.hand_detector = mp.solutions.hands.Hands( static_image_mode=False, max_num_hands=1, min_tracking_confidence=0.05, min_detection_confidence=0.6 ) self.fps = fps.FPS() # FPS calculator(fps计算器) self.state = State.NULL self.points = [] self.count = 0 self.bridge = CvBridge() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
The HandGestureNode is a fingertip trajectory recognition node that contains three functions: an initialization function, a main function, and an image callback function.
Init:
117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | def __init__(self, name): rclpy.init() super().__init__(name) self.running = True self.drawing = mp.solutions.drawing_utils self.hand_detector = mp.solutions.hands.Hands( static_image_mode=False, max_num_hands=1, min_tracking_confidence=0.05, min_detection_confidence=0.6 ) self.fps = fps.FPS() # FPS calculator(fps计算器) self.state = State.NULL self.points = [] self.count = 0 self.bridge = CvBridge() self.image_queue = queue.Queue(maxsize=2) self.image_sub = self.create_subscription(Image, '/depth_cam/rgb/image_raw', self.image_callback, 1) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') threading.Thread(target=self.main, daemon=True).start() |
Initialize each component needed and call the camera node.
6.3.9 Posture Control
The human posture estimation model, trained using the MediaPipe machine learning framework, detects the human body feature in the captured image, identifies relevant joint positions, and subsequently recognizes a variety of sequential actions. This process enables direct control of the robot through somatosensory input.
Viewed from the perspective of the robot, the following actions correspond to specific movements:
① If the user lifts their left arm, the robot will move a certain distance to the right.
② If the user lifts their right arm, the robot will move a certain distance to the left.
③ If the user lifts their left leg, the robot will move forward a certain distance.
④ If the user lifts their right leg, the robot will move backward a certain distance.
Program Logic
First, import MediaPipe’s human pose estimation model and subscribe to topic messages to obtain real-time footage from the camera.
MediaPipe is an open-source multimedia machine learning model application framework that runs cross-platform on mobile devices, workstations, and servers. It supports mobile GPU acceleration and inference engines such as TensorFlow and TF Lite.
Next, utilize the built model to detect key points of the human body in the screen. Connect these key points to display the human body and determine the human body posture.
Finally, if a specific action is detected in the human body posture, the robot will respond accordingly.
Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Execute the command to run the game program:
ros2 launch example body_control.launch.py
(5) If you need to close this gameplay, you need to press the “Esc” key in the image interface to exit the camera image interface.
(6) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
Once the game is initiated, stand within the camera’s field of view. When a person is detected, the screen will display key points of the body and lines connecting them.
From the perspective of the robot, lifting the left arm will cause the robot to turn left; lifting the right arm will make the robot turn right; lifting the left leg will make the robot move forward a certain distance; lifting the right leg will make the robot move backward a certain distance.
Program Analysis
The program file is saved in
ros2_ws/src/example/example/body_control/include/body_control.py
Note
Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable!
The game process logic is outlined below:
(1) Capture an image through the camera.
(2) After performing a demonstration action, the car will execute the corresponding action.
(3) From the car’s perspective, lifting the left arm will cause the car to turn left; lifting the right arm will cause the car to turn right in a circle; lifting the left leg will make the car move forward a certain distance; lifting the right leg will make the car move backward a certain distance.
The program’s logic flowchart, obtained from the program files, is presented below:
(1) Initialization function (init(self.name)) defines relevant parameters, including:
① Definition of the image tool (self.drawing) object.
② Points used to draw recognized features.
③ Definition of the limb detection object (self.body_detector).
(2) Identified feature points’ output results undergo logical processing for recognition.
(3) Actions are determined and stored based on key point distance conditions.
(4) Finally, the output results are generated, and the car executes corresponding actions.
Function
Main:
329 330 331 332 | def main(): node = BodyControlNode('body_control') rclpy.spin(node) node.destroy_node() |
Used to start the body sensation control node.
get_joint_landmarks:
47 48 49 50 51 52 53 54 55 56 | def get_joint_landmarks(img, landmarks): """ Convert landmarks from medipipe's normalized output to pixel coordinates(将landmarks从medipipe的归一化输出转为像素坐标) :param img: Picture corresponding to pixel coordinate(像素坐标对应的图片) :param landmarks: Normalized keypoint(归一化的关键点) :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) |
Used to convert the recognized information into pixel coordinates.
joint_distance:
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 | def joint_distance(landmarks): distance_list = [] d1 = landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER] d2 = landmarks[LEFT_HIP] - landmarks[LEFT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER] d2 = landmarks[RIGHT_HIP] - landmarks[RIGHT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[LEFT_HIP] - landmarks[LEFT_ANKLE] d2 = landmarks[LEFT_ANKLE] - landmarks[LEFT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_ANKLE] d2 = landmarks[RIGHT_ANKLE] - landmarks[RIGHT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) return distance_list |
Used to calculate the distance between each joint point based on pixel coordinates.
Class
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 | class BodyControlNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.running = True self.fps = fps.FPS() # FPS calculator(fps计算器) signal.signal(signal.SIGINT, self.shutdown) self.move_finish = True self.stop_flag = False self.left_hand_count = [] self.right_hand_count = [] self.left_leg_count = [] self.right_leg_count = [] self.detect_status = [0, 0, 0, 0] |
This class is the body control node.
Init:
88 89 90 91 92 93 94 95 96 97 98 99 | def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.running = True self.fps = fps.FPS() # FPS calculator(fps计算器) signal.signal(signal.SIGINT, self.shutdown) |
Initialize the parameters required for body control, read the image callback node from the camera, initialize nodes such as servos, chassis, buzzers, motors, etc., and finally start the main function within the class.
get_node_state:
134 135 136 | def get_node_state(self, request, response): response.success = True return response |
Set the initialization state of the current node.
shutdown:
138 139 | def shutdown(self, signum, frame): self.running = False |
Program exit callback function used to terminate recognition.
image_callback:
141 142 143 144 145 146 147 148 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # Discard the oldest image if the queue is full(如果队列已满,丢弃最旧的图像) self.image_queue.get() # Put the image into the queue(将图像放入队列) self.image_queue.put(rgb_image) |
Image node callback function used to process images and enqueue them.
Move:
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 | def move(self, *args): if args[0].angular.z == 1: set_servo_position(self.joints_pub, 0.1, ((9, 650), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = -1.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) time.sleep(11) set_servo_position(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) elif args[0].angular.z == -1: set_servo_position(self.joints_pub, 0.1, ((9, 350), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 1.0 motor2 = MotorState() motor2.id = 4 motor2.rps = -0.1 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) time.sleep(12) set_servo_position(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) else: self.mecanum_pub.publish(args[0]) time.sleep(args[1]) self.mecanum_pub.publish(Twist()) time.sleep(0.1) self.stop_flag =True self.move_finish = True |
Movement strategy function that moves the vehicle according to the recognized limb actions.
buzzer_warn:
205 206 207 208 209 210 211 | def buzzer_warn(self): msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) |
Buzzer control function used for buzzer alarms.
image_proc:
213 214 215 216 217 218 219 220 221 222 223 224 225 | def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks is not None: if self.move_finish: twist = Twist() landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) distance_list = (joint_distance(landmarks)) if distance_list[0] < 1: self.detect_status[0] = 1 if distance_list[1] < 1: self.detect_status[1] = 1 |
Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information, and then performs movements according to the recognized posture.
Main:
305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # Press q or esc to exit(按q或者esc退出) self.mecanum_pub.publish(Twist()) self.running = False |
The main function within the BodyControlNode class, used to input image information into the recognition function and display the returned image.
6.3.10 Human Body Tracking
Note
This game is best suited for indoor environments. Outdoor settings may significantly interfere with its effectiveness!
Utilize the yolov5 framework to import a pre-trained human pose model for detecting human bodies. The center point of the detected human body will be indicated in the returned image. When a human body approaches, the robot will retreat; conversely, if the human body is distant, the robot will move forward. This ensures that the distance between the human body and the robot remains approximately 3 meters at all times.
Program Logic
First, import the human pose estimation model from yolov5, and subscribe to topic messages to obtain real-time camera images. Next, utilize the trained model to detect key points of the human body in the images, and calculate the coordinates of the human body’s center point based on all detected key points. Finally, update the PID controller based on the coordinates of the human body’s center point and the screen’s center point to control the robot’s movement in sync with the human body’s movement.
Operation Steps
Note
When entering commands, strict case sensitivity is required, and you can use the “Tab” key to complete keywords.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Enter the command and press Enter
ros2 launch example body_track.launch.py
(5) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(6) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
Upon initiating the game, the camera captures the human body within its field of view. Once detected, the center point of the human body is highlighted in the displayed image.
From the robot’s perspective, if the human body is in close proximity, the robot will retreat. Conversely, if the human body is distant, the robot will move forward, ensuring that the distance between the human body and the robot remains approximately 3 meters at all times.
Program Analysis
The program file is saved in:ros2_ws/src/example/example/body_control/include/body_track.py
Note
Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable!
Based on the game’s effectiveness, the procedural logic is delineated as follows:
(1) The car captures images through the camera.
(2) It identifies the human body’s position in the image and calculates the distance between the human body and the car.
(3) Finally, the car adjusts its movement to follow within the preset distance limit.
The program’s logic flowchart, derived from the program files, is illustrated below:
Initialization Function of the BodyControlNode Class:
Configuration of Linear and Angular Speed: Adjusts the car’s speed based on the detected distance from the human body.
Definition of Chassis Publisher: Publishes the chassis’s position and orientation to control the car’s forward and backward movement.
Yolov5 Human Body Detection: Utilizes the yolov5 single-target detection algorithm to detect human bodies and obtain their positions.
PID Parameter Initialization: Initializes parameters related to the PID controller, governing sensitivity to the distance between the car and the human body.
Function
Main:
184 185 186 187 | def main(): node = BodyControlNode('body_control') rclpy.spin(node) node.destroy_node() |
Used to start the human tracking node.
Class
29 30 31 32 33 34 35 36 37 38 39 40 41 | class BodyControlNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.pid_d = pid.PID(0.1, 0, 0) #self.pid_d = pid.PID(0, 0, 0) self.pid_angular = pid.PID(0.002, 0, 0) #self.pid_angular = pid.PID(0, 0, 0) self.go_speed, self.turn_speed = 0.007, 0.04 |
This class is the human tracking node.
Init:
31 32 33 34 35 36 37 38 39 40 41 | rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.pid_d = pid.PID(0.1, 0, 0) #self.pid_d = pid.PID(0, 0, 0) self.pid_angular = pid.PID(0.002, 0, 0) #self.pid_angular = pid.PID(0, 0, 0) self.go_speed, self.turn_speed = 0.007, 0.04 |
Initialize the parameters required for human tracking, read nodes such as the camera’s image callback, depth information, chassis, YOLOv5 recognition, etc., and then synchronize the time to align depth information and image information, finally start the main function within the class.
get_node_state:
80 81 82 | def get_node_state(self, request, response): response.success = True return response |
Set the initialization state of the current node.
shutdown:
84 85 | def shutdown(self, signum, frame): self.running = False |
Program exit callback function used to terminate recognition.
get_object_callback:
87 88 89 90 91 92 93 94 95 | # Get the results of target detection(获取目标检测结果) def get_object_callback(self, msg): for i in msg.objects: class_name = i.class_name if class_name == 'person': if i.box[1] < 10: self.center = [int((i.box[0] + i.box[2])/2), int(i.box[1]) + abs(int((i.box[1] - i.box[3])/4))] else: self.center = [int((i.box[0] + i.box[2])/2), int(i.box[1]) + abs(int((i.box[1] - i.box[3])/3))] |
Callback function for YOLOv5 recognition node, which converts recognition information into pixel coordinates.
image_proc:
109 110 111 112 113 114 115 116 117 118 119 120 121 122 | def image_proc(self, rgb_image): twist = Twist() if self.center is not None: h, w = rgb_image.shape[:-1] cv2.circle(rgb_image, tuple(self.center), 10, (0, 255, 255), -1) ################# roi_h, roi_w = 5, 5 w_1 = self.center[0] - roi_w w_2 = self.center[0] + roi_w if w_1 < 0: w_1 = 0 if w_2 > w: w_2 = w h_1 = self.center[1] - roi_h |
Function for tracking humans, which invokes the model to draw the position of the person based on the recognized information, and uses PID to control the movement according to the recognized person.
Main:
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(image) except BaseException as e: result_image = image.copy() self.get_logger().info('\033[1;32m%s\033[0m' % e) self.center = None cv2.imshow(self.name, cv2.cvtColor(result_image, cv2.COLOR_RGB2BGR)) key = cv2.waitKey(1) if key == ord('q') or key == 27: # Press q or esc to exit(按q或者esc退出) self.mecanum_pub.publish(Twist()) self.running = False |
The main function within the BodyControlNode class, used to input image information into the recognition function and display the returned image.
Function Expansion
The default tracking speed in the program is set to a fixed value. If you wish to alter the tracking speed of the robot, you can do so by adjusting the PID parameters within the program.
(1) Open the terminal, enter the following command to navigate to the directory where the program is stored:
cd /home/ubuntu/ros2_ws/src/example/example/body_control/include/
(2) Run the following command to open the program file:
vim body_track.py
(3) Locate the self.pid_d and self.pid_angular functions, where the values inside the parentheses are the PID-related parameters. One is for the tracking linear velocity PID, and the other is for the tracking angular velocity PID.
The three PID parameters are proportional, integral, and derivative. The proportional parameter adjusts the response level, the integral parameter adjusts the smoothness, and the derivative parameter adjusts whether there is overshoot.
(4) Press “i” to enter edit mode. If you want to increase the robot’s tracking speed, you can correspondingly increase the value. For example, here we set the tracking linear velocity PID to 0.05.
Note
It is recommended not to adjust the parameters too high. Excessive parameters will cause the robot to track too quickly, affecting the experience.
(5) After completing the modifications, press “Esc” to exit edit mode, then press “:wq” to save and exit.
(6) Start the game according to the instructions provided in section 6.3.10.
6.3.11 Integration of Body Posture and RGB Control
The depth camera combines RGB capabilities, enabling both color recognition and somatosensory control. This lesson will leverage color recognition, as discussed in 6.3.9 Posture Control, to identify individuals wearing clothing of a predefined color (which can be calibrated through operations). The robot’s movements are then controlled based on different body gestures.
If an individual wearing the specified color is not recognized, the robot remains unresponsive, ensuring accurate identification and control over the individual operating the robot.
Program Logic
First, import MediaPipe’s human pose estimation model and subscribe to topic messages to obtain real-time camera footage.
Next, utilize the built model to detect key points of the human torso on the screen. Connect these key points to display the human torso and determine the human posture. Calibrate the center point of the human body based on all key points.
Finally, if it is detected that the human body is standing with hands on hips, calibrate the color of the clothes to identify the control object. The robot enters control mode, and when the human body performs specific actions, the robot responds with corresponding actions.
Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Run the following command and hit Enter key to initiate the game:
ros2 launch example body_and_rgb_control.launch.py
(5) If you need to close this game you need to press the “Esc” key in the image interface to exit the camera image interface.
(6) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
After starting the game, stand within the camera’s field of view. When a person is detected, the screen will display key points of the human torso, lines connecting these points, and the center point of the human body.
First Step: Slightly adjust the camera to maintain a certain distance, ensuring it can detect the entire human body.
Second Step: When the person to be controlled appears on the camera screen, they can assume the posture of hands on hips. If the buzzer emits a short beep, the robot completes calibration of the human body’s center point and clothing color, entering control mode.
Third Step: At this point, with the robot as the primary viewpoint, raising the left arm causes the robot to move to the right; raising the right arm causes the robot to move to the left; raising the left leg causes the robot to move forward; raising the right leg causes the robot to move backward.
Fourth Step: If a person wearing a different clothing color enters the camera’s field of view, they will not control the robot.
Program Analysis
The program file is saved in:
ros2_ws/src/example/example/body_control/include/body_and_rgb_control.py
Note
Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable!
Based on the game’s effectiveness, the procedural logic is delineated as follows:
Retrieve the captured image from the camera, analyze the key characteristics of the human body, initially detect and assess the “akimbo” posture using the designated function. Subsequently, determine if it’s the same individual in the picture based on clothing color. If confirmed, identify specific body movements (raising the left arm, right arm, left leg, and right leg) and instruct the car to execute corresponding actions. Otherwise, reanalyze the key feature points.
The program’s logic flowchart, derived from the program files, is illustrated below:
As shown in the above diagram, the program first uses the initialization function of the BodyControlNode class to set the default values for the relevant parameters. This primarily includes defining the drawing tool object, the limb detection object, and initializing the posture state and count parameters. After this setup, the program can logically process the incoming images. It begins by identifying and outputting key body feature points, calculating joint angles from the obtained (arm) parameters to mark the hands-on-hips posture. Then, it matches colors based on the recognized key feature points to determine if the person has been previously marked. Finally, the program controls the movement of the vehicle by recognizing demonstration actions (raising an arm, lifting a leg).
Function
Main:
422 423 424 425 | def main(): node = BodyControlNode('body_control') rclpy.spin(node) node.destroy_node() |
Used to start the RGB body sensation control node.
get_body_center:
49 50 51 52 | def get_body_center(h, w, landmarks): landmarks = np.array([(lm.x * w, lm.y * h) for lm in landmarks]) center = ((landmarks[LEFT_HIP] + landmarks[LEFT_SHOULDER] + landmarks[RIGHT_HIP] + landmarks[RIGHT_SHOULDER])/4).astype(int) return center.tolist() |
Used to obtain the currently recognized body contours.
get_joint_landmarks:
54 55 56 57 58 59 60 61 62 | def get_joint_landmarks(img, landmarks): """ Convert landmarks from medipipe's normalized output to pixel coordinates)(将landmarks从medipipe的归一化输出转为像素坐标 :param img: Picture corresponding to pixel coordinate(像素坐标对应的图片) :param landmarks: Normalized keypoint(归一化的关键点) """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) |
Used to convert the recognized information into pixel coordinates.
get_dif:
64 65 66 67 68 69 | def get_dif(list1, list2): if len(list1) != len(list2): return 255*3 else: d = np.absolute(np.array(list1) - np.array(list2)) return sum(d) |
Used to compare the color of clothes on the body contours.
joint_angle:
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | def joint_angle(landmarks): """ Calculate flex angle of each joint(计算各个关节弯曲角度) :param landmarks: Hand keypoints(手部关键点) :return: Joint angle(关节角度) """ angle_list = [] left_hand_angle1 = vector_2d_angle(landmarks[LEFT_SHOULDER] - landmarks[LEFT_ELBOW], landmarks[LEFT_WRIST] - landmarks[LEFT_ELBOW]) angle_list.append(int(left_hand_angle1)) left_hand_angle2 = vector_2d_angle(landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER], landmarks[LEFT_WRIST] - landmarks[LEFT_SHOULDER]) angle_list.append(int(left_hand_angle2)) right_hand_angle1 = vector_2d_angle(landmarks[RIGHT_SHOULDER] - landmarks[RIGHT_ELBOW], landmarks[RIGHT_WRIST] - landmarks[RIGHT_ELBOW]) angle_list.append(int(right_hand_angle1)) right_hand_angle2 = vector_2d_angle(landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER], landmarks[RIGHT_WRIST] - landmarks[RIGHT_SHOULDER]) angle_list.append(int(right_hand_angle2)) return angle_list |
This function is used to calculate the recognition angles of various joints between the body parts.
joint_distance:
92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 | def joint_distance(landmarks): distance_list = [] d1 = landmarks[LEFT_HIP] - landmarks[LEFT_SHOULDER] d2 = landmarks[LEFT_HIP] - landmarks[LEFT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_SHOULDER] d2 = landmarks[RIGHT_HIP] - landmarks[RIGHT_WRIST] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[LEFT_HIP] - landmarks[LEFT_ANKLE] d2 = landmarks[LEFT_ANKLE] - landmarks[LEFT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) d1 = landmarks[RIGHT_HIP] - landmarks[RIGHT_ANKLE] d2 = landmarks[RIGHT_ANKLE] - landmarks[RIGHT_KNEE] dis1 = d1[0]**2 + d1[1]**2 dis2 = d2[0]**2 + d2[1]**2 distance_list.append(round(dis1/dis2, 1)) return distance_list |
This function is used to calculate the distance between each joint point based on pixel coordinates.
Class
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | class BodyControlNode(Node): def __init__(self, name): rclpy.init() super().__init__(name) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.color_picker = ColorPicker(Point(), 2) signal.signal(signal.SIGINT, self.shutdown) self.fps = fps.FPS() # fps calculator(fps计算器) self.running = True self.current_color = None self.lock_color = None self.calibrating = False self.move_finish = True self.stop_flag = False self.count_akimbo = 0 self.count_no_akimbo = 0 |
This class is the body control node.
Init:
122 123 124 125 126 127 128 129 130 131 132 133 134 | def __init__(self, name): rclpy.init() super().__init__(name) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.color_picker = ColorPicker(Point(), 2) signal.signal(signal.SIGINT, self.shutdown) self.fps = fps.FPS() # fps calculator(fps计算器) |
Initialize the parameters required for body control, read the camera’s image callback node, initialize nodes such as servos, chassis, buzzers, motors, etc., and finally start the main function within the class.
get_node_state:
176 177 178 | def get_node_state(self, request, response): response.success = True return response |
Set the initialization state of the current node.
shutdown:
180 181 | def shutdown(self, signum, frame): self.running = False |
Program exit callback function used to terminate recognition.
image_callback:
183 184 185 186 187 188 189 190 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # Discard the oldest image if the queue is full(如果队列已满,丢弃最旧的图像) self.image_queue.get() # Put the image into the queue(将图像放入队列) self.image_queue.put(rgb_image) |
Image node callback function used to process images and enqueue them.
Move:
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | def move(self, *args): if args[0].angular.z == 1: set_servo_position(self.joints_pub, 0.1, ((9, 650), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.1 motor2 = MotorState() motor2.id = 4 motor2.rps = -1.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) time.sleep(11) set_servo_position(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) elif args[0].angular.z == -1: set_servo_position(self.joints_pub, 0.1, ((9, 350), )) time.sleep(0.2) motor1 = MotorState() motor1.id = 2 motor1.rps = 1.0 motor2 = MotorState() motor2.id = 4 motor2.rps = -0.1 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) time.sleep(12) set_servo_position(self.joints_pub, 0.1, ((9, 500), )) motor1 = MotorState() motor1.id = 2 motor1.rps = 0.0 motor2 = MotorState() motor2.id = 4 motor2.rps = 0.0 msg = MotorsState() msg.data = [motor1, motor2] self.motor_pub.publish(msg) else: self.mecanum_pub.publish(args[0]) time.sleep(args[1]) self.mecanum_pub.publish(Twist()) time.sleep(0.1) self.stop_flag =True self.move_finish = True |
Movement strategy function that moves the vehicle according to the recognized limb actions.
buzzer_warn:
247 248 249 250 251 252 253 | def buzzer_warn(self): msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) |
Buzzer control function used for buzzer alarms.
image_proc:
255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 | def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks is not None: twist = Twist() landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) # Hands-on-hips calibration (叉腰标定) angle_list = joint_angle(landmarks) #print(angle_list) if -150 < angle_list[0] < -90 and -30 < angle_list[1] < -10 and 90 < angle_list[2] < 150 and 10 < angle_list[3] < 30: self.count_akimbo += 1 # Hands-on-hips detection+1(叉腰检测+1) self.count_no_akimbo = 0 # Clear no hands-on-hips detection(没有叉腰检测归零) else: self.count_akimbo = 0 # Clear hands-on-hips detection(叉腰检测归零) |
Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information. Then, it performs color recognition based on the position of each limb’s different contours, finally determining and moving according to the recognized posture.
Main:
399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # Press q or esc to exit(按q或者esc退出) self.mecanum_pub.publish(Twist()) self.running = False rclpy.shutdown() |
The main function within the BodyControlNode class, used to input image information into the recognition function and display the returned image.
6.3.12 Pose Detection
Through the human pose estimation model in the MediaPipe machine learning framework, the human body posture is detected. When the robot detects a person falling, it will sound an alarm and sway from side to side.
Program Logic
First, import the human pose estimation model from MediaPipe and subscribe to topic messages to obtain real-time footage from the camera.
Then, process the image, such as flipping, to detect human body information in the image. Based on the lines connecting the key points of the human body, calculate the limb height to determine the body movement.
Finally, if “falling” is detected, the robot will sound an alarm and move forwards and backwards.
Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
(1) Start the robot, and enter the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Run the command to disable app auto-start app service.
sudo systemctl stop start_app_node.service
(4) Run the following command and hit Enter key to initiate the game:
ros2 launch example fall_down_detect.launch.py
(5) If you need to close this game, you need to press the “Esc” key in the image interface to exit the camera image interface.
(6) Then press “Ctrl+C” in the command line terminal interface. If closing fails, please try again.
Program Outcome
Once the game starts, ensure the human body remains as fully within the camera’s field of view as possible. Upon recognizing the human body, the key points will be highlighted in the returned image.
At this point, the individual can sit down briefly. Upon detecting the “falling” posture, the robot will continuously sound an alarm and make repeated forward and backward movements as a reminder.
Program Analysis
The program file is saved in:ros2_ws/src/example/example/body_control/include/fall_down_detect.py
Note
Prior to making any alterations to the program, ensure to create a backup of the original factory program. Modify it only after creating the backup. Directly editing the source code file is prohibited to prevent inadvertent parameter modifications that could render the robot dysfunctional and irreparable!
Based on the game’s effectiveness, the procedural logic is delineated as follows:
The car captures images via the camera, identifies the key feature points of the human body, and assesses whether the current posture indicates a “fall”. If a fall is detected, the car’s buzzer will emit a continuous “beep” sound while the car moves backward. Otherwise, the buzzer will only emit a single “beep” sound.
The program logic flow chart obtained from the program files is depicted in the figure below.
Function
Main:
205 206 207 208 | def main(): node = FallDownDetectNode('fall_down_detect') rclpy.spin(node) node.destroy_node() |
Used to start the body sensation control node.
get_joint_landmarks:
45 46 47 48 49 50 51 52 53 54 | def get_joint_landmarks(img, landmarks): """ Convert landmarks from medipipe's normalized output to pixel coordinates(将landmarks从medipipe的归一化输出转为像素坐标) :param img: Picture corresponding to pixel coordinate(像素坐标对应的图片) :param landmarks: Normalized keypoint(归一化的关键点) :return: """ h, w, _ = img.shape landmarks = [(lm.x * w, lm.y * h) for lm in landmarks] return np.array(landmarks) |
Used to convert the recognized information into pixel coordinates.
height_cal:
56 57 58 59 60 61 62 | def height_cal(landmarks): y = [] for i in landmarks: y.append(i[1]) height = sum(y)/len(y) return height |
Calculates the height of the limbs based on the recognized information.
Class
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 | class FallDownDetectNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.running = True self.fps = fps.FPS() # fps calculator(fps计算器) self.fall_down_count = [] self.move_finish = True self.stop_flag = False signal.signal(signal.SIGINT, self.shutdown) self.bridge = CvBridge() self.image_queue = queue.Queue(maxsize=2) |
This class is the fall detection node.
Init:
64 65 66 67 68 69 70 71 72 73 74 75 76 77 | class FallDownDetectNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.drawing = mp.solutions.drawing_utils self.body_detector = mp_pose.Pose( static_image_mode=False, min_tracking_confidence=0.7, min_detection_confidence=0.7) self.running = True self.fps = fps.FPS() # fps calculator(fps计算器) self.fall_down_count = [] |
Initialize the parameters required for body control, read the camera’s image callback node, initialize nodes such as chassis, buzzers, and others, and finally start the main function within the class.
get_node_state:
99 100 101 | def get_node_state(self, request, response): response.success = True return response |
Set the initialization state of the current node.
shutdown:
103 104 | def shutdown(self, signum, frame): self.running = False |
Program exit callback function used to terminate recognition.
image_callback:
106 107 108 109 110 111 112 113 | def image_callback(self, ros_image): cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # Discard the oldest image if the queue is full(如果队列已满,丢弃最旧的图像) self.image_queue.get() # Put the image into the queue(将图像放入队列) self.image_queue.put(rgb_image) |
Image node callback function used to process images and enqueue them.
Move:
115 116 117 118 119 120 121 122 123 124 125 126 127 | def move(self): for i in range(5): twist = Twist() twist.linear.x = 0.2 self.mecanum_pub.publish(twist) time.sleep(0.2) twist = Twist() twist.linear.x = -0.2 self.mecanum_pub.publish(twist) time.sleep(0.2) self.mecanum_pub.publish(Twist()) self.stop_flag =True self.move_finish = True |
Movement strategy function that moves the vehicle according to the recognized limb height.
buzzer_warn:
129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 | def buzzer_warn(self): if not self.stop_flag: while not self.stop_flag: msg = BuzzerState() msg.freq = 1000 msg.on_time = 0.1 msg.off_time = 0.1 msg.repeat = 1 self.buzzer_pub.publish(msg) time.sleep(0.2) else: msg = BuzzerState() msg.freq = 1900 msg.on_time = 0.2 msg.off_time = 0.01 msg.repeat = 1 self.buzzer_pub.publish(msg) |
Buzzer control function used for buzzer alarms.
image_proc:
147 148 149 150 151 152 153 154 155 156 157 158 159 | def image_proc(self, image): image_flip = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) results = self.body_detector.process(image) if results is not None and results.pose_landmarks: if self.move_finish: landmarks = get_joint_landmarks(image, results.pose_landmarks.landmark) h = height_cal(landmarks) if h > image.shape[:-2][0] - 120: self.fall_down_count.append(1) else: self.fall_down_count.append(0) if len(self.fall_down_count) == 3: count = sum(self.fall_down_count) |
Function for recognizing limbs, which invokes the model to draw key points of the human body based on the recognized information, and moves according to the recognized height.
Main:
181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 | def main(self): while self.running: try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.running: break else: continue try: result_image = self.image_proc(np.copy(image)) except BaseException as e: self.get_logger().info('\033[1;32m%s\033[0m' % e) result_image = cv2.flip(cv2.cvtColor(image, cv2.COLOR_RGB2BGR), 1) self.fps.update() result_image = self.fps.show_fps(result_image) cv2.imshow(self.name, result_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # Press q or esc to exit(按q或者esc退出) self.mecanum_pub.publish(Twist()) self.running = False |
The main function within the FallDownDetectNode class, used to input image information into the recognition function and display the returned image.
6.4 Autonomous Driving Debugging Lesson
Note
This document pertains to the autonomous driving capabilities of our company’s vehicle. Before engaging the autonomous driving feature, please verify that the hardware components (such as the depth camera and car chassis motors) installed in the vehicle meet the specified requirements and are operating correctly. Additionally, ensure that the battery is adequately charged.
6.4.1 Props Setup & Notices
Prior to commencing autonomous driving, please follow the steps below to set up the map.
Note
Tools required for autonomous driving are available for separate purchase. If you are interested, kindly reach out to us at support@hiwonder.com.
Setup Instructions
(1) Map Setup
To begin, ensure that the site is situated on a flat and spacious ground, and verify that there is ample lighting surrounding the area, as illustrated in the image below:
Place the robot at the start point as pictured.
As the robot advances, it will navigate along the yellow line bordering the map, and it can dynamically adjust its posture based on the lane lines in real-time.
(2) Road Signs Setup
In the game of autonomous driving, a total of 4 waypoints need to be placed: 2 for straight driving, 1 for right turn, and 1 for parking. Please refer to the following image for the specific placement positions:
Note
Ensure accurate placement!!
Go straight: instruct the robot car to go forward.
Turn right: Instruct the robot car to turn right.
Park: Command the robot car to park.
The purpose of the road signs is to guide the car during its journey. Once the car recognizes a road sign, it will perform the appropriate actions. Additionally, the car will slow down as it approaches a pedestrian crosswalk.
(3) Traffic Light Setup
In the game of the autonomous driving function, only one traffic light needs to be placed. The exact placement location is illustrated in the figure below:
Traffic light props are utilized to replicate real-world traffic lights on roads. When the car detects the traffic light, it will adhere to the “stop on red light, go on green light” rule.
Once all venue props are arranged, refer to the following picture for setting up the venue environment:
(4) Notice:
Note
Ensure that the site has adequate lighting, preferably natural ambient light. Avoid strong direct light sources and colored lights, as they may hinder overall recognition. Pay attention to this aspect when setting up the site.
While laying out and using the equipment, ensure proper care and protection of the props on the site. In case of incomplete maps or signs, or if the traffic light is damaged and cannot function properly, promptly contact our after-sales personnel to purchase replacements to prevent any disruptions to the recognition process.
6.4.2 Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
(1) Place the robot at the starting point of the map, and proceed clockwise while recognizing the road signs.
(2) Start the robot, and access the robot system desktop using NoMachine.
(3) Click-on
to open the command-line terminal.
(4) Execute the command and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(5) Run the command to initiate autonomous driving service.
roslaunch hiwonder_example self_driving.launch
The printed message below indicates a successful startup.
Program Outcome
(1) Lane Keeping
After initiating the game, the car will follow the line patrol and detect the yellow line along the roadside. Depending on whether the yellow line is straight or curved, the car will execute forward movement and turning actions to maintain its position within the lane.
(2) Traffic Light Recognition
When the car encounters a traffic light, it will halt if the light is red and proceed if the light is green. While crossing a pedestrian zebra crossing, the car will automatically decelerate upon detection and move forward cautiously.
(3) Turning and Parking
When the car detects a traffic sign ahead while moving forward, it will respond accordingly to the sign. If it indicates a right turn, the car will turn right and continue forward. If it indicates parking, the car will park sideways.
Once you have experienced this functionality, execute the command or restart the robot to activate the app service.
Run the command to restart the app service. When the buzzer emits a beeping sound, it indicates that the service has been successfully enabled.
sudo systemctl start start_app_node.service
6.4.3 Recognition & Debugging
When the car is driving on the road, it might encounter instances of poor recognition, leading to incorrect actions. In such cases, it’s essential to adjust the car’s parameters. For specific debugging methods, please consult the following content.
Adjust Camera Angle
When using the autonomous driving function, ensure to check the camera angle on the vehicle and confirm its correctness. Refer to the figure below for the correct position (note: the camera angle has been tested and adjusted for optimal performance. Incorrect camera angle may affect the effectiveness of driverless driving. If the camera angle is incorrect, it can be adjusted by correcting the deviation of the robot arm).
Color Threshold Adjustment
If the car deviates (veers off straight) during driving, you can correct it by adjusting the color threshold. The specific adjustment method is as follows:
(1) Execute the command and hit Enter.
sudo systemctl stop start_app_node.service
(2) Run the command and hit Enter to enable the camera service.
roslaunch hiwonder_peripherals depth_cam.launch
(3) Create a new terminal, and type the command to open the color adjustment tool.
python3 software/lab_tool/main.py
(4) Select ‘yellow’.
(5) Adjust the three LAB colors on the left side to enhance yellow identification, as illustrated in the figure below:
(6) Click-on ‘Save’ button.
Note
It is recommended not to adjust the color threshold for only one section of the road during the adjustment process. Place the car on various sections of the road it will traverse during operation. By adjusting the color thresholds in multiple locations, the results will be more effective.
Road Sign and Traffic Light Identification and Debugging
If the car fails to execute a corresponding action after recognizing a sign or a traffic light, the confidence level in the launch file can be adjusted accordingly. The specific method is as follows:
(1) Create a terminal, and type the command then hit Enter to navigate to the self_driving folder.
roscd hiwonder_example/scripts/self_driving/
(2) Execute the command and hit Enter key.
vim self_driving.launch
(3) Locate conf_thresh parameter, and modify the value behind it as pictured.
(4) Adjust the conf_thresh value by slightly lowering it within the range of 0.1 to 0.2, as depicted in the figure below. Set the conf_thresh value to 0.65 (modification method: after selecting the value with the left mouse button, press ESC, then press ‘i’ to enter text mode and modify it to the desired value. For more detailed editing methods, you can search the Internet using the keyword “vim editor to modify text”).
(5) After making the modification, press “ESC” in the interface. Then, hold down “Shift” + “:”, followed by typing “wq” and pressing Enter to exit the interface. Subsequently, restart the program according to the game instructions to observe the effect. If the desired effect is not achieved, you can adjust the value accordingly. If the result is still unsatisfactory, please refer to point “(6)” for further adjustments.
(6) When adjusting the conf_thresh value doesn’t yield satisfactory results, you may need to retrain the YOLOv5 model. For specific training methods, please refer to: 6.2 Machine Learning Application to train the model. Once training is completed, load the model path into the launch file, and then restart the game.
Program Outcome
During the car’s operation, besides controlling its trajectory by adjusting the color threshold (6.4.3), there are also parameters within the program to regulate the car’s movement. If the car continues to exhibit deflection and jitter even after adjusting the color threshold, please refer to the following instructions:
Locate and open the self_driving.py file, located at hiwonder_example\scripts\self_driving\self_driving.py.
Going straight parameter adjustment:
Locate the lane_x parameter in the document. You can search for it using the software you are using, as indicated by the red box in the figure below:
The lane_x parameter represents the central point of the lane line and is crucial for maintaining straight lane alignment. This parameter is preset by our company, and typically does not require adjustment. However, if you need to fine-tune the effect, you can adjust it within a range of ±3 values.
For adjusting the parameter related to right turns:
Search for the following code section in the file:
The lane_x parameter identifies the central point of the lane line. When this position exceeds 150, the car will execute a right turn. Users can control the turning position of the car by adjusting this value appropriately, within a range of ±20. This adjustment depends on the camera angle you’ve configured. By default, our company sets this parameter, and typically, there is no need to change it.
Note
Additional supplementary content can be found in the 6.5 Autonomous Driving section.
6.5 Autonomous Driving
6.5.1 Lane Keeping
This lesson focuses on controlling the car to move forward and maintain lane alignment through instructions.
Preparation
Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to 6.4 Autonomous Driving Debugging Lesson in the same directory as this section for guidance. (In this lesson, we are only experiencing the road driving-line patrol function, so there is no need to place props such as traffic lights and signboards.)
When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition.
It is essential to adjust the color threshold beforehand and set the color threshold of the yellow line to avoid misidentification during subsequent recognition. For specific color threshold adjustment, please refer to the 5. ROS+OpenCV Course for reference.
It is recommended to position the car in the middle of the road for easy identification!
Program Logic
Lane keeping involves three main steps: capturing real-time images, processing the images, and comparing the results.
First, we use the camera to capture real-time images of the scene.
Next, we process these images through color recognition, converting them to different color spaces, and applying techniques such as erosion, dilation, and binarization.
In the result comparison phase, we define a Region of Interest (ROI), outline the contours of the processed image, and perform calculations to compare the results.
Finally, based on these comparisons, the robot adjusts its direction to stay centered within the lane.
You can find the source code for this program at:/home/ubuntu/ros2_ws/src/example/example/self_driving/lane_detect.py
Operation Steps
Note
The input command should be case sensitive, and keywords can be complemented using Tab key.
Start the robot, and access the robot system desktop using NoMachine.
Enable the model:
(1) Click-on
to start the command-line terminal.
(2) Execute the command and hit Enter to disable the app auto-start service.
sudo systemctl stop start_app_node.service
(3) Run the following command and hit Enter key.
ros2 launch example self_driving.launch.py only_line_follow:=true
(4) To close the program, click on the terminal window where it’s running and press ‘Ctrl+C’.
Program Outcome
After starting the game, place the robot on the road, and it will automatically detect the yellow line at the edge of the road. The robot will then adjust its position based on the detection results.
Program Analysis
The source code of this program is saved in: ros2_ws/src/example/example/self_driving/lane_detect.py
Function:
image_callback:
173 174 175 176 177 178 179 180 | def image_callback(ros_image): cv_image = bridge.imgmsg_to_cv2(ros_image, "bgr8") bgr_image = np.array(cv_image, dtype=np.uint8) if image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) image_queue.get() # 将图像放入队列(put the image into the queue) image_queue.put(bgr_image) |
Image callback function is used to read the camera node.
Class:
LaneDetector:
19 20 21 22 23 24 25 26 27 28 | class LaneDetector(object): def __init__(self, color): # 车道线颜色(lane color) self.target_color = color # 车道线识别的区域(the recognized area of lane) if os.environ['DEPTH_CAMERA_TYPE'] == 'Dabai': self.rois = ((338, 360, 0, 320, 0.7), (292, 315, 0, 320, 0.2), (248, 270, 0, 320, 0.1)) else: self.rois = ((450, 480, 0, 320, 0.7), (390, 480, 0, 320, 0.2), (330, 480, 0, 320, 0.1)) self.weight_sum = 1.0 |
Init:
24 25 26 27 28 | if os.environ['DEPTH_CAMERA_TYPE'] == 'Dabai': self.rois = ((338, 360, 0, 320, 0.7), (292, 315, 0, 320, 0.2), (248, 270, 0, 320, 0.1)) else: self.rois = ((450, 480, 0, 320, 0.7), (390, 480, 0, 320, 0.2), (330, 480, 0, 320, 0.1)) self.weight_sum = 1.0 |
Initialize the required parameters and set the ROI to lock the recognition range.
set_roi:
30 31 | def set_roi(self, roi): self.rois = roi |
Used to set the Region of Interest (ROI) for recognition.
get_area_max_contour:
34 35 36 37 38 39 40 41 42 43 44 45 46 | def get_area_max_contour(contours, threshold=100): ''' (retrieve the contour corresponding to the largest area) :param contours: :param threshold: :return: ''' contour_area = zip(contours, tuple(map(lambda c: math.fabs(cv2.contourArea(c)), contours))) contour_area = tuple(filter(lambda c_a: c_a[1] > threshold, contour_area)) if len(contour_area) > 0: max_c_a = max(contour_area, key=lambda c_a: c_a[1]) return max_c_a return None |
Obtains the contour with the maximum area from the list of contours obtained through OpenCV.
add_horizontal_line:
48 49 50 51 52 53 54 55 56 57 58 59 | def add_horizontal_line(self, image): # |____ ---> |———— ---> —— h, w = image.shape[:2] roi_w_min = int(w/2) roi_w_max = w roi_h_min = 0 roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] # 截取右半边(capture the right half) flip_binary = cv2.flip(roi, 0) # 上下翻转(flip vertically) max_y = cv2.minMaxLoc(flip_binary)[-1][1] # 提取最上,最左数值为255的点坐标(extract coordinates of points with the highest and leftmost values of 255) return h - max_y |
Adds a horizontal recognition line based on the width and height of the frame and the ROI settings.
add_vertical_line_far:
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 | def add_vertical_line_far(self, image): h, w = image.shape[:2] roi_w_min = int(w/8) roi_w_max = int(w/2) roi_h_min = 0 roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] flip_binary = cv2.flip(roi, -1) # 图像左右上下翻转(flip the image horizontally and vertically) #cv2.imshow('1', flip_binary) # min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(ret) # minVal:最小值(the minimum value) # maxVal:最大值(the maximal value) # minLoc:最小值的位置(the position of minimum value) # maxLoc:最大值的位置(the position of the maximal value) # 遍历的顺序,先行再列,行从左到右,列从上到下(the traversal order is row-major, iterating through rows from left to right and columns from top to bottom) (x_0, y_0) = cv2.minMaxLoc(flip_binary)[-1] # 提取最上,最左数值为255的点坐标(extract coordinates of points with the highest and leftmost values of 255) y_center = y_0 + 55 roi = flip_binary[y_center:, :] (x_1, y_1) = cv2.minMaxLoc(roi)[-1] down_p = (roi_w_max - x_1, roi_h_max - (y_1 + y_center)) y_center = y_0 + 65 roi = flip_binary[y_center:, :] (x_2, y_2) = cv2.minMaxLoc(roi)[-1] up_p = (roi_w_max - x_2, roi_h_max - (y_2 + y_center)) up_point = (0, 0) down_point = (0, 0) if up_p[1] - down_p[1] != 0 and up_p[0] - down_p[0] != 0: up_point = (int(-down_p[1]/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), 0) down_point = (int((h - down_p[1])/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), h) return up_point, down_point |
Adds a recognition vertical line for the part of the frame farther from the robot based on the ROI settings.
get_binary:
123 124 125 126 127 128 129 130 131 | def get_binary(self, image): # 通过lab空间识别颜色(recognize color through lab space) img_lab = cv2.cvtColor(image, cv2.COLOR_RGB2LAB) # rgb转lab(convert rgb to lab) img_blur = cv2.GaussianBlur(img_lab, (3, 3), 3) # 高斯模糊去噪(Gaussian blur for noise reduction) mask = cv2.inRange(img_blur, tuple(lab_data['lab']['Stereo'][self.target_color]['min']), tuple(lab_data['lab']['Stereo'][self.target_color]['max'])) # 二值化(binarization) eroded = cv2.erode(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 腐蚀(corrosion) dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) # 膨胀(dilation) return dilated |
Performs color recognition based on the color space and processes the binarized image.
add_vertical_line_near:
95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | def add_vertical_line_near(self, image): # ——| |—— | # | ---> | ---> h, w = image.shape[:2] roi_w_min = 0 roi_w_max = int(w/2) roi_h_min = int(h/2) roi_h_max = h roi = image[roi_h_min:roi_h_max, roi_w_min:roi_w_max] flip_binary = cv2.flip(roi, -1) # 图像左右上下翻转(flip the image horizontally and vertically) #cv2.imshow('1', flip_binary) (x_0, y_0) = cv2.minMaxLoc(flip_binary)[-1] # 提取最上,最左数值为255的点坐标(extract coordinates of points with the highest and leftmost values of 255) down_p = (roi_w_max - x_0, roi_h_max - y_0) (x_1, y_1) = cv2.minMaxLoc(roi)[-1] y_center = int((roi_h_max - roi_h_min - y_1 + y_0)/2) roi = flip_binary[y_center:, :] (x, y) = cv2.minMaxLoc(roi)[-1] up_p = (roi_w_max - x, roi_h_max - (y + y_center)) up_point = (0, 0) down_point = (0, 0) if up_p[1] - down_p[1] != 0 and up_p[0] - down_p[0] != 0: up_point = (int(-down_p[1]/((up_p[1] - down_p[1])/(up_p[0] - down_p[0])) + down_p[0]), 0) down_point = down_p return up_point, down_point, y_center |
Adds a recognition vertical line for the part of the frame closer to the robot based on the ROI and frame width and height settings.
__call__:
133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 | def __call__(self, image, result_image): # 按比重提取线中心(extracting line centers based on weight) centroid_sum = 0 h, w = image.shape[:2] max_center_x = -1 center_x = [] for roi in self.rois: blob = image[roi[0]:roi[1], roi[2]:roi[3]] # 截取roi(crop roi) contours = cv2.findContours(blob, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)[-2] # 找轮廓(find contour) max_contour_area = self.get_area_max_contour(contours, 30) # 获取最大面积对应轮廓(retrieve the contour with the largest area) if max_contour_area is not None: rect = cv2.minAreaRect(max_contour_area[0]) # 最小外接矩形(the minimum bounding rectangle) box = np.intp(cv2.boxPoints(rect)) # 四个角(four corners) for j in range(4): box[j, 1] = box[j, 1] + roi[0] cv2.drawContours(result_image, [box], -1, (255, 255, 0), 2) # 画出四个点组成的矩形(draw a rectangle formed by four points) # 获取矩形对角点(retrieve the diagonal points of the rectangle) pt1_x, pt1_y = box[0, 0], box[0, 1] pt3_x, pt3_y = box[2, 0], box[2, 1] # 线的中心点(the center point of line) line_center_x, line_center_y = (pt1_x + pt3_x) / 2, (pt1_y + pt3_y) / 2 cv2.circle(result_image, (int(line_center_x), int(line_center_y)), 5, (0, 0, 255), -1) # 画出中心点(draw center point) center_x.append(line_center_x) else: center_x.append(-1) for i in range(len(center_x)): if center_x[i] != -1: if center_x[i] > max_center_x: max_center_x = center_x[i] centroid_sum += center_x[i] * self.rois[i][-1] if centroid_sum == 0: return result_image, None, max_center_x center_pos = centroid_sum / self.weight_sum # 按比重计算中心点(calculate the centroid based on weight) angle = math.degrees(-math.atan((center_pos - (w / 2.0)) / (h / 2.0))) return result_image, angle, max_center_x |
Callback function for the entire class. Performs color recognition here, draws the recognized yellow lines using OpenCV, and then outputs the image, angle, and pixel coordinates X of each ROI recognition contour.
6.5.2 Road Sign Detection
Preparation
Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to 6.4 Autonomous Driving Debugging in the same directory as this section for guidance.
The road sign model used in this section is trained with YOLOv5. For more details on YOLOv5, please refer to 6.2 Machine Learning Application.
When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition.
Program Logic
Firstly, acquire the real-time image from the camera and apply operations such as erosion and dilation.
Next, invoke the YOLOv5 model and compare it with the target screen image.
Finally, execute the appropriate landmark action based on the comparison results.
You can find the source code for this program at:
/home/ubuntu/ros2_ws/src/example/example/yolov5_detect/yolov5_trt.py
Operation Steps
Note
The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to “Integrated Application” within the same file.
Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the “Tab” key to autocomplete keywords.
(1) Start the robot, and access the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Execute the command, and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(4) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/yolov5_detect
(5) Type the command to open the program source code.
vim yolov5_trt.py
(6) Press the ‘i’ key to enter edit mode. Find the code highlighted in the red box, uncomment line 403, and comment out line 404. Then, go to line 415 and replace ‘./yolov5s.engine’ with ‘traffic_signs_640s_7_0.engine’. After making these changes, press ‘ESC’, type ‘:wq’, and press Enter to save and exit.
(7) Run the command to initiate game program.
python3 yolov5_trt.py
(8) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press ‘Ctrl+C’.
Note
In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable.
(1) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/self_driving
(2) Enter the command to access the game program.
vim self_driving.launch.py
The red box represents the confidence value, which can be adjusted to modify the effectiveness of target detection.
Program Outcome
After initiating the game, place the robot on the road within the map. Once the robot identifies landmarks, it will highlight the detected landmarks and annotate them based on the highest confidence level learned from the model.
6.5.3 Traffic Light Recognition
Preparation
(1) Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to 6.4 Autonomous Driving Debugging Lesson in the same directory as this section for guidance.
(2) The road sign model used in this section is trained with YOLOv5. For more details on YOLOv5, please refer to 6.2 Machine Learning Application.
(3) When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition.
Program Logic
Firstly, capture a real-time image from the camera and apply operations such as erosion and dilation.
Next, invoke the YOLOv5 model to compare it with the target screen image.
Finally, execute corresponding landmark actions based on the comparison results.
The source code for this program can be found at: /home/ubuntu/ros2_ws/src/example/example/yolov5_detect/yolov5_trt.py
Operation Steps
Note
The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to “Integrated Application” within the same file.
Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the “Tab” key to autocomplete keywords.
(1) Start the robot, and access the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Execute the command, and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(4) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/yolov5_detect
(5) Type the command to open the program source code.
vim yolov5_trt.py
(6) Press ‘i’ to enter edit mode. Locate the code highlighted in the red box, uncomment line 403, and comment out line 404. Next, go to line 415 and replace ‘./yolov5s.engine’ with ‘traffic_signs_640s_7_0.engine’. After making these changes, press ‘ESC’, type ‘:wq’, and press Enter to save and exit.
(7) Run the command to initiate game program.
python3 yolov5_trt.py
(8) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press ‘Ctrl+C’.
Note
In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable.
(1) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/self_driving
(2) Enter the command to access the game program.
vim self_driving.launch.py
The red box represents the confidence value, which can be adjusted to modify the effectiveness of target detection.
Program Outcome
After initiating the game, position the robot on the road depicted on the map. Upon recognizing the traffic signal, the robot will assess the color of the signal light and identify frames corresponding to red and green signal lights accordingly.
6.5.4 Turning Decision Making
Preparation
(1) Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to 6.4 Autonomous Driving Debugging Lesson in the same directory as this section for guidance.
(2) The roadmap model discussed in this section is trained using YOLOv5. For further information on YOLOv5 and related content, please refer to 6.2 Machine Learning Application.
(3) When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition.
Program Logic
Firstly, capture the real-time image from the camera and apply operations such as erosion and dilation.
Next, invoke the YOLOv5 model to compare the obtained image with the target screen image.
Finally, based on the comparison outcomes, recognize the steering sign and direct the car accordingly.
The source code of this program is saved in: /home/ubuntu/ros2_ws/src/example/example/yolov5_detect/yolov5_trt.py
Operation Steps
Note
The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to “Integrated Application” within the same file.
Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the “Tab” key to autocomplete keywords.
(1) Start the robot, and access the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Execute the command, and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(4) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/yolov5_detect
(5) Type the command to open the program source code.
vim yolov5_trt.py
(6) Press ‘i’ to enter edit mode. Find the code highlighted in the red box, uncomment the list on line 403, and comment out the list on line 404. Next, go to line 415 and replace ‘./yolov5s.engine’ with ‘traffic_signs_640s_7_0.engine’. When finished, press ‘ESC’, type ‘:wq’, and press Enter to save and exit.
(7) Run the command to initiate game program.
python3 yolov5_trt.py
(8) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press ‘Ctrl+C’.
Note
In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable.
(1) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/self_driving
(2) Enter the command to access the game program.
vim self_driving.launch.py
The red box represents the confidence value, which can be adjusted to modify the effectiveness of target detection.
Program Outcome
Once the game begins, position the robot onto the road within the map. As the robot approaches a turning road sign, it will adjust its direction in accordance with the instructions provided by the sign.
6.5.5 Autonomous Parking
Preparation
(1) Before starting, ensure the map is laid out flat and free of wrinkles, with smooth roads and no obstacles. For specific map laying instructions, please refer to 6.4 Autonomous Driving Debugging Lesson in the same directory as this section for guidance.
(2) The roadmap model discussed in this section is trained using YOLOv5. For further information on YOLOv5 and related content, please refer to 6.2 Machine Learning Application.
(3) When experiencing this game, ensure it is conducted in a well-lit environment, but avoid direct light shining on the camera to prevent misrecognition.
Program Logic
Begin by capturing the real-time image from the camera and applying operations such as erosion and dilation.
Next, invoke the YOLOv5 model to compare the obtained image with the target screen image.
Finally, based on the comparison results, identify the parking road sign and autonomously guide the car to park in the designated parking space.
The source code of this program is saved in: /home/ubuntu/ros2_ws/src/example/example/yolov5_detect/yolov5_trt.py
Operation Steps
Note
The following steps exclusively activate road sign detection in the return screen, without executing associated actions. Users seeking direct experience with autonomous driving may bypass this lesson and proceed to “Integrated Application” within the same file.
Please make sure to enter commands with strict attention to capitalization, spacing, and you can use the “Tab” key to autocomplete keywords.
(1) Start the robot, and access the robot system desktop using NoMachine.
(2) Click-on
to open the command-line terminal.
(3) Execute the command, and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(4) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/yolov5_detect
(5) Type the command to open the program source code.
vim yolov5_trt.py
(6) Press ‘i’ to enter edit mode. Find the code highlighted in the red box, uncomment the list on line 403, and comment out the list on line 404. Then, go to line 415 and replace ‘./yolov5s.engine’ with ‘traffic_signs_640s_7_0.engine’. After making these changes, press ‘ESC’, type ‘:wq’, and press Enter to save and exit.
(7) Run the command to initiate game program.
python3 yolov5_trt.py
(8) Position the road sign in front of the camera, and the robot will recognize the road sign automatically. If you need to terminate this game, press ‘Ctrl+C’.
Note
In the event that the model struggles to recognize traffic-related signs, it may be necessary to lower the confidence level. Conversely, if the model consistently misidentifies traffic-related signs, raising the confidence level might be advisable.
(1) Run the command to navigate to the directory containing programs.
cd /home/ubuntu/ros2_ws/src/example/example/self_driving
(2) Enter the command to access the game program.
vim self_driving.launch.py
The red box represents the confidence value, which can be adjusted to modify the effectiveness of target detection.
Program Outcome
After initiating the game, position the robot on the road within the map. As the robot progresses towards the parking sign, it will automatically park in the designated parking space based on the instructions provided by the road sign.
Parameter Adjustment
If the robot stops upon recognizing the parking sign and the parking position is not optimal, adjustments to the parameters in the program source code can be made.
(1) Click-on
to start the command-line terminal.
(2) Execute the command to navigate to the directory containing game programs.
cd ros2_ws/src/example/example/self_driving/
(3) Run the command to access the source code.
vim self_driving.py
(4) Press the “i” key to enter insert mode and locate the code within the red box. Adjusting the parameters within the red box allows you to control the starting position for the robot to initiate the parking operation. Decreasing the parameters will result in the robot stopping closer to the zebra crossing, while increasing them will cause the robot to stop further away. Once adjustments are made, press the “ESC” key, type “:wq”, and press Enter to save and exit.
You can adjust the parking processing function to alter the parking position of the robot. Initially, the parking action sets the linear speed in the negative direction of the Y-axis (right of the robot) to 0.2 meters per second, with a forward movement time of (0.38/2) seconds. To position the robot in the ideal location on the left side of the parking space, modify the speed and time accordingly.
6.5.6 Integrated Application
This lesson provides instructions for implementing comprehensive driverless game on the robot, covering lane keeping, road sign detection, traffic light recognition, steering decision-making, and self-parking.
Preparation
(1) Map Setup
To ensure accurate navigation, place the map on a flat, smooth surface, free of wrinkles and obstacles. Position all road signs and traffic lights at designated locations on the map, facing clockwise. The starting point and locations of road signs are indicated below:
Note
Tools required for autonomous driving are available for separate purchase. If you are interested, kindly reach out to us at support@hiwonder.com.
(2) Color Threshold Adjustment
Due to variations in light sources, it’s essential to adjust the color thresholds for ‘black, white, red, green, blue, and yellow’ based on the guidelines provided in the 5.ROS+OpenCV Course prior to starting. If the robot encounter inaccurate recognition while moving forward, readjust the color threshold specifically in the map area where recognition fails.
Program Logic
Actions implemented so far include:
(1) Following the yellow line in the outermost circle of the patrol map.
(2) Slowing down and passing if a zebra crossing is detected.
(3) Making a turn upon detection of a turn sign.
(4) Parking the vehicle and entering the parking lot upon detection of a stop sign.
(5) Halting when a red light is detected.
(6) Slowing down when passing a detected street light.
First, load the model file trained by YOLOv5 and the required library files, obtain real-time camera images, and perform operations such as erosion and dilation on the images. Next, identify the target color line segment in the image and gather information such as size and center point of the target image. Then, invoke the model through YOLOv5 and compare it with the target screen image. Finally, adjust the forward direction based on the offset comparison of the target image’s center point to keep the robot in the middle of the road. Additionally, perform corresponding actions based on different recognized landmark information during map traversal.
The source code for this program can be found at: /home/ubuntu/ros2_ws/src/example/example/self_driving/self_driving.py
Operation Steps
Note
The input command should be case sensitive, and keywords can be implemented using Tab key.
(1) Start the robot, and access the robot system desktop using NoMachine.
(2) Click-on
to start the command-line terminal.
(3) Execute the command, and hit Enter to disable the app service.
sudo systemctl stop start_app_node.service
(4) Run the following command and hit Enter key.
ros2 launch example self_driving.launch.py
(5) To close the program, click on the terminal window where it’s running and press ‘Ctrl+C’.
Program Outcome
(1) Lane Keeping
Upon initiating the game, the car will track the line and identify the yellow line at the road’s edge. It will execute forward and turning actions based on the straightness or curvature of the yellow line to maintain lane position.
(2) Traffic Light Recognition
When the car encounters a traffic light, it will halt if the light is red and proceed if it’s green. Upon approaching a zebra crossing, the car will automatically decelerate and proceed cautiously.
(3) Turn and Parking Signs
Upon detecting traffic signs while moving forward, the car will respond accordingly. If it encounters a right turn sign, it will execute a right turn and continue forward. In the case of a parking sign, it will execute a parking maneuver.
Following these rules, the robot will continuously progress forward within the map.
Program Analysis
The source code of this program is saved in: ros2_ws/src/example/example/self_driving/self_driving.py
Function:
411 412 413 414 415 416 | def main(): node = SelfDrivingNode('self_driving') executor = MultiThreadedExecutor() executor.add_node(node) executor.spin() node.destroy_node() |
Initiate autonomous driving class.
Class:
30 31 32 33 34 35 36 37 | class SelfDrivingNode(Node): def __init__(self, name): rclpy.init() super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.is_running = True self.pid = pid.PID(0.01, 0.0, 0.0) self.param_init() |
init:
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True) self.name = name self.is_running = True self.pid = pid.PID(0.01, 0.0, 0.0) self.param_init() self.image_queue = queue.Queue(maxsize=2) self.classes = ['go', 'right', 'park', 'red', 'green', 'crosswalk'] self.display = True self.bridge = CvBridge() self.lock = threading.RLock() self.colors = common.Colors() # signal.signal(signal.SIGINT, self.shutdown) self.machine_type = os.environ.get('MACHINE_TYPE') self.lane_detect = lane_detect.LaneDetector("yellow") self.mecanum_pub = self.create_publisher(Twist, '/controller/cmd_vel', 1) self.joints_pub = self.create_publisher(ServosPosition, '/servo_controller', 1) # 舵机控制(servo control) self.result_publisher = self.create_publisher(Image, '~/image_result', 1) self.create_service(Trigger, '~/enter', self.enter_srv_callback) # 进入玩法(enter game) self.create_service(Trigger, '~/exit', self.exit_srv_callback) # 退出玩法(exit game) self.create_service(SetBool, '~/set_running', self.set_running_srv_callback) # self.heart = Heart(self.name + '/heartbeat', 5, lambda _: self.exit_srv_callback(None)) timer_cb_group = ReentrantCallbackGroup() self.client = self.create_client(Trigger, '/controller_manager/init_finish') self.client.wait_for_service() self.client = self.create_client(Trigger, '/yolov5/init_finish') self.client.wait_for_service() self.start_yolov5_client = self.create_client(Trigger, '/yolov5/start', callback_group=timer_cb_group) self.start_yolov5_client.wait_for_service() self.stop_yolov5_client = self.create_client(Trigger, '/yolov5/stop', callback_group=timer_cb_group) self.stop_yolov5_client.wait_for_service() self.timer = self.create_timer(0.0, self.init_process, callback_group=timer_cb_group) |
Initialize the required parameters, obtain the current robot category, set the line-following color to yellow, start the chassis control, servo control, and image reading. Set up three types of services: enter, exit, and start, and read the YOLOv5 node.
init_process:
69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 | def init_process(self): self.timer.cancel() self.mecanum_pub.publish(Twist()) if not self.get_parameter('only_line_follow').value: self.send_request(self.start_yolov5_client, Trigger.Request()) if self.machine_type != 'JetRover_Tank': set_servo_position(self.joints_pub, 1, ((10, 500), (5, 500), (4, 250), (3, 0), (2, 750), (1, 500))) # 初始姿态 else: set_servo_position(self.joints_pub, 1, ((10, 500), (5, 500), (4, 230), (3, 0), (2, 750), (1, 500))) # 初始姿态 time.sleep(1) if 1:#self.get_parameter('start').value: self.display = True self.enter_srv_callback(Trigger.Request(), Trigger.Response()) request = SetBool.Request() request.data = True self.set_running_srv_callback(request, SetBool.Response()) #self.park_action() threading.Thread(target=self.main, daemon=True).start() self.create_service(Trigger, '~/init_finish', self.get_node_state) self.get_logger().info('\033[1;32m%s\033[0m' % 'start') |
Initialize the current robotic arm and start the main function.
param_init:
93 94 95 96 97 98 99 100 101 102 103 104 | def param_init(self): self.start = False self.enter = False self.have_turn_right = False self.detect_turn_right = False self.detect_far_lane = False self.park_x = -1 # 停车标识的x像素坐标(the x-coordinate of the parking sign pixel) self.start_turn_time_stamp = 0 self.count_turn = 0 self.start_turn = False # 开始转弯(begin turning) |
Initialize parameters required for position recognition or usage.
get_node_state:
130 131 132 | def get_node_state(self, request, response): response.success = True return response |
Obtain the current state of the node.
send_request:
134 135 136 137 138 | def send_request(self, client, msg): future = client.call_async(msg) while rclpy.ok(): if future.done() and future.result(): return future.result() |
Used to publish service requests.
enter_srv_callback:
140 141 142 143 144 145 146 147 148 149 150 151 | def enter_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "self driving enter") with self.lock: self.start = False camera = 'depth_cam'#self.get_parameter('depth_camera_name').value self.create_subscription(Image, '/%s/rgb/image_raw' % camera, self.image_callback, 1) self.create_subscription(ObjectsInfo, '/yolov5/object_detect', self.get_object_callback, 1) self.mecanum_pub.publish(Twist()) self.enter = True response.success = True response.message = "enter" return response |
Service for entering autonomous driving gameplay, start reading images and YOLOv5 recognition content, initialize speed.
exit_srv_callback:
153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 | def exit_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "self driving exit") with self.lock: try: if self.image_sub is not None: self.image_sub.unregister() if self.object_sub is not None: self.object_sub.unregister() except Exception as e: self.get_logger().info('\033[1;32m%s\033[0m' % str(e)) self.mecanum_pub.publish(Twist()) self.param_init() response.success = True response.message = "exit" return response |
Service for exiting autonomous driving gameplay, stop reading images and YOLOv5 recognition content, initialize speed, reset parameters.
set_running_srv_callback:
169 170 171 172 173 174 175 176 177 | def set_running_srv_callback(self, request, response): self.get_logger().info('\033[1;32m%s\033[0m' % "set_running") with self.lock: self.start = request.data if not self.start: self.mecanum_pub.publish(Twist()) response.success = True response.message = "set_running" return response |
Start autonomous driving game, set the start parameter to True.
Shutdown:
179 180 | def shutdown(self, signum, frame): # ctrl+c关闭处理(press Ctrl+C to terminate the process) self.is_running = False |
Callback function after closing the program, used to stop the currently running program.
image_callback:
182 183 184 185 186 187 188 189 | def image_callback(self, ros_image): # 目标检查回调(target inspection callback) cv_image = self.bridge.imgmsg_to_cv2(ros_image, "rgb8") rgb_image = np.array(cv_image, dtype=np.uint8) if self.image_queue.full(): # 如果队列已满,丢弃最旧的图像(if the queue is full, discard the oldest image) self.image_queue.get() # 将图像放入队列(put the image to the queue) self.image_queue.put(rgb_image) |
Image callback function, enqueues images and discards expired ones.
park_action:
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 | def park_action(self): if self.machine_type == 'JetRover_Mecanum': twist = Twist() twist.linear.y = -0.2 self.mecanum_pub.publish(twist) time.sleep(0.38/0.2) elif self.machine_type == 'JetRover_Acker': twist = Twist() twist.linear.x = 0.15 twist.angular.z = twist.linear.x*math.tan(-0.6)/0.213 self.mecanum_pub.publish(twist) time.sleep(3) twist = Twist() twist.linear.x = 0.15 twist.angular.z = -twist.linear.x*math.tan(-0.6)/0.213 self.mecanum_pub.publish(twist) time.sleep(2) twist = Twist() twist.linear.x = -0.15 twist.angular.z = twist.linear.x*math.tan(-0.6)/0.213 self.mecanum_pub.publish(twist) time.sleep(1.5) set_servo_position(self.joints_pub, 0.1, ((9, 500), )) else: twist = Twist() twist.angular.z = -1 self.mecanum_pub.publish(twist) time.sleep(1.5) self.mecanum_pub.publish(Twist()) twist = Twist() twist.linear.x = 0.2 self.mecanum_pub.publish(twist) time.sleep(0.65/0.2) self.mecanum_pub.publish(Twist()) twist = Twist() twist.angular.z = 1 self.mecanum_pub.publish(twist) time.sleep(1.5) self.mecanum_pub.publish(Twist()) |
Parking logic, runs three different parking strategies according to three different chassis types.
get_object_callback:
383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 | # 获取目标检测结果(get target detected result) def get_object_callback(self, msg): self.objects_info = msg.objects if self.objects_info == []: # 没有识别到时重置变量(reset variables when no detection is made) self.traffic_signs_status = None self.crosswalk_distance = 0 else: min_distance = 0 for i in self.objects_info: class_name = i.class_name center = (int((i.box[0] + i.box[2])/2), int((i.box[1] + i.box[3])/2)) if class_name == 'crosswalk': if center[1] > min_distance: # 获取最近的人行道y轴像素坐标(retrieve the closest pedestrian crossing y-axis pixel coordinate) min_distance = center[1] elif class_name == 'right': # 获取右转标识(retrieve the right turn sign) self.count_right += 1 self.count_right_miss = 0 if self.count_right >= 10: # 检测到多次就将右转标志至真(if detected multiple times, set the right turn sign to true) self.turn_right = True self.count_right = 0 elif class_name == 'park': # 获取停车标识中心坐标(retrieve the center coordinates of the stop sign) self.park_x = center[0] elif class_name == 'red' or class_name == 'green': # 获取红绿灯状态(get the state of traffic light) self.traffic_signs_status = i self.crosswalk_distance = min_distance |
Callback function for reading YOLOv5, obtains the categories currently recognized by YOLOv5.
Main:
235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 | def main(self): while self.is_running: time_start = time.time() try: image = self.image_queue.get(block=True, timeout=1) except queue.Empty: if not self.is_running: break else: continue result_image = image.copy() if self.start: h, w = image.shape[:2] # 获取车道线的二值化图(obtain the binary image of the lane lines) binary_image = self.lane_detect.get_binary(image) # 检测到斑马线,开启减速标志(zebra crossing detected, activate deceleration sign) if 450 < self.crosswalk_distance and not self.start_slow_down: # 只有足够近时才开始减速(only start decelerating when close enough) self.count_crosswalk += 1 if self.count_crosswalk == 3: # 多次判断,防止误检测(multiple checks to prevent false detection) self.count_crosswalk = 0 self.start_slow_down = True # 减速标识(deceleration sign) self.count_slow_down = time.time() # 减速固定时间(decelerate for a fixed duration) else: # 需要连续检测,否则重置(continuous detection required, otherwise reset) self.count_crosswalk = 0 twist = Twist() # 减速行驶处理(deceleration driving processing) if self.start_slow_down: if self.traffic_signs_status is not None: # 通过面积判断离灯灯远近,如果太近那么即使是红灯也不会停(use area to determine the distance from the traffic light. If too close, even if it's a red light, the vehicle won't stop) area = abs(self.traffic_signs_status.box[0] - self.traffic_signs_status.box[2])*abs(self.traffic_signs_status.box[1] - self.traffic_signs_status.box[3]) if self.traffic_signs_status.class_name == 'red' and area < 1000: # 如果遇到红灯就停车(if encountering a red light, stop the vehicle) self.mecanum_pub.publish(Twist()) self.stop = True elif self.traffic_signs_status.class_name == 'green': # 遇到绿灯,速度放缓(if encountering a green light, reduce speed) twist.linear.x = self.slow_down_speed self.stop = False if not self.stop: # 其他非停止的情况速度放缓, 同时计时,时间=斑马线的长度/行驶速度(in other non-stop situations, slow down speed, and at the same time, measure time, where time equals the length of the zebra crossing divided by the driving speed) twist.linear.x = self.slow_down_speed if time.time() - self.count_slow_down > self.crosswalk_length/twist.linear.x: self.start_slow_down = False else: twist.linear.x = self.normal_speed # 直走正常速度(straight ahead at normal speed) # 检测到 停车标识+斑马线 就减速, 让识别稳定(decelerate upon detecting a stop sign + zebra crossing to stabilize recognition) if 0 < self.park_x and 135 < self.crosswalk_distance: twist.linear.x = self.slow_down_speed if self.machine_type != 'JetRover_Acker': if not self.start_park and 255 < self.crosswalk_distance: # 离斑马线足够近时就开启停车(activate stop when close enough to the zebra crossing) self.mecanum_pub.publish(Twist()) self.start_park = True self.stop = True threading.Thread(target=self.park_action).start() elif self.machine_type == 'JetRover_Acker': if not self.start_park and 176 < self.crosswalk_distance: # 离斑马线足够近时就开启停车(when close enough to the zebra crossing, initiate stopping) self.mecanum_pub.publish(Twist()) self.start_park = True self.stop = True threading.Thread(target=self.park_action).start() # 右转及停车补线策略(right turn and stop line filling strategy) if self.turn_right: y = self.lane_detect.add_horizontal_line(binary_image) if 0 < y < 300: roi = [(0, y), (w, y), (w, 0), (0, 0)] cv2.fillPoly(binary_image, [np.array(roi)], [0, 0, 0]) # 将上面填充为黑色,防干扰(fill the above area with black to prevent interference) min_x = cv2.minMaxLoc(binary_image)[-1][0] cv2.line(binary_image, (min_x, y), (w, y), (255, 255, 255), 40) # 画虚拟线来驱使转弯(draw virtual lines to guide the turn) elif 0 < self.park_x and not self.start_turn: # 检测到停车标识需要填补线,使其保持直走(if a stop sign is detected, fill in the lines to keep the vehicle going straight) if not self.detect_far_lane: up, down, center = self.lane_detect.add_vertical_line_near(binary_image) binary_image[:, :] = 0 # 全置黑,防止干扰(set all to black to prevent interference) if 50 < center < 80: # 当将要看不到车道线时切换到识别较远车道线(switch to recognizing farther lane lines when lane lines are about to become invisible) self.detect_far_lane = True else: up, down = self.lane_detect.add_vertical_line_far(binary_image) binary_image[:, :] = 0 if up != down: cv2.line(binary_image, up, down, (255, 255, 255), 20) # 手动画车道线(manually draw lane lines) result_image, lane_angle, lane_x = self.lane_detect(binary_image, image.copy()) # 在处理后的图上提取车道线中心(extract lane center from processed image with lane lines) # cv2.imshow('image', image) # 巡线处理(line following processing) if lane_x >= 0 and not self.stop: if lane_x > 150: # 转弯(turning) if self.turn_right: # 如果是检测到右转标识的转弯(if it's a right turn sign detected, initiate the turn) self.count_right_miss += 1 if self.count_right_miss >= 50: self.count_right_miss = 0 self.turn_right = False self.count_turn += 1 if self.count_turn > 5 and not self.start_turn: # 稳定转弯(steady turn) self.start_turn = True self.count_turn = 0 self.start_turn_time_stamp = time.time() if self.machine_type != 'JetRover_Acker': twist.angular.z = -0.45 # 转弯速度(turning speed) else: twist.angular.z = twist.linear.x*math.tan(-0.6)/0.213 # 转弯速度(turning speed) else: # 直道由pid计算转弯修正(PID calculates turn correction for straight road) self.count_turn = 0 if time.time() - self.start_turn_time_stamp > 3 and self.start_turn: self.start_turn = False if not self.start_turn: self.pid.SetPoint = 100 # 在车道中间时线的坐标(the coordinates of the line when in the middle of the lane) self.pid.update(lane_x) if self.machine_type != 'JetRover_Acker': twist.angular.z = common.set_range(self.pid.output, -0.8, 0.8) else: twist.angular.z = twist.linear.x*math.tan(common.set_range(self.pid.output, -0.1, 0.1))/0.213 else: if self.machine_type == 'JetRover_Acker': twist.angular.z = 0.15*math.tan(-0.6)/0.213 # 转弯速度(turning speed) self.mecanum_pub.publish(twist) else: self.pid.clear() # 绘制识别的物体,由于物体检测的速度小于线检测的速度,所以绘制的框会有所偏离(draw recognized objects. Due to the slower speed of object detection compared to line detection, the drawn boxes may deviate slightly) if self.objects_info != []: for i in self.objects_info: box = i.box class_name = i.class_name cls_conf = i.score cls_id = self.classes.index(class_name) color = colors(cls_id, True) plot_one_box( box, result_image, color=color, label="{}:{:.2f}".format(class_name, cls_conf), ) else: time.sleep(0.01) bgr_image = cv2.cvtColor(result_image, cv2.COLOR_RGB2BGR) if self.display: cv2.imshow('result', bgr_image) key = cv2.waitKey(1) if key == ord('q') or key == 27: # 按q或者esc退出(press q or esc to exit) self.is_running = False self.result_publisher.publish(self.bridge.cv2_to_imgmsg(bgr_image, "bgr8")) time_d = 0.03 - (time.time() - time_start) if time_d > 0: time.sleep(time_d) self.mecanum_pub.publish(Twist()) rclpy.shutdown() |
The main function within the class, runs different line-following strategies according to different chassis types.
6.6 FAQ
(1) The robot exhibits inconsistent performance during line patrolling, often veering off course.
Adjust the color threshold to better suit the lighting conditions of the actual scene. For precise instructions on color threshold adjustment, please consult 5. ROS+OpenCV Lesson for detailed guidance.
(2) The robot’s turning radius appears to be either too large or too small.
① Ensure correct adjustment of the robot arm deviation. For detailed instructions on robot arm deviation adjustment, please refer to 7. Robot Arm Control Course->7.1 Basic Control for comprehensive learning.
② Modify the line patrol processing code
Navigate to the game program path by entering the command:
cd ros2_ws\src\example\example\self_driving
Open the game program by entering the command:
vim self_driving.py
The red box denotes the lane’s center point, which can be adjusted to fine-tune the turning effect. Decreasing the value will result in earlier turns, while increasing it will cause later turns.
(3) The parking location is suboptimal.
You can adjust the parking processing function or modify the starting position of the parking operation. For detailed instructions, please consult 6.5.5 Autonomous Parking -> Parameter Adjustment for reference and learning.
(4) Inaccurate traffic sign recognition.
Adjust the target detection confidence. For detailed instructions, please refer to 6.5.5 Autonomous Parking -> Operation Steps for comprehensive learning.