8. Voice Interaction Applications
8.1 Voice Module Installation
8.1.1 Install the WonderEcho Pro
8.1.2 Install the 6-Microphone Array
8.2 Switching Wake Words
The system uses the English wake-up phrase Hello Hiwonder by default. To use a different wake-up phrase or command, follow the steps below.
For robots with the WonderEcho Pro:
Make sure the corresponding language firmware is flashed first. Refer to the tutorial 02 Firmware Flashing under the folder Voice Control Basic Lesson for detailed instructions.
For robots using the 6-Microphone Array:
Set the recognition language via the desktop configuration tool. Double-click the Tool icon on the system desktop. In the Tool interface, switch the language, then click Save → Apply → Quit. The default language is English.
After restarting the robot, the wake word will be successfully switched.
8.3 6-Microphone Array Configuration (Must Read)
8.3.1 Offline Speech Package & ID
Since offline speech recognition is used in this section, an offline speech resource package from iFLYTEK is required. The offline speech package is only available to accounts registered in supported regions. The following steps describe how to complete the registration process.
Visit the iFLYTEK Open Platform at https://www.xfyun.cn/ and create a new account.
Choose Login with phone number and fill in the required information. International users may select their appropriate country code.
Once registered, go to console > My Application to create a new application.
Fill in the required fields and click Submit.
Open the newly created application.
Click Offline Voice Command Recognition and locate the corresponding APPID in the red box shown below. Then navigate to Offline Wake Word Recognition SDK → Linux MSC and click to download.
Note
Each newly registered application can be used for free for 90 days. After the free period expires, continued use requires a paid plan. When an application expires, a new one can be registered, with a maximum of five applications per account. The process for creating a new app is the same.
8.3.2 Replacing Offline Speech Resources and ID
Extract the compressed package from the provided materials.
Note
Locate the extracted files according to the chosen download path.
Open the folder named Linux_aitalk_exp1227_01997b6c. The version ID, such as 1227_01997b6c, may vary depending on the official release. Navigate to the bin/msc/res/asr directory and locate the common.jet file. Drag this file onto the desktop of the robot’s system image.
Click the terminal icon on the system desktop to open a command-line window. Run the following command to replace the common.jet file:
cp /home/ubuntu/Desktop/common.jet /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/config/msc/res/asr/
Run the following command to modify the APPID:
vim ./ros2_ws/src/xf_mic_asr_offline/launch/mic_init.launch.py
Locate the code shown in the figure below.
Press the i key to enter edit mode, then replace 01997b6c with the newly obtained APPID. After editing, press Esc, type :wq, and press Enter to save and exit.
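The same APPID substitution can also be scripted instead of done by hand in vim. The following is a minimal sketch, not the vendor's tooling: the function name replace_appid and the sample line are hypothetical, and the real layout of mic_init.launch.py may differ from the one-line stand-in used here.

```python
def replace_appid(launch_source: str, old_appid: str, new_appid: str) -> str:
    """Return launch_source with every occurrence of old_appid replaced."""
    if old_appid not in launch_source:
        # Fail loudly rather than silently writing an unchanged file.
        raise ValueError(f"APPID {old_appid!r} not found; nothing to replace")
    return launch_source.replace(old_appid, new_appid)


# Hypothetical line standing in for the real content of mic_init.launch.py.
sample = "{'appid': '01997b6c'}"
updated = replace_appid(sample, "01997b6c", "1a2b3c4d")
print(updated)
```

To apply it to the real file, read the file's text, pass it through the function, and write the result back. Keep a backup copy of the original launch file first.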
8.4 Voice-Controlled Robot Movement
8.4.1 Program Overview
In this section, the robot’s built-in speech recognition capabilities are used to control its movements through voice commands, such as moving forward or backward.
From a programming perspective, the system subscribes to the voice recognition topics published by the microphone array node. The incoming audio goes through sound source localization, noise reduction, and recognition, which yields the recognized sentence and the angle of the sound source. Once the robot has been awakened with the wake word and a supported command is spoken, it gives a voice response and performs the corresponding movement: forward, backward, turn left, or turn right.
Follow the Preparation section below to complete the setup for this feature, then refer to the Operation Steps section to carry out the feature.
8.4.2 Preparation
Before starting, install the voice module onto the robot and connect it to the hub via the USB port. Skip this step if the module is already installed.
Please refer to Section 8.3 6-Microphone Array Configuration (Must Read) in this document. Follow the instructions to apply for an APPID and replace the corresponding files.
The system uses the English wake word Hello Hiwonder by default. If a WonderEcho Pro is used as the voice module, the voice interaction command phrases must be flashed. Refer to Section 8.2 Switching Wake Words for instructions on changing the language or flashing the command phrases.
8.4.3 Operation Steps
Note
Commands must be entered with correct capitalization. The Tab key can be used to auto-complete keywords.
Power on the robot and connect it via the NoMachine remote control software. For detailed information, please refer to the section 1.7.2 AP Mode Connection Steps in the user manual.
Click the terminal icon on the system desktop to open a command-line window. Enter the command to disable the app auto-start service.
sudo systemctl stop start_app_node.service
Enter the following command and press Enter to launch the voice-controlled feature.
ros2 launch xf_mic_asr_offline voice_control_move.launch.py
After the program has successfully loaded, first say the wake word Hello Hiwonder and wait for the speaker to respond with “I’m here” before issuing the next voice command. For example, say “Go forward”. Once the robot recognizes the command, the speaker will respond “Copy that, starting to move forward”, and the robot will execute the corresponding action.
The command phrases and their corresponding actions are as follows:
| Command Phrase | Corresponding Action |
|---|---|
| Go forward | Control the robot to move forward. |
| Go backward | Control the robot to move backward. |
| Turn left | Control the robot to turn left. |
| Turn right | Control the robot to turn right. |
Note
To ensure optimal performance, operate the robot in a relatively quiet environment.
It is recommended to say the wake word before issuing each voice command.
When speaking the voice command, make sure the voice is loud and clear.
Issue voice commands one at a time, waiting for the robot to complete the current action and provide feedback before giving the next command.
To disable this feature, open a new command-line terminal and enter the following command:
~/.stop_ros.sh
Next, simply close all the open terminals.
8.4.4 Program Analysis
Voice control for robot movement involves establishing a connection between the voice control node and the robot’s low-level driver node. Through voice commands, the robot is controlled to execute the corresponding actions.
launch File
The launch file is located at: /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/launch/voice_control_move.launch.py
controller_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(controller_package_path, 'launch/controller.launch.py')),
)
lidar_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(peripherals_package_path, 'launch/lidar.launch.py')),
)
mic_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(xf_mic_asr_offline_package_path, 'launch/mic_init.launch.py')),
)
controller_launch is used to start the chassis control node. After launching, it allows control of the servo motors.
lidar_launch starts the LiDAR node, which will publish LiDAR data.
mic_launch starts the microphone functionality.
Start Nodes
voice_control_move_node = Node(
    package='xf_mic_asr_offline',
    executable='voice_control_move.py',
    output='screen',
)
voice_control_move_node is used to call the voice-controlled movement source code and start the program.
Python Program
The source code is located at: /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/scripts/voice_control_move.py
Functions
Main:
def main():
    node = VoiceControMovelNode('voice_control_move')
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()
Creates the voice-controlled movement node and keeps it running until shutdown.
Class:
class VoiceControMovelNode(Node):
    def __init__(self, name):
        rclpy.init()
        super().__init__(name)
        self.angle = None
        self.words = None
        self.running = True
        self.haved_stop = False
        self.lidar_follow = False
        self.start_follow = False
        self.last_status = Twist()
        self.threshold = 3
        self.speed = 0.3
        self.stop_dist = 0.4
        self.count = 0
        self.scan_angle = math.radians(90)
        self.pid_yaw = pid.PID(1.6, 0, 0.16)
        self.pid_dist = pid.PID(1.7, 0, 0.16)
        self.language = os.environ['ASR_LANGUAGE']
        self.lidar_type = os.environ.get('LIDAR_TYPE')
        self.machine_type = os.environ.get('MACHINE_TYPE')
Init:
        self.mecanum_pub = self.create_publisher(Twist, '/controller/cmd_vel', 1)
        self.buzzer_pub = self.create_publisher(BuzzerState, '/ros_robot_controller/set_buzzer', 1)
        qos = QoSProfile(depth=1, reliability=QoSReliabilityPolicy.BEST_EFFORT)
        self.create_subscription(LaserScan, '/scan_raw', self.lidar_callback, qos)  # Subscribe to LiDAR data
        self.create_subscription(String, '/asr_node/voice_words', self.words_callback, 1)
        self.create_subscription(Int32, '/awake_node/angle', self.angle_callback, 1)
        self.client = self.create_client(Trigger, '/asr_node/init_finish')
        self.client.wait_for_service()  # Blocking wait
        self.declare_parameter('delay', 0)
        time.sleep(self.get_parameter('delay').value)
        self.mecanum_pub.publish(Twist())
        self.play('running')
        self.get_logger().info('Wake up word: hello hiwonder')
        self.get_logger().info('No need to wake up within 15 seconds after waking up')
        if self.machine_type == 'ROSOrin_Acker':
            self.get_logger().info('Voice command: turn left/turn right/go forward/go backward')
        else:
            self.get_logger().info('Voice command: turn left/turn right/go forward/go backward/come here')
        self.time_stamp = time.time()
        self.current_time_stamp = time.time()
        threading.Thread(target=self.main, daemon=True).start()
        self.create_service(Trigger, '~/init_finish', self.get_node_state)
        self.get_logger().info('\033[1;32m%s\033[0m' % 'start')
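Initializes the parameters, creates the chassis velocity and buzzer publishers, subscribes to the LiDAR, voice recognition, and sound source angle topics, waits for the voice recognition service, and finally starts the main function in a background thread.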
get_node_state:
    def get_node_state(self, request, response):
        response.success = True
        return response
Service callback that reports that the node has finished initializing.
Play:
    def play(self, name):
        voice_play.play(name, language=self.language)
Plays audio.
words_callback:
    def words_callback(self, msg):
        self.words = json.dumps(msg.data, ensure_ascii=False)[1:-1]
        if self.language == 'Chinese':
            self.words = self.words.replace(' ', '')
        self.get_logger().info('words:%s' % self.words)
        if self.words is not None and self.words not in ['Wake-up-success', 'Sleep', 'Fail-5-times',
                                                         'Fail-10-times']:
            pass
        elif self.words == 'Wake-up-success':
            self.play('awake')
        elif self.words == 'Sleep':
            msg = BuzzerState()
            msg.freq = 1000
            msg.on_time = 0.1
            msg.off_time = 0.01
            msg.repeat = 1
            self.buzzer_pub.publish(msg)
The voice recognition callback function reads the data sent back by the microphone through the node.
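The text handling in this callback can be tried outside ROS. The sketch below mirrors the json.dumps(...)[1:-1] step and the status-word check. The helper names normalize_words and is_command are hypothetical, and the set of status strings is copied from the listing above.

```python
import json

# Status strings that the callback treats as notifications, not commands.
STATUS_WORDS = {"Wake-up-success", "Sleep", "Fail-5-times", "Fail-10-times"}


def normalize_words(data: str, language: str) -> str:
    """JSON-encode the text, strip the outer quotes, drop spaces for Chinese."""
    words = json.dumps(data, ensure_ascii=False)[1:-1]
    if language == "Chinese":
        words = words.replace(" ", "")
    return words


def is_command(words: str) -> bool:
    """True when the text is a spoken command rather than a status message."""
    return words not in STATUS_WORDS


print(normalize_words("go forward", "English"))
print(is_command("Sleep"))
```

Because json.dumps escapes quotes and backslashes, text containing those characters comes out escaped. With ensure_ascii=False, Chinese phrases are kept as-is instead of being turned into \u escapes.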
angle_callback:
    def angle_callback(self, msg):
        self.angle = msg.data
        self.get_logger().info('angle:%s' % self.angle)
        self.start_follow = False
        self.mecanum_pub.publish(Twist())
The sound source recognition callback function reads the angle of the sound source based on the direction of the wake word. The angle is detected by the microphone’s sound source localization.
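The angle stored by this callback is used by the come here branch of the main loop in voice_control_move.py to decide which way to turn and for how long. A standalone sketch of that calculation follows. The function name plan_turn and the turn_rate parameter are hypothetical. The zero-length turn at 90 degrees suggests that 90 corresponds to straight ahead, which is an inference from the code rather than a documented convention.

```python
import math


def plan_turn(angle_deg: float, turn_rate: float = 1.0) -> tuple[float, float]:
    """Return (angular_z, duration_s) for turning toward a sound source angle."""
    if 90 < angle_deg < 270:
        # Source on this side: turn with negative angular velocity.
        angular_z = -turn_rate
        remaining = angle_deg - 90
    else:
        # Source on the other side: turn with positive angular velocity.
        angular_z = turn_rate
        remaining = 90 - angle_deg if angle_deg <= 90 else 450 - angle_deg
    # Duration = angle to cover (rad) / angular speed (rad/s).
    return angular_z, abs(math.radians(remaining) / angular_z)


for angle in (45, 180, 300):
    print(angle, plan_turn(angle))
```

The duration is what the main loop adds to time.time() to obtain the time stamp at which the turn should end.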
lidar_callback:
    def lidar_callback(self, lidar_data):
        twist = Twist()
        # Number of samples = scan angle / angle increment per sample
        if self.lidar_type != 'G4':
            min_index = int(math.radians(MAX_SCAN_ANGLE / 2.0) / lidar_data.angle_increment)
            max_index = int(math.radians(MAX_SCAN_ANGLE / 2.0) / lidar_data.angle_increment)
            left_ranges = lidar_data.ranges[:max_index]  # The left data
            right_ranges = lidar_data.ranges[::-1][:max_index]  # The right data
        elif self.lidar_type == 'G4':
            '''
            ranges[right...->left]
                forward
                 lidar
             left 0 right
            '''
            min_index = int(math.radians((360 - MAX_SCAN_ANGLE) / 2.0) / lidar_data.angle_increment)
            max_index = min_index + int(math.radians(MAX_SCAN_ANGLE / 2.0) / lidar_data.angle_increment)
            left_ranges = lidar_data.ranges[::-1][min_index:max_index][::-1]  # The left data
            right_ranges = lidar_data.ranges[min_index:max_index][::-1]  # The right data
        # self.get_logger().info(self.lidar_type)
The LiDAR callback function processes LiDAR data. The excerpt above shows its first step, which splits the scan into left and right ranges. The robot then turns toward the detected sound source angle, using PID to compute the angular velocity. After that, it tracks the nearest object in the LiDAR scan and adjusts its linear and angular velocities according to that object's position.
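The slicing of the scan into left and right ranges can be checked with synthetic data. The sketch below reproduces the two branches of the callback under some assumptions. MAX_SCAN_ANGLE is not defined in the excerpt, so it becomes a parameter whose default of 240 is only a placeholder. round() is used instead of int() to avoid floating-point truncation. "A1" is a hypothetical name that stands for any non-G4 LiDAR.

```python
import math


def split_scan(ranges, angle_increment, lidar_type, max_scan_angle=240):
    """Return (left_ranges, right_ranges) using the same slicing as lidar_callback."""
    # Number of samples covering half of the scan angle.
    half = round(math.radians(max_scan_angle / 2.0) / angle_increment)
    if lidar_type != "G4":
        left = ranges[:half]          # first samples of the scan
        right = ranges[::-1][:half]   # last samples, in reverse order
    else:
        # Skip the samples outside the scan angle, then take the next half-window.
        start = round(math.radians((360 - max_scan_angle) / 2.0) / angle_increment)
        end = start + half
        left = ranges[::-1][start:end][::-1]
        right = ranges[start:end][::-1]
    return left, right


scan = list(range(360))  # one synthetic reading per degree; value == index
left, right = split_scan(scan, math.radians(1), "A1")
print(left[0], left[-1], right[0], right[-1])
```

Using the sample index as the range value makes it easy to see which part of the scan each list covers.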
Main:
    def main(self):
        while True:
            if self.words is not None:
                twist = Twist()
                if self.words == '前进' or self.words == 'go forward':
                    self.play('go')
                    self.time_stamp = time.time() + 2
                    twist.linear.x = 0.2
                elif self.words == '后退' or self.words == 'go backward':
                    self.play('back')
                    self.time_stamp = time.time() + 2
                    twist.linear.x = -0.2
                elif self.words == '左转' or self.words == 'turn left':
                    self.play('turn_left')
                    self.time_stamp = time.time() + 2
                    if self.machine_type == 'ROSOrin_Acker':
                        twist.linear.x = 0.2
                        twist.angular.z = twist.linear.x/0.5
                    else:
                        twist.angular.z = 0.8
                elif self.words == '右转' or self.words == 'turn right':
                    self.play('turn_right')
                    self.time_stamp = time.time() + 2
                    if self.machine_type == 'ROSOrin_Acker':
                        twist.linear.x = 0.2
                        twist.angular.z = -twist.linear.x/0.5
                    else:
                        twist.angular.z = -0.8
                elif self.words == '过来' or self.words == 'come here' and self.machine_type != 'ROSOrin_Acker':
                    self.play('come')
                    if 270 > self.angle > 90:
                        twist.angular.z = -1.0
                        self.time_stamp = time.time() + abs(math.radians(self.angle - 90) / twist.angular.z)
                    else:
                        twist.angular.z = 1.0
                        if self.angle <= 90:
                            self.angle = 90 - self.angle
                        else:
                            self.angle = 450 - self.angle
                        self.time_stamp = time.time() + abs(math.radians(self.angle) / twist.angular.z)
                    # self.get_logger().info(self.angle)
                    self.lidar_follow = True
                elif self.words == '休眠(Sleep)':
                    time.sleep(0.01)
                self.words = None
                self.haved_stop = False
After a command is received, the main loop sets the linear and angular velocities that correspond to it and publishes them, which moves the robot in the requested direction.
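The mapping from command phrase to velocity can be summarized in a small lookup function. The sketch below is not the node's code. Velocity is a hypothetical stand-in for geometry_msgs Twist, the come here command is left out, and the 0.5 divisor is read here as a turning radius (angular velocity = linear velocity / radius), which is an interpretation of the listing.

```python
from dataclasses import dataclass
from typing import Optional

ACKERMANN = "ROSOrin_Acker"  # machine type string used in the listing


@dataclass
class Velocity:
    """Hypothetical stand-in for the two Twist fields the node sets."""
    linear_x: float = 0.0
    angular_z: float = 0.0


def command_to_velocity(words: str, machine_type: str) -> Optional[Velocity]:
    """Return the velocity for a movement command, or None if it is not one."""
    if words in ("前进", "go forward"):
        return Velocity(linear_x=0.2)
    if words in ("后退", "go backward"):
        return Velocity(linear_x=-0.2)
    if words in ("左转", "turn left", "右转", "turn right"):
        sign = 1.0 if words in ("左转", "turn left") else -1.0
        if machine_type == ACKERMANN:
            # Ackermann steering cannot turn in place, so it drives along an arc.
            return Velocity(linear_x=0.2, angular_z=sign * 0.2 / 0.5)
        return Velocity(angular_z=sign * 0.8)
    return None


print(command_to_velocity("turn left", ACKERMANN))
print(command_to_velocity("turn right", "other"))
```

In the node, each of these commands also sets a time stamp two seconds ahead, which bounds how long the motion lasts.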
8.4.5 Extensions
Changing the Wake Word
The default wake word is Hello Hiwonder. It can be changed by modifying the configuration file. The example below demonstrates how to change the wake word to hello.
Note
Commands must be entered with correct capitalization. The Tab key can be used to auto-complete keywords.
Power on the robot and connect it via the NoMachine remote control software.
Click the terminal icon on the system desktop to open a command-line window. Enter the following command and press Enter:
vim ./ros2_ws/src/xf_mic_asr_offline/launch/mic_init.launch.py
Locate the code shown in the figure below.
Press the i key to enter edit mode. Change the value of english_awake_words to hello.
After editing, press Esc, then enter the command below and press Enter to save and exit.
:wq
Enter the following command to apply the new wake word:
ros2 launch xf_mic_asr_offline mic_init.launch.py enable_setting:=true
The configuration will take effect after about 30 seconds. On the next startup, the enable_setting parameter is no longer required.
Refer to 8.4.3 Operation Steps to restart the feature and verify that the wake word has been updated.
8.5 Voice-Controlled Color Recognition
8.5.1 Program Overview
This experiment uses the robot’s built-in speech recognition in combination with the vision-equipped robotic arm to identify objects in three different colors: red, green, and blue.
From a programming perspective, the system subscribes to the voice recognition topics published by the microphone array node. The incoming audio goes through sound source localization, noise reduction, and recognition, which yields the recognized sentence and the angle of the sound source. Once the robot has been awakened and a supported voice command is issued, it gives the corresponding voice feedback. At the same time, the onboard camera detects objects in red, green, and blue.
Follow the Preparation section below to complete the setup for this feature, then refer to the Operation Steps section to carry out the feature.
8.5.2 Preparation
Before starting, install the voice module onto the robot and connect it to the hub via the USB port. Skip if already installed.
Please refer to Section 8.3 6-Microphone Array Configuration (Must Read) in this document. Follow the instructions to apply for an APPID and replace the corresponding files.
The system uses the English wake word Hello Hiwonder by default. If using a WonderEcho Pro as a voice module, the voice interaction command phrases must be flashed. Refer to Section 8.2 Switching Wake Words in this document for instructions on changing the language or flashing the command phrases.
8.5.3 Operation Steps
Note
Ensure that no objects with similar or identical colors to the target block appear in the background, as this may cause interference during recognition.
If the color detection is inaccurate, the color thresholds can be adjusted. For more details, refer to the tutorials in file 6. ROS+OpenCV Course.
Power on the robot and connect it via the NoMachine remote control software. For detailed information, please refer to the section 1.7.2 AP Mode Connection Steps in the user manual.
The system uses the English wake word Hello Hiwonder by default. Refer to Section 8.2 Switching Wake Words in this document for instructions on changing the language or flashing the command phrases.
Click the terminal icon on the system desktop to open a command-line window. Enter the command to disable the app auto-start service.
sudo systemctl stop start_app_node.service
Enter the following command and press Enter to launch the voice-controlled feature.
ros2 launch xf_mic_asr_offline voice_control_color_detect.launch.py
After the program starts, say the wake word Hello Hiwonder, then say “start color recognition” to begin the color detection process. The robot will identify the color and announce its name. For example, place a red block within the camera’s field of view. When the red object is detected, the robot will announce “Red.”
To stop color recognition, say the wake word Hello Hiwonder, then say “stop color recognition.”
Note
To ensure optimal performance, operate the robot in a relatively quiet environment.
It is recommended to say the wake word before issuing each voice command.
When speaking the voice command, make sure the voice is loud and clear.
Issue voice commands one at a time, waiting for the robot to complete the current action and provide feedback before giving the next command.
To disable this feature, open a new command-line terminal and enter the following command:
~/.stop_ros.sh
Next, close all the open terminals.
8.5.4 Program Analysis
Voice control allows the voice recognition node to interact with the robot’s underlying driving nodes and camera nodes. By issuing voice commands, the robot can identify color blocks and respond accordingly.
launch File
The launch file is located at: /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/launch/voice_control_color_detect.launch.py
controller_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(controller_package_path, 'launch/controller.launch.py')),
)
color_detect_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(example_package_path, 'example/color_detect/color_detect_node.launch.py')),
    launch_arguments={
        'enable_display': 'true',
    }.items(),
)
mic_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(xf_mic_asr_offline_package_path, 'launch/mic_init.launch.py')),
)
controller_launch is used to start the chassis control node. After launching, it allows control of the servo motors.
color_detect_launch starts the color recognition node.
mic_launch starts the microphone functionality.
init_pose_launch initializes the actions (it does not appear in the excerpt above).
Start Nodes
voice_control_color_detect_node = Node(
    package='xf_mic_asr_offline',
    executable='voice_control_color_detect.py',
    output='screen',
)
voice_control_color_detect_node calls the voice-controlled color recognition source code and starts the program.
Python Program
The source code is located at: /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/scripts/voice_control_color_detect.py
Functions
Main:
def main():
    node = VoiceControlColorDetectNode('voice_control_color_detect')
    executor = MultiThreadedExecutor()
    executor.add_node(node)
    executor.spin()
    node.destroy_node()
Starts the voice-controlled color recognition.
Class:
VoiceControlColorDetectNode:
class VoiceControlColorDetectNode(Node):
    def __init__(self, name):
        rclpy.init()
        super().__init__(name, allow_undeclared_parameters=True, automatically_declare_parameters_from_overrides=True)
        self.count = 0
        self.color = None
        self.running = True
        self.last_color = None
        signal.signal(signal.SIGINT, self.shutdown)
        self.language = os.environ['ASR_LANGUAGE']
Init:
        self.buzzer_pub = self.create_publisher(BuzzerState, '/ros_robot_controller/set_buzzer', 1)
        timer_cb_group = ReentrantCallbackGroup()
        self.create_subscription(String, '/asr_node/voice_words', self.words_callback, 1, callback_group=timer_cb_group)
        self.create_subscription(ColorsInfo, '/color_detect/color_info', self.get_color_callback, 1)
        self.client = self.create_client(Trigger, '/asr_node/init_finish')
        self.client.wait_for_service()
        self.client = self.create_client(Trigger, '/color_detect/init_finish')
        self.client.wait_for_service()
        self.set_color_client = self.create_client(SetColorDetectParam, '/color_detect/set_param', callback_group=timer_cb_group)
        self.set_color_client.wait_for_service()
        self.play('running')
        self.get_logger().info('Wake up word: hello hiwonder')
        self.get_logger().info('No need to wake up within 15 seconds after waking up')
        self.get_logger().info('Voice command: start color recognition/stop color recognition')
        threading.Thread(target=self.main, daemon=True).start()
        self.create_service(Trigger, '~/init_finish', self.get_node_state)
        self.get_logger().info('\033[1;32m%s\033[0m' % 'start')
Initializes the parameters, creates the buzzer publisher, subscribes to the voice recognition and color recognition topics, and waits for the voice recognition and color recognition services. Finally, the main function is started in a background thread.
get_node_state:
    def get_node_state(self, request, response):
        response.success = True
        return response
Sets the current node state.
Play:
    def play(self, name):
        voice_play.play(name, language=self.language)
Plays audio.
Shutdown:
    def shutdown(self, signum, frame):
        self.running = False
Callback function when the program shuts down. It sets the running parameter to False, stopping the program.
get_color_callback:
    def get_color_callback(self, msg):
        data = msg.data
        if data != []:
            if data[0].radius > 30:
                self.color = data[0].color
            else:
                self.color = None
        else:
            self.color = None
Receives information from the color recognition node and provides the current color recognition result.
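The selection rule in this callback takes only the first detection and accepts it only when its radius exceeds 30. It can be expressed as a small pure function. In the sketch below, Detection is a hypothetical stand-in for one entry of ColorsInfo.data, and pick_color is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    """Hypothetical stand-in for one entry of ColorsInfo.data."""
    color: str
    radius: float


def pick_color(detections: Sequence[Detection], min_radius: float = 30) -> Optional[str]:
    """Return the first detection's color if it is large enough, else None."""
    if detections and detections[0].radius > min_radius:
        return detections[0].color
    return None


print(pick_color([Detection("red", 42.0)]))
print(pick_color([Detection("blue", 12.0)]))
```

The radius threshold filters out small, distant, or noisy blobs, so only an object that fills enough of the image is announced.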
send_request:
    def send_request(self, client, msg):
        future = client.call_async(msg)
        while rclpy.ok():
            if future.done() and future.result():
                return future.result()
Sends an asynchronous service request and waits until the result is available.
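The loop in send_request polls the future without pausing and has no upper time limit. A variant with a short sleep and a timeout is sketched below. It is not the node's implementation. It uses the standard library's concurrent.futures.Future as a stand-in because it exposes the same done() and result() methods used in the listing. The name wait_for_result and both parameters are hypothetical.

```python
import time
from concurrent.futures import Future


def wait_for_result(future: Future, timeout_s: float = 5.0, poll_s: float = 0.01):
    """Poll a future until it completes, sleeping between checks."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if future.done():
            return future.result()
        time.sleep(poll_s)  # yield the CPU instead of spinning
    raise TimeoutError(f"no result within {timeout_s} s")


# Stand-in for a service call that has already completed.
finished = Future()
finished.set_result("ok")
print(wait_for_result(finished))
```

A timeout keeps a voice command from blocking its callback indefinitely when the color recognition service does not answer.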
words_callback:
    def words_callback(self, msg):
        words = json.dumps(msg.data, ensure_ascii=False)[1:-1]
        if self.language == 'Chinese':
            words = words.replace(' ', '')
        self.get_logger().info('words: %s'%words)
        if words is not None and words not in ['Wake-up-success', 'Sleep', 'Fail-5-times',
                                               'Fail-10-times']:
            if words == '开启颜色识别' or words == 'start color recognition':
                msg_red = ColorDetect()
                msg_red.color_name = 'red'
                msg_red.detect_type = 'circle'
                msg_green = ColorDetect()
                msg_green.color_name = 'green'
                msg_green.detect_type = 'circle'
                msg_blue = ColorDetect()
                msg_blue.color_name = 'blue'
                msg_blue.detect_type = 'circle'
                msg = SetColorDetectParam.Request()
                msg.data = [msg_red, msg_green, msg_blue]
                res = self.send_request(self.set_color_client, msg)
                if res.success:
                    self.play('open_success')
                else:
                    self.play('open_fail')
            elif words == '关闭颜色识别' or words == 'stop color recognition':
                msg = SetColorDetectParam.Request()
                res = self.send_request(self.set_color_client, msg)
                if res.success:
                    self.play('close_success')
                else:
                    self.play('close_fail')
        elif words == 'Wake-up-success':
            self.play('awake')
        elif words == 'Sleep':
            msg = BuzzerState()
            msg.freq = 1900
            msg.on_time = 0.05
            msg.off_time = 0.01
            msg.repeat = 1
            self.buzzer_pub.publish(msg)
Voice recognition callback function that controls whether color recognition should be started based on the recognized voice. If started, it will respond according to the color recognition node’s information.
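The three nearly identical ColorDetect messages in this callback can be built in a loop. The sketch below shows the idea with a plain dataclass. ColorTarget is a hypothetical stand-in for the ColorDetect message and build_targets is an illustrative name; neither is part of the robot's packages.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ColorTarget:
    """Hypothetical stand-in for the ColorDetect message."""
    color_name: str
    detect_type: str = "circle"


def build_targets(color_names: Iterable[str]) -> List[ColorTarget]:
    """Create one detection target per color name."""
    return [ColorTarget(color_name=name) for name in color_names]


targets = build_targets(["red", "green", "blue"])
print([t.color_name for t in targets])
```

In the listing, the stop command sends a request without filling in any targets. An empty list plays that role here.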
Main:
    def main(self):
        while self.running:
            if self.color == 'red' and self.last_color != 'red':
                self.last_color = 'red'
                self.play('red')
                self.get_logger().info('\033[1;32m%s\033[0m' % 'red')
            elif self.color == 'green' and self.last_color != 'green':
                self.last_color = 'green'
                self.play('green')
                self.get_logger().info('\033[1;32m%s\033[0m' % 'green')
            elif self.color == 'blue' and self.last_color != 'blue':
                self.last_color = 'blue'
                self.play('blue')
                self.get_logger().info('\033[1;32m%s\033[0m' % 'blue')
            else:
                self.count += 1
                time.sleep(0.01)
                if self.count > 50:
                    self.count = 0
                    self.last_color = self.color
Based on the recognized color, it announces the corresponding color name.
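The loop announces a color only when it differs from the last announced one, and it refreshes that memory after more than 50 idle iterations. The sketch below isolates this logic so it can be stepped through without the robot. ColorAnnouncer and its step method are hypothetical names, and the 0.01 s sleep of the original loop is omitted.

```python
from typing import Optional

KNOWN_COLORS = ("red", "green", "blue")


class ColorAnnouncer:
    """Announce a color once when it first appears, mirroring the loop above."""

    def __init__(self, reset_after: int = 50):
        self.reset_after = reset_after
        self.count = 0
        self.last_color: Optional[str] = None

    def step(self, color: Optional[str]) -> Optional[str]:
        """Process one loop iteration; return the color to announce, if any."""
        if color in KNOWN_COLORS and color != self.last_color:
            self.last_color = color
            return color
        # Idle iteration: after enough of them, sync the memory with the
        # current reading so a color that left the view can be announced again.
        self.count += 1
        if self.count > self.reset_after:
            self.count = 0
            self.last_color = color
        return None


announcer = ColorAnnouncer()
print(announcer.step("red"))
print(announcer.step("red"))
```

While a block stays in view, it is announced only once. After it has been out of view for long enough, showing it again triggers a new announcement.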