9. Voice Control Course

9.1 Voice Device Installation

  • 6-Microphone Array Installation

First, connect the data cable to the interface at the bottom of the microphone. Then, align the four empty slots of the microphone with the four holes on the top bracket of the robot. Use a screwdriver and four M4 screws to secure the microphone to the bracket.

  • WonderEcho Pro Installation

First, connect the data cable to the port of WonderEcho Pro. Then align the four mounting holes on the voice box with the four holes in the center of the top bracket on the robot, and use a screwdriver and four M4 screws to secure the voice box to the bracket.

9.2 Switching Wake Words

The system uses the English wake-up phrase Hello Hiwonder by default. To use a different wake word or command, follow the steps below.

  1. For robots with the WonderEcho Pro: Make sure the corresponding language firmware is flashed first. Refer to the tutorial 02 Firmware Flashing under the folder Voice Control Basic Lesson for detailed instructions.

  2. For robots using the 6-Microphone Array: Set the recognition language via the desktop configuration tool. Double-click the Tool icon on the system desktop.

  3. In the Tool interface, switch the language, then click Save, Apply, and Quit in turn. The default language is English.

  4. Restart the robot for the new wake word to take effect.

9.3 6-Microphone Array Configuration (Must Read)

9.3.1 Apply for Offline Speech Recognition Resources and App ID

Since offline speech recognition is used in this section, an offline speech resource package from iFLYTEK is required. The offline speech package is only available to accounts registered in supported regions. The following steps describe how to complete the registration process.

  1. Visit the iFLYTEK Open Platform at https://www.xfyun.cn/ and create a new account.

  2. Choose Login with phone number and fill in the required information. For international access, select the corresponding country code.

  3. Once registered, click console to create a new application.

  4. Fill in the required fields and click Submit.

  5. Open the newly created application.

  6. Click Offline Voice Command Recognition and note the APPID displayed for the application. Then navigate to Offline command word recognition SDK, select Linux MSC, and click the download button.

  7. Click Return to the old version.

  8. Select Linux, choose the required features, click SDK Download, and click Sure to begin the download.

  9. Click Go to set a personalized wake word experience package, set the wake word, and submit.

  10. In the pop-up window, click Go to SDK Download Center and download the file by repeating Step 8.

Note

Each newly registered application can be used for free for 90 days. After the free period expires, continued use requires a paid plan. When an application expires, a new one can be registered, with a maximum of five applications per account. The process for creating a new app is the same.

9.3.2 Replacing Offline Voice Resources and ID

  1. Extract the compressed package from the provided materials.

Open the folder named Linux_aitalk_exp1227_01997b6c. The version ID, such as 1227_01997b6c, may vary depending on the official release. Navigate to the bin/msc/res/asr directory and locate the common.jet file. Drag this file onto the desktop of the robot’s system image.

  2. Click the icon on the system desktop to open the command line terminal.

  3. Enter the command and press Enter to replace the common.jet file.

cp /home/ubuntu/Desktop/common.jet /home/ubuntu/ros2_ws/src/xf_mic_asr_offline/config/msc/res/asr/

  4. From the home directory, enter the command and press Enter to open the launch file that contains the APPID:

vim ./ros2_ws/src/xf_mic_asr_offline/launch/mic_init.launch.py

  5. Find the line that sets the APPID.

  6. Press the i key to enter edit mode and replace the existing value with the APPID obtained in 9.3.1. Then press Esc, type :wq, and press Enter to save and exit.
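
The cp step above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the function name install_resource and the .bak backup are assumptions added for illustration, while the two paths are the ones used in the cp command.

```python
import shutil
from pathlib import Path

# Paths taken from the cp command above; adjust them if your image differs.
SRC = Path('/home/ubuntu/Desktop/common.jet')
DST_DIR = Path('/home/ubuntu/ros2_ws/src/xf_mic_asr_offline/config/msc/res/asr')


def install_resource(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir and return the path of the installed file."""
    if not src.is_file():
        raise FileNotFoundError(f'resource file not found: {src}')
    if not dst_dir.is_dir():
        raise NotADirectoryError(f'target directory not found: {dst_dir}')
    dst = dst_dir / src.name
    if dst.exists():
        # Hypothetical safety step (not in the original instructions):
        # keep the file being replaced as common.jet.bak.
        shutil.copy2(dst, dst.with_name(dst.name + '.bak'))
    shutil.copy2(src, dst)
    return dst

# On the robot, the call would be: install_resource(SRC, DST_DIR)
```

Keeping a backup makes it easy to restore the previous resource file if the new package does not match the APPID.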

9.4 Voice-Controlled Robot Movement

9.4.1 Experiment Introduction

This experiment uses the voice recognition function to control the robot to make corresponding movements, such as moving forward and backward via voice commands.

Programmatically, the program subscribes to the voice recognition topics published by the microphone array node. That node processes the audio for sound source localization, noise reduction, and speech recognition, and outputs the recognized command and the angle of the sound source. After the robot is woken up and hears a supported phrase, it plays the corresponding audio feedback, and the recognized command drives the robot chassis to move forward, move backward, turn left, or turn right.

Please refer to the 9.4.2 Preparation section below to complete the necessary setup for this experiment, and then follow the 9.4.3 Operation Steps to learn and experience this module.

9.4.2 Preparation

  1. Before starting this section, install the voice module onto the robot and plug it into the USB port of the hub. If the module is already installed, skip this step.

  2. Refer to 9.3 6-Microphone Array Configuration (Must Read) in the current directory to complete the APPID application and file replacement.

  3. By default, the system uses the English wake word Hello Hiwonder. To switch the wake word, or when using the AI Voice Box WonderEcho Pro, flash the voice interaction command words first. Refer to 9.2 Switching Wake Words in this document for instructions on switching the language or flashing the command words.

9.4.3 Operation Steps

Note

Commands are strictly case-sensitive, and the Tab key can be used to auto-complete keywords.

  1. Power on ROSpider and connect it to the remote control software NoMachine. For instructions on connecting to the remote desktop, refer to 1. ROSpider User Manual \1.4 Development Environment Setup.

  2. Click the icon on the system desktop to open the command-line terminal.

  3. Enter the command and press Enter to disable the app auto-start service:

~/.stop_ros.sh

  4. Enter the command and press Enter to enable the voice control function:

ros2 launch xf_mic_asr_offline voice_control_move.launch.py

  5. To close this function, press Ctrl + C in the terminal. If the program does not exit, press Ctrl + C again.

9.4.4 Program Outcome

Once the program loads successfully, first say the wake word Hello Hiwonder, wait for the voice device to reply with I’m here, and then issue the next voice command. For example, say go forward. After recognizing the voice command, the robot will announce Okay, starting to move forward and execute the corresponding movement.

The commands and their corresponding control actions are as follows:

| Command Sentence | Corresponding Function |
| --- | --- |
| go forward | Go forward |
| go backward | Go backward |
| turn left | Turn left |
| turn right | Turn right |
| move left | Move left |
| move right | Move right |
| dance | Dance |
| come here | Move towards the sound source |

Note

  • To ensure optimal performance, operate this feature in a quiet environment.

  • For best results, say the wake word before issuing each voice command.

  • Speak all voice commands loudly and clearly.

  • Issue voice commands one at a time. Wait for the robot to complete its action and provide feedback before issuing the next command.

9.4.5 Program Analysis

Controlling the robot’s movement via voice involves establishing communication between the voice control node and the robot’s underlying drive node. This allows the robot to execute specific actions based on the issued voice commands.

  • Launch File Analysis

The launch file is located at: ros2_ws/src/xf_mic_asr_offline/launch/voice_control_move.launch.py

  1. Start the launch files.

controller_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(controller_package_path, 'launch/controller.launch.py')),
)

lidar_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(peripherals_package_path, 'launch/lidar.launch.py')),
)

mic_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        os.path.join(xf_mic_asr_offline_package_path, 'launch/mic_init.launch.py')),
)

controller_launch is used to start chassis control. After starting, servos can be controlled.

lidar_launch starts the LiDAR and will publish LiDAR data.

mic_launch starts the microphone function.

  2. Start the node.

voice_control_move_node = Node(
    package='xf_mic_asr_offline',
    executable='voice_control_move.py',
    output='screen',
    parameters=[{'move': move}],
)

voice_control_move_node is used to call the voice-controlled movement program.

  • Python File

The program source code is located at: ros2_ws/src/xf_mic_asr_offline/scripts/voice_control_move.py

  1. Class Initialization

def __init__(self, name):
    rclpy.init()
    super().__init__(name)

    self.angle = None
    self.words = None
    self.running = True
    self.haved_stop = False
    self.lidar_follow = False
    self.start_follow = False
    self.last_status = Twist()
    self.threshold = 3
    self.speed = 0.3
    self.stop_dist = 0.4
    self.count = 0
    self.scan_angle = math.radians(90)
    self.declare_parameter('move', False)
    self.move = self.get_parameter('move').value

    self.pid_yaw = pid.PID(1.6, 0, 0.16)
    self.pid_dist = pid.PID(1.7, 0, 0.16)

    self.language = os.environ['ASR_LANGUAGE']
    self.controller = controller_client.ControllerClient()
    self.agc_controller = ActionGroupController(self.create_publisher(ServosPosition, 'servo_controller', 1), '/home/ubuntu/software/actionset_editor/ActionGroups')
    self.cmd_vel_pub = self.create_publisher(Twist, '/controller/cmd_vel', 1)
    self.buzzer_pub = self.create_publisher(BuzzerState, '/ros_robot_controller/set_buzzer', 1)
    qos = QoSProfile(depth=1, reliability=QoSReliabilityPolicy.BEST_EFFORT)
    self.create_subscription(String, '/asr_node/voice_words', self.words_callback, 1)
    self.create_subscription(Int32, '/awake_node/angle', self.angle_callback, 1)

    self.client = self.create_client(Trigger, '/asr_node/init_finish')
    self.client.wait_for_service()  # Blocking wait
    self.declare_parameter('delay', 0)
    time.sleep(self.get_parameter('delay').value)

    self.get_logger().info('Wake up word: hello hiwonder')
    self.get_logger().info('No need to wake up within 15 seconds after waking up')
    self.get_logger().info('Voice command: turn left/turn right/go forward/go backward/come here /dance')
    self.time_stamp = time.time()
    self.current_time_stamp = time.time()
    threading.Thread(target=self.main, daemon=True).start()
    self.create_service(Trigger, '~/init_finish', self.get_node_state)
    self.play('running')

    if self.language == 'Chinese':
        self.get_logger().info('\033[1;32m%s\033[0m' % '准备就绪')
    else:
        self.get_logger().info('\033[1;32m%s\033[0m' % 'I am ready')

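The constructor initializes the node. It declares the move and delay parameters, reads the recognition language from the ASR_LANGUAGE environment variable, creates the publishers for servo, velocity, and buzzer commands, and subscribes to the recognized words and sound source angle topics. It then waits for the /asr_node/init_finish service, starts the main loop in a background thread, and exposes its own ~/init_finish service.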
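
Note that os.environ['ASR_LANGUAGE'] raises KeyError when the variable is not set. The sketch below shows one way to read the language with a fallback. It is an illustration only: the helper name read_language, the SUPPORTED tuple, and the choice of 'English' as the fallback are assumptions, based on the node comparing against 'Chinese' and on English being the documented default.

```python
import os

# Assumed set of values; the node above only compares against 'Chinese'.
SUPPORTED = ('English', 'Chinese')


def read_language(env=os.environ, default='English'):
    """Return ASR_LANGUAGE from env, or default if it is missing or unknown."""
    value = env.get('ASR_LANGUAGE', default)
    return value if value in SUPPORTED else default


print(read_language({'ASR_LANGUAGE': 'Chinese'}))  # Chinese
print(read_language({}))                           # English
```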

  2. get_node_state Method

def get_node_state(self, request, response):
    response.success = True
    return response

Service callback for ~/init_finish. It always reports success, so a client can call it to confirm that the node has finished starting.

  3. play Method

def play(self, name):
    voice_play.play(name, language=self.language)

Plays the audio prompt identified by name in the configured language.

  4. words_callback Method

def words_callback(self, msg):
    self.words = json.dumps(msg.data, ensure_ascii=False)[1:-1]
    if self.language == 'Chinese':
        self.words = self.words.replace(' ', '')
    self.get_logger().info('words:%s' % self.words)
    if self.words is not None and self.words not in ['wake-up-success', 'Sleep', 'Fail-5-times',
                                                     'Fail-10-times']:
        pass
    elif self.words == 'wake-up-success':
        self.play('awake')
    elif self.words == 'Sleep':
        msg = BuzzerState()
        msg.freq = 1000
        msg.on_time = 0.1
        msg.off_time = 0.01
        msg.repeat = 1
        self.buzzer_pub.publish(msg)

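Voice recognition callback. It serializes the recognized text with json.dumps, strips the surrounding quotes (and removes spaces when the language is Chinese), stores the result in self.words, and logs it. Ordinary commands are left for the main loop to handle; wake-up-success plays the awake prompt, and Sleep triggers a short buzzer beep.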
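
The text normalization in this callback can be tried without ROS. The sketch below reproduces only the string handling; the function names normalize and is_command and the CONTROL_WORDS constant are hypothetical names introduced here, while the status strings are the ones listed in the callback.

```python
import json

# Status strings that the callback treats as events rather than commands.
CONTROL_WORDS = ('wake-up-success', 'Sleep', 'Fail-5-times', 'Fail-10-times')


def normalize(data: str, language: str) -> str:
    """Mirror the callback: JSON-encode, drop the outer quotes, tidy spaces."""
    words = json.dumps(data, ensure_ascii=False)[1:-1]
    if language == 'Chinese':
        words = words.replace(' ', '')
    return words


def is_command(words: str) -> bool:
    """True when the text should be handled as a movement command."""
    return words not in CONTROL_WORDS


print(normalize('go forward', 'English'))  # go forward
print(normalize('前 进', 'Chinese'))        # 前进
print(is_command('Sleep'))                 # False
```

With ensure_ascii=False, json.dumps leaves Chinese characters unescaped, so the comparison strings in the main loop match the recognized text directly.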

  5. angle_callback Method

def angle_callback(self, msg):
    self.angle = msg.data
    self.get_logger().info('angle:%s' % self.angle)
    self.start_follow = False

Sound source localization callback. It stores the angle, in degrees, that the microphone array reports when the robot is woken up. This angle indicates the direction of the speaker. The callback also resets the follow flag.
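
The stored angle is later used by the come here branch of the main method to choose a turn direction and duration. The sketch below extracts that arithmetic into a standalone function so it can be checked by hand. The name plan_turn and the rate parameter are hypothetical; the thresholds and formulas are copied from main. From the formulas, 90 degrees appears to correspond to the robot's forward direction, but this is an inference, not something the source states.

```python
import math


def plan_turn(angle_deg: float, rate: float = 0.3):
    """Return (angular_z, duration_s) using the same rules as the main method."""
    if 270 > angle_deg > 90:
        angular_z = -rate
        offset = angle_deg - 90
    else:
        angular_z = rate
        # Angles at or below 90 are measured back towards 90; angles of 270
        # and above wrap around through 450 (= 360 + 90).
        offset = 90 - angle_deg if angle_deg <= 90 else 450 - angle_deg
    duration = abs(math.radians(offset) / angular_z)
    return angular_z, duration


for angle in (0, 90, 180, 300):
    z, t = plan_turn(angle)
    print(f'angle={angle:3d}  angular_z={z:+.1f}  duration={t:.2f}s')
```

For example, an angle of 180 gives a 90 degree offset, so at 0.3 rad/s the turn lasts about 5.24 seconds.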

  6. main Method

def main(self):
    while True:
        if self.words is not None:
            self.move = True
            twist = Twist()
            if self.words == '前进' or self.words == 'go forward':
                self.play('go')
                self.time_stamp = time.time() + 4
                twist.linear.x = 0.05
            elif self.words == '后退' or self.words == 'go backward':
                self.play('back')
                self.time_stamp = time.time() + 4
                twist.linear.x = -0.05
            elif self.words == '左转' or self.words == 'turn left':
                self.play('turn_left')
                self.time_stamp = time.time() + 4
                twist.angular.z = 0.3
            elif self.words == '右转' or self.words == 'turn right':
                self.play('turn_right')
                self.time_stamp = time.time() + 4
                twist.angular.z = -0.3
            elif self.words == '左平移' or self.words == 'move left':
                self.play('move_left')
                self.time_stamp = time.time() + 4
                twist.linear.y = 0.05
            elif self.words == '右平移' or self.words == 'move right':
                self.play('move_right')
                self.time_stamp = time.time() + 4
                twist.linear.y = -0.05
            elif self.words == '跳个舞吧' or self.words == 'dance':
                self.play('dance')
                self.agc_controller.run_action('twist')

            elif self.words == '过来' or self.words == 'come here':
                self.play('come')
                self.get_logger().info('\033[1;32m%s\033[0m' % self.angle)

                if 270 > self.angle > 90:
                    twist.angular.z = -0.3
                    self.time_stamp = time.time() + abs(math.radians(self.angle - 90) / twist.angular.z)
                else:
                    twist.angular.z = 0.3
                    if self.angle <= 90:
                        self.angle = 90 - self.angle
                    else:
                        self.angle = 450 - self.angle
                    self.time_stamp = time.time() + abs(math.radians(self.angle) / twist.angular.z)
                self.lidar_follow = True
            elif self.words == '休眠(Sleep)':
                time.sleep(0.01)
            self.words = None
            self.haved_stop = False
            if self.move:
                self.cmd_vel_pub.publish(twist)

        else:
            time.sleep(0.01)
        self.current_time_stamp = time.time()
        if self.time_stamp < self.current_time_stamp and not self.haved_stop and self.move:
            self.controller.traveling(gait=-2, time=1, steps=0)
            self.haved_stop = True
            if self.lidar_follow:
                self.lidar_follow = False
                self.start_follow = True

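This method implements the execution strategy for recognized commands. Each motion command plays its audio prompt, fills a Twist message with the matching linear or angular velocity, and sets time_stamp four seconds ahead. The dance command runs the twist action group and leaves the velocity at zero, and come here computes a turn direction and duration from the stored sound source angle. After a command is handled, the Twist is published on /controller/cmd_vel. Once the current time passes time_stamp, the loop calls self.controller.traveling(gait=-2, time=1, steps=0) once, which appears to stop the gait, and sets haved_stop so the call is not repeated.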
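
The if/elif chain for the six fixed-velocity commands can also be expressed as a lookup table. The following is a minimal sketch of that alternative, not the shipped implementation: Motion, COMMANDS, lookup, and should_stop are hypothetical names, the English phrases, prompt names, and velocities are copied from main, and ROS types are replaced with plain Python so the example runs anywhere.

```python
from typing import NamedTuple, Optional


class Motion(NamedTuple):
    prompt: str             # audio prompt name passed to play()
    linear_x: float = 0.0   # forward/backward velocity
    linear_y: float = 0.0   # sideways velocity
    angular_z: float = 0.0  # turning rate


COMMANDS = {
    'go forward': Motion('go', linear_x=0.05),
    'go backward': Motion('back', linear_x=-0.05),
    'turn left': Motion('turn_left', angular_z=0.3),
    'turn right': Motion('turn_right', angular_z=-0.3),
    'move left': Motion('move_left', linear_y=0.05),
    'move right': Motion('move_right', linear_y=-0.05),
}
MOVE_SECONDS = 4  # each of these commands sets time_stamp 4 s ahead


def lookup(words: str) -> Optional[Motion]:
    """Return the motion for a recognized phrase, or None if it is not listed."""
    return COMMANDS.get(words)


def should_stop(now: float, deadline: float, have_stopped: bool, move: bool) -> bool:
    """Same condition the main loop checks before stopping the gait."""
    return deadline < now and not have_stopped and move


motion = lookup('turn left')
print(motion.prompt, motion.angular_z)     # turn_left 0.3
print(should_stop(10.0, 5.0, False, True)) # True
```

A table keeps phrases, prompts, and velocities in one place, so adding a command means adding one entry. The dance and come here commands need extra logic and would still be handled separately.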