Design and Fabrication of Talking Robot

Authors: Akshay Prakash J S, Sentahmizh Chittu D, Nandhini V, Abhishek N, Mr. G. Chandrasekar

Affiliation: Dhanalakshmi Srinivasan Engineering College (Autonomous), Perambalur, India

Abstract

This paper presents the design and fabrication of a mini talking robot. The project aims to build an artificial-intelligence voice-assistant robot that uses pre-recorded audio for speech, with the goal of keeping the manufacturing cost low while incorporating recent advances. The paper discusses the current work, including the challenge of grounding user utterances in the robot's environment, advances in speech technologies, and the potential of vocal interfaces in consumer robots.

Keywords

Robot

I. INTRODUCTION

A talking robot is a robot capable of producing speech or other vocalizations for communication. Such robots can be programmed to speak in various languages and can be controlled by computer programs or human operators, using natural language processing to understand and respond to human speech. They employ text-to-speech (TTS) technology to convert text into spoken words, enabling natural communication. Applications span customer service, language learning, education, and entertainment, and development is extending into healthcare and other industries that require human-like interaction. Examples include personal assistants such as Amazon's Alexa and Google Home, and advanced humanoid robots such as Hanson Robotics' Sophia.

II. LITERATURE SURVEY

III. EXISTING SYSTEM AND LIMITATIONS

3.1 Existing Model

Current AI voice assistant robots include Amazon's Echo (Alexa), Google Home (Google Assistant), and Apple's HomePod (Siri), designed for voice commands and tasks like playing music and setting reminders. They integrate with smart home devices such as thermostats and lights. Other manufacturers include Sonos, Harman Kardon, and JBL. These assistants are often integrated into smart speakers or other devices, though standalone robots like Jibo and Kuri also exist. The Sophia robot by Hanson Robotics is a humanoid AI robot capable of recognizing and responding to human emotions, used in customer service, education, and entertainment, with expressive features for natural interaction.

3.2 Limitation of Existing Models

Existing voice assistant robots have several limitations, the most significant for this work being their high cost. This project therefore focuses on minimizing the cost of building the robot.

IV. PROPOSED SYSTEM

The proposed project is a talking robot that produces speech through text-to-speech synthesis, pre-recorded audio, or a combination of the two. Talking robots find applications in customer service, language learning, and entertainment. Options for building a budget-friendly talking robot include using a Raspberry Pi as the central unit, connected to a microphone and speaker, together with open-source software such as Python and the Open Speech Platform. Alternatively, a microcontroller such as an Arduino with a pre-built speech recognition module can be used. The cost of materials varies by design, but both options are relatively inexpensive compared to commercial alternatives. The primary focus of this project is to minimize the robot's cost.
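The pre-recorded-audio option mentioned above can be sketched as a simple lookup from recognized keywords to audio clips. This is only an illustrative sketch: the keyword table, the clip filenames, and the `pick_clip` helper are assumptions, not the project's actual implementation.

```python
# Minimal sketch of the pre-recorded-audio approach: map keywords found
# in a recognized utterance to audio clips. The keyword table and clip
# filenames below are illustrative assumptions.

CLIPS = {
    "hello": "clips/greeting.wav",
    "name": "clips/introduce.wav",
    "bye": "clips/goodbye.wav",
}

FALLBACK = "clips/fallback.wav"

def pick_clip(utterance: str) -> str:
    """Return the audio file to play for a recognized utterance."""
    words = utterance.lower().split()
    for keyword, clip in CLIPS.items():
        if keyword in words:
            return clip
    return FALLBACK
```

On a Raspberry Pi, the chosen clip could then be played with a command-line player, e.g. `subprocess.call(["aplay", pick_clip(text)])`.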

V. HARDWARE COMPONENTS

5.1 Raspberry Pi 4

The Raspberry Pi is a series of small single-board computers developed in the UK. Initially intended for teaching computer science, it gained popularity for uses like robotics due to its low cost, modularity, and open design. It is commonly used by computer and electronics hobbyists.

Description of Fig 5.1.1 RASPBERRY PI 4: A photograph of the Raspberry Pi 4 single-board computer, showing its compact size and various ports and connectors.

5.2 Arduino Uno

The Arduino Uno is an open-source microcontroller board based on the ATmega328P microcontroller. It features digital and analog input/output pins for interfacing with various expansion boards and circuits. It is programmable via the Arduino IDE using a USB cable.

Description of Fig 5.2.1 ARDUINO UNO: A photograph of the Arduino Uno microcontroller board, highlighting its digital and analog pins, USB port, and ATmega328P chip.

5.3 L298N Motor Driver

The L298N Motor Driver Module is a high-power module for driving DC and stepper motors. It contains an L298 dual H-bridge motor driver IC and a 78M05 5V regulator, and can control up to 4 DC motors, or 2 DC motors with direction and speed control.
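The H-bridge direction control described above reduces to a small truth table per motor channel. The sketch below captures the standard L298N input logic; the `motor_pins` helper and the command names are illustrative assumptions (the actual GPIO pin numbers depend on the wiring).

```python
# Sketch of the H-bridge control logic for one motor channel of the
# L298N. The (IN1, IN2) levels follow the standard L298N truth table;
# which physical pins they map to depends on the wiring (an assumption).

def motor_pins(direction: str):
    """Return (IN1, IN2) logic levels for the requested direction."""
    table = {
        "forward": (1, 0),
        "reverse": (0, 1),
        "brake":   (1, 1),  # both high: fast stop
        "coast":   (0, 0),  # both low: free-running stop
    }
    return table[direction]
```

Speed control is then added separately by pulsing the channel's enable pin with PWM.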

Description of Fig 5.3.1 L298N Motor Driver: A photograph of the L298N Motor Driver module, a circuit board designed to control DC motors.

5.4 Geared DC Motor

A geared DC motor has a gear assembly attached to increase torque and reduce speed. It features a 3 mm threaded drill hole in the shaft for easy connection to wheels or other mechanical assemblies.

Description of Fig 5.4.1 GEARED MOTOR: A photograph of a geared DC motor, showing the motor unit attached to a gearbox with a yellow wheel.

5.5 Servo Motor

A servo motor is a rotary actuator allowing precise control of angular position, velocity, and acceleration. It includes a motor coupled with a sensor for position feedback, offering good torque, holding power, and fast updates.
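The angular-position control described above is done with PWM: hobby servos like the MG995 are driven by a roughly 50 Hz signal whose pulse width sets the angle. The mapping below uses typical 1.0 ms to 2.0 ms endpoints, which vary per servo; the `angle_to_pulse_ms` helper is an illustrative assumption.

```python
# Hobby servos such as the MG995 are positioned with a ~50 Hz PWM signal
# whose pulse width maps roughly linearly from about 1.0 ms (0 degrees)
# to 2.0 ms (180 degrees). Exact endpoints vary per servo; these are
# typical assumed values.

def angle_to_pulse_ms(angle: float, min_ms: float = 1.0, max_ms: float = 2.0) -> float:
    """Convert a target angle (0-180 degrees) to a pulse width in ms."""
    angle = max(0.0, min(180.0, angle))  # clamp to the servo's range
    return min_ms + (max_ms - min_ms) * angle / 180.0
```

For example, a 90-degree target corresponds to a 1.5 ms pulse with these endpoints.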

Description of Fig 5.5.1 SERVO MOTOR MG995: A photograph of a servo motor, model MG995, commonly used for precise angular control in robotics.

5.6 Ultrasonic Sensor

An ultrasonic sensor is an electronic device that emits ultrasonic sound waves and converts the reflected sound into an electrical signal to determine distance. Ultrasonic waves have frequencies above the range of human hearing. The sensor consists of a transmitter and a receiver.

Description of Fig 5.6.1 Ultrasonic Sensor: A photograph of the HC-SR04 ultrasonic sensor module, showing its two transducers (transmitter and receiver) and circuit board.

5.7 Speaker

Speakers are output devices used to generate sound, connecting to computers or sound systems. They convert electrical audio signals into sound waves.

Description of Fig 5.7.1 SPEAKER: Zebronics Pluto speaker: A photograph of a pair of Zebronics Pluto speakers.

5.8 Mic

A microphone captures audio by converting sound waves into electrical signals. The first electronic microphones used a liquid transmitter mechanism coupled to a diaphragm.

Description of Fig 5.8.1 Mic: A photograph of a microphone with a stand, designed for audio input.

5.9 Battery

This battery pack uses ICR 18650 2500mAh 20C Lithium-Ion Batteries and a BMS circuit, offering small size and weight. It includes charge protection for direct charging via a DC adapter.
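A rough runtime estimate for a pack like this is its capacity divided by the average load current. The sketch below is an assumption-laden back-of-the-envelope calculation, not a measured figure: the 0.8 derating factor standing in for converter losses and the BMS low-voltage cutoff is illustrative.

```python
# Rough runtime estimate for the 18650 pack: capacity (mAh) divided by
# the average load current (mA). Real runtime is lower because of
# converter losses and the BMS low-voltage cutoff; the 0.8 derating
# factor is an assumption, not a measured value.

def runtime_hours(capacity_mah: float, load_ma: float, derating: float = 0.8) -> float:
    """Estimate usable runtime in hours for a given average load."""
    return capacity_mah * derating / load_ma
```

For a 2500 mAh cell under a 1 A average load, this estimates about two hours of operation.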

Description of Fig 5.9.1 BATTERY: A photograph of a Lithium-Ion battery pack, labeled with a warning, indicating its specifications.

VI. WORKING

6.1 Circuit Diagram

Description of FIG 6.1a ROBOT MOTION CONTROLLER: A circuit diagram illustrating the connections between the Arduino Uno, L298N Motor Driver, motors, ultrasonic sensor, and power source for robot motion control.

Description of FIG 6.1b VOICE ASSISTANT: A block diagram showing the components of the voice assistant system, including Raspberry Pi, speaker, microphone, cloud services (Google Assistant, Dialogflow), and TTS engine, connected to the robot's systems.

6.2 Talking System

The talking system is the robot's primary system, powered by a Raspberry Pi 4 on a 12 V DC supply. The Raspberry Pi 4, speaker, and microphone are integrated into the robot's body. Users can ask questions, which the system answers by searching the internet: the captured audio is converted to text, a response is retrieved, and the response is converted back to audio output.

6.2.1 Key Points

6.3 Motion Control

The robot moves autonomously, using an ultrasonic sensor to detect and avoid obstacles and make navigation decisions on its own. This makes it a smart, autonomous robot. The project also covers the use of ultrasonic sensors and the serial monitor.
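The obstacle-avoidance decision described above can be sketched as a simple threshold rule on the ultrasonic reading. The 20 cm threshold, the `next_move` helper, and the command names are illustrative assumptions, not the project's exact firmware logic.

```python
# Sketch of the obstacle-avoidance decision used by the motion
# controller: below a threshold distance the robot turns, otherwise it
# drives forward. The 20 cm threshold and command names are assumptions.

OBSTACLE_CM = 20.0

def next_move(distance_cm: float) -> str:
    """Choose the next motion command from the ultrasonic reading."""
    if distance_cm < OBSTACLE_CM:
        return "turn_right"  # obstacle ahead: steer away
    return "forward"         # path clear: keep driving
```

In the actual robot this decision runs in a loop on the Arduino, with the result driving the L298N motor channels.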

VII. SAMPLE CODE

The following Python code demonstrates a basic AI personal assistant named 'G-One'. It includes functions for wishing the user, taking voice commands, searching Wikipedia, opening web pages (YouTube, Google, Gmail), and retrieving weather information.

import speech_recognition as sr
import pyttsx3
import datetime
import wikipedia
import webbrowser
import os
import time
import subprocess
from ecapture import ecapture as ec
import wolframalpha
import json
import requests

print('Loading your AI personal assistant - G One')
engine = pyttsx3.init('sapi5')  # 'sapi5' is the Windows TTS driver; use 'espeak' on Raspberry Pi OS
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

def speak(text):
    engine.say(text)
    engine.runAndWait()

def wishMe():
    hour = datetime.datetime.now().hour
    if hour >= 0 and hour < 12:
        speak("Hello, Good Morning")
        print("Hello, Good Morning")
    elif hour >= 12 and hour < 18:
        speak("Hello, Good Afternoon")
        print("Hello, Good Afternoon")
    else:
        speak("Hello, Good Evening")
        print("Hello, Good Evening")

def takeCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.listen(source)

    try:
        statement = r.recognize_google(audio, language='en-in')
        print(f"user said: {statement}\n")

    except Exception as e:
        speak("Pardon me, please say that again")
        return "None"
    return statement

if __name__ == '__main__':
    speak("Loading your AI personal assistant G-One")
    wishMe()
    while True:
        speak("Tell me how can I help you now?")
        statement = takeCommand().lower()
        if statement == "None":
            continue

        if "good bye" in statement or "ok bye" in statement or "stop" in statement:
            speak('Your personal assistant G-One is shutting down, Good bye')
            print('Your personal assistant G-One is shutting down, Good bye')
            break

        if 'wikipedia' in statement:
            speak('Searching Wikipedia...')
            statement = statement.replace("wikipedia", "")
            results = wikipedia.summary(statement, sentences=3)
            speak("According to Wikipedia")
            print(results)
            speak(results)

        elif 'open youtube' in statement:
            webbrowser.open_new_tab("https://www.youtube.com")
            speak("youtube is open now")
            time.sleep(5)

        elif 'open google' in statement:
            webbrowser.open_new_tab("https://www.google.com")
            speak("Google chrome is open now")
            time.sleep(5)

        elif 'open gmail' in statement:
            webbrowser.open_new_tab("https://mail.google.com")
            speak("Google Mail open now")

        elif "weather" in statement:
            api_key = "8ef61edcf1c576d65d836254e11ea420"
            speak("What's the city name")
            city_name = takeCommand()
            base_url = "https://api.openweathermap.org/data/2.5/weather?"
            complete_url = base_url + "appid=" + api_key + "&q=" + city_name
            response = requests.get(complete_url)
            x = response.json()
            if x["cod"] != "404":
                y = x["main"]
                current_temperature = y["temp"]
                current_humidity = y["humidity"]
                z = x["weather"]
                weather_description = z[0]["description"]
                speak(f"Temperature in kelvin unit is {current_temperature} \n humidity in percentage is {current_humidity} \n description {weather_description}")
                print(f"Temperature in kelvin unit = {current_temperature} \n humidity (in percentage) = {current_humidity} \n description = {weather_description}")
            else:
                speak(" City Not Found ")

        elif 'time' in statement:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")
            speak(f"the time is {strTime}")

        elif 'who are you' in statement or 'what can you do' in statement:
            speak('I am G-One version 1 point O, your personal assistant. I am programmed to do minor tasks like opening YouTube, Google Chrome, Gmail and Stack Overflow, predict time, take a photo, search Wikipedia, predict weather in different cities, get top headline news from the Times of India, and you can ask me computational or geographical questions too!')

        elif "who made you" in statement or "who created you" in statement or "who discovered you" in statement:
            speak("I was built by AKSHAY PRAKASH")
            print("I was built by AKSHAY PRAKASH")

        elif "open stackoverflow" in statement:
            webbrowser.open_new_tab("https://stackoverflow.com/login")
            speak("Here is stackoverflow")

        elif 'news' in statement:
            webbrowser.open_new_tab("https://timesofindia.indiatimes.com/home/headlines")
            speak('Here are some headlines from the Times of India, Happy reading')
            time.sleep(6)

        elif "camera" in statement or "take a photo" in statement:
            ec.capture(0, "robo camera", "img.jpg")

        elif 'search' in statement:
            statement = statement.replace("search", "")
            webbrowser.open_new_tab(statement)
            time.sleep(5)

        elif 'ask' in statement:
            speak('I can answer to computational and geographical questions and what question do you want to ask now')
            question = takeCommand()
            app_id = "R2K75H-7ELALHR35X"
            client = wolframalpha.Client(app_id)
            res = client.query(question)
            answer = next(res.results).text
            speak(answer)
            print(answer)

        elif "log off" in statement or "sign out" in statement:
            speak("Ok, your PC will log off in 10 seconds; make sure you exit from all applications")
            subprocess.call(["shutdown", "/l"])
            time.sleep(3)

VIII. FUTURE IMPROVEMENTS

IX. CONCLUSION

Everybody loves talking robots; it is like having a little pal to interact with. If the robot is humanoid, it is more fun than ever: among all robots, a humanoid suddenly becomes a 'he/she' instead of an 'it'. Having a humanoid talking robot feels wonderful, but talking robots are seemingly complicated to make.

To make a robot talk, there are two methods: speech synthesis and pre-recorded audio. Speech synthesis does not perform intelligibly enough on an Arduino, so the pre-recorded audio method was chosen.

Finally, our wish has been fulfilled in the form of our working robot and the identity it gives our project. We would like to thank everyone who believed in us and supported our project.

X. APPENDIX

9.1 Project Modeling Using Solid Works

Description of Fig 9.1.1 Starting Stage: A 3D rendering of the initial stage of the robot's design, showing a basic chassis with wheels and mounting points.

Description of Fig 9.1.2 Front View: A 3D rendering of the robot's front view, detailing its structure and components.

Description of Fig 9.1.3 Back View: A 3D rendering of the robot's back view, showing its construction and internal layout.

9.2 Project Images

Description of Fig 9.2.1 Front view: A photograph of the constructed robot's front view, showing its physical form.

Description of Fig 9.2.2 Side View: A photograph of the constructed robot's side view.

Description of Fig 9.2.3 Top View: A photograph of the constructed robot's top view.


