Build your own Python virtual assistant

Feb 2021
By: Theo van der Sluijs

reading time: 8 min.

Category: Coding
Tags: Python
To build your own Python virtual assistant like Siri, Google Assistant, Alexa or even Bixby, you only need a few lines of code.

Note: this explanation is mainly aimed at users on a Mac or Linux system. If you have questions about Windows I will do my very best to help, but because I no longer have a Windows system at home (hooray!) it is a bit more difficult for me. I hope you understand!

Build your own Python Virtual Assistant within minutes!

I love virtual assistants. I used to use the Google Assistant and it was a great experience. I never really liked Siri because it’s too vendor-specific, and recently I’ve started to use Amazon’s virtual assistant “Alexa” more and more.

Unfortunately not all Echo devices are sold in the Netherlands, but I like it so much that I used an extra shipping service outside of Amazon to ship my Echo purchases from Germany to the Netherlands.

For this small tutorial it does not really matter which virtual assistant you like most, as you are going to build your own!

Python virtual assistant prerequisites

So, as mentioned above, this tutorial focuses on Linux and Mac, but you can still use it on a Windows machine.

For this to work you need PortAudio on your machine. PortAudio is an open-source computer library for audio playback and recording. It is a cross-platform library, so programs using it can run on many different computer operating systems, including Windows, Mac OS X and Linux.

We are going to use brew to install PortAudio. Homebrew is a free and open-source software package management system that simplifies the installation of software on Apple’s macOS operating system and Linux.

Open a terminal and do:

brew remove portaudio
brew install portaudio

Yes, I remove PortAudio first. Somehow, an older installed version of PortAudio does not always update as it should. You might not have any trouble when you just install it, but it does no harm to uninstall it first.

When I develop my Python scripts I always do this in virtual environments or venv. A venv (virtual environment) is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.

The benefit of this is that your main system stays clean and you can create a venv per project so you can “lift and shift” it to another system where you can easily create the same sandboxed environment.

Trust me, you want to use Python environments!

If you want to create a venv in VS Code, just type this in the VS Code terminal:

python -m venv .venv
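After creating the environment you still have to activate it (on Mac or Linux: source .venv/bin/activate). If you are not sure whether your script really runs inside the venv, a small check like the one below can help. This is only a sketch; the function name in_virtualenv is my own choice and not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Running inside a venv:", in_virtualenv())
```

If this prints False, the packages you install with pip will end up in your system Python instead of in the project environment.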

For this little script we are using three Python libraries:

SpeechRecognition is a library for performing speech recognition, with support for several engines and APIs, online and offline.

PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple Mac OS X / macOS.

Pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with Python 3.

With your environment active (or without one, if you don’t want to use Python environments), open a terminal and run:

pip install speechrecognition
pip install pyaudio
pip install pyttsx3
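To verify that the three installs worked before you continue, you can ask Python whether it can find the modules. The sketch below assumes the usual import names (speech_recognition, pyaudio and pyttsx3), which differ from the pip package names; the helper missing_modules is a name I made up for this example.

```python
from importlib.util import find_spec

# Import names of the three libraries (not the pip package names).
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All libraries are installed.")
```

If pyaudio shows up as missing, the PortAudio installation from the previous step is the first thing to check.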

Prep a Virtual Assistant voice

On Mac and Windows (I’m not sure about Linux) there are a variety of voices installed out of the box that you can choose from to speak to you as a VA (Virtual Assistant).

To listen to these voices and pick one, try this Python code.

# pip install pyttsx3

import pyttsx3

# known languages: 'en_US', 'it_IT', 'sv_SE', 'fr_CA', 'de_DE', 'en_US', 'he_IL', 'id_ID', 'en_GB', 'es_AR', 'nl_BE', 'en-scotland', 'en_US', 'ro_RO', 'pt_PT', 'es_ES', 'es_MX', 'th_TH', 'en_AU', 'ja_JP', 'sk_SK', 'hi_IN', 'it_IT', 'pt_BR', 'ar_SA', 'hu_HU', 'zh_TW', 'el_GR', 'ru_RU', 'en_IE', 'es_ES', 'nb_NO', 'es_MX', 'en_IN', 'en_US', 'da_DK', 'fi_FI', 'zh_HK', 'en_ZA', 'fr_FR', 'zh_CN', 'en_IN', 'en_US', 'nl_NL', 'tr_TR', 'ko_KR', 'ru_RU', 'pl_PL', 'cs_CZ'

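def voices():
    engine = pyttsx3.init()
    available_voices = engine.getProperty('voices')

    # only the languages I like to hear
    my_langs = ['en_US', 'en_GB', 'en-scotland', 'nl_BE', 'nl_NL']
    for voice in available_voices:
        # guard against an empty language list before looking at the first entry
        if voice.languages and voice.languages[0] in my_langs:
            # print the details; the ID is what you need later
            print(f"Name: {voice.name} | ID: {voice.id} | Language: {voice.languages[0]}")
            engine.setProperty('voice', voice.id)  # changes the voice
            engine.say('The quick brown fox jumped over the lazy dog.')
            engine.runAndWait()  # actually speak the queued sentence


if __name__ == "__main__":
    voices()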


This small script says “The quick brown fox jumped over the lazy dog.” once for each voice in the languages you selected.

The script also prints the details of each voice, including its ID.

The ID part is the important part. You will need it later! Copy it for later usage.


With all the preparations done, we can start writing code.

Coding a Python virtual assistant

We start by importing what this Python virtual assistant needs.

import sys

import speech_recognition as sr
import pyttsx3

You could leave the sys import out, but I wanted to be able to say “shutdown” and have the program exit. The other two imports listen and convert speech to text, and actually talk to you.

# Initialize the recognizer
r = sr.Recognizer()

Here is where we start up the speech recognizer.

I’ve created a class to put all the logic into; I think working with classes is easier than using only def’s. So my class starts with class SpeakToMe: and at the end you call the class with:

if __name__ == "__main__":
    # call the class
    stm = SpeakToMe()

    # start python to listen to you!
    stm.listen_to_me()

You first create an instance of the class, and then the listen_to_me() method starts the listening and assisting part.

In the first part of the class I just add some information to work with.

The voice ID that you picked with the code in the preparation part (I picked Ava):

self.voice_id = ''

The driver name, which on my Mac is nsss (pyttsx3 also has sapi5 for Windows and espeak for Linux):

self.driver_name = 'nsss'

The language you will talk in:

self.speech_lang = 'en_US'

The self. prefix stores the value on the instance, so it is available in every method of the class.

Then there are three methods.

There is SpeakText, the function that actually talks to you. The important argument of this function is my_command. If you call

self.SpeakText(my_command = "The quick brown fox")

It will actually say “The quick brown fox” to you.

The next important function is the listener, listen_to_me. It has a while(True): loop that keeps listening until you break the program or shut it down. When you speak, this function translates your spoken words to text, and that text is later used to decide which action to take. This Python script is not really smart, though. You could say “Please remind me tomorrow to get some cake for my birthday at 2 o’clock”, but trust me, that will not work.

This script is a version 0.1 assistant, not very bright. So use single-word phrases like “Weather”, “Hello” or “Shutdown”.

Sure, you can make it smarter and smarter, but here it’s just a very basic virtual assistant.
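One small step towards a smarter assistant is to look for a known keyword inside a longer sentence instead of requiring an exact match. The sketch below is not part of the assistant in this post; find_keyword is a hypothetical helper, and it assumes the recognized text arrives as plain words separated by spaces.

```python
def find_keyword(spoken_text, keywords):
    """Return the first keyword that occurs as whole word(s) in spoken_text, or None."""
    # Pad with spaces so that 'hello' does not match inside 'othello'.
    padded = f" {spoken_text.lower().strip()} "
    for keyword in keywords:
        if f" {keyword} " in padded:
            return keyword
    return None


if __name__ == "__main__":
    print(find_keyword("what is the weather today", ["hello", "weather"]))
```

With a helper like this, a sentence such as “please shut down now” could still trigger the shutdown action.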

Last but not least, the most important function: yes_minion. This is where the real action takes place! Here your word (as text) is converted into an action.

So if you want to let the program react to a word (or words) you can create small actions within this function to do something for you, like:

        system_call = ['hello', 'hallo']
        if my_command in system_call:
            sorry = "Yes Sir?"
            self.SpeakText(sorry, voice_id=self.voice_id,
                           driver_name=self.driver_name)

This small piece of code reacts to your “hello” or (in Dutch) “hallo” and says “Yes Sir?”. This seems very easy, but again, this is just a simple example of what it can do.
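When the list of words grows, a chain of if statements gets long quickly. One possible alternative, shown here only as a sketch, is a lookup table from trigger phrase to reply. The names build_replies, reply_for and FALLBACK are made up for this example and do not appear in the complete code further down.

```python
FALLBACK = "Sorry sir, I do not understand you."


def build_replies():
    """Map every trigger phrase to the sentence the assistant should speak."""
    grouped = {
        ("hello", "hallo"): "Yes Sir?",
        ("shut down", "shutdown"): "Shutting down in 3.....2.....1",
    }
    replies = {}
    for phrases, reply in grouped.items():
        for phrase in phrases:
            replies[phrase] = reply
    return replies


def reply_for(my_command, replies):
    """Return the reply for my_command, or the fallback when it is unknown."""
    return replies.get(my_command, FALLBACK)


if __name__ == "__main__":
    print(reply_for("hallo", build_replies()))
```

The result of reply_for can be passed straight to SpeakText. Actions that do more than talk, like exiting the program on shutdown, would still need their own handling.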

Complete Python virtual assistant code

And now, for the whole code!

import sys
import speech_recognition as sr
import pyttsx3

# Initialize the recognizer
r = sr.Recognizer()

class SpeakToMe:
    def __init__(self) -> None:
        self.voice_id = ''
        self.driver_name = 'nsss'
        self.speech_lang = 'en_US'

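    @staticmethod
    def SpeakText(my_command: str = None, driver_name: str = 'nsss',
                  voice_id: str = ''):
        # nothing to say, so do nothing
        if my_command is None:
            return
        try:
            # Initialize the engine
            engine = pyttsx3.init(driverName=driver_name)
            engine.setProperty('voice', voice_id)
            # queue the text and speak it
            engine.say(my_command)
            engine.runAndWait()
        except Exception as e:
            print(f"Error: '{e}' occurred!")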

    def listen_to_me(self):

        text = "Good day sir, let me initialize and start some shit up!"
        self.SpeakText(text, voice_id=self.voice_id,
                       driver_name=self.driver_name)

        # Loop infinitely for user to
        # speak
        while True:
            try:
                # use the microphone as source for input.
                with sr.Microphone() as source:
                    # give the recognizer a moment to
                    # adjust the energy threshold based on
                    # the surrounding noise level
                    r.adjust_for_ambient_noise(source, duration=0.2)

                    # listens for the user's input
                    audio = r.listen(source)

                    # Using google to recognize audio
                    my_text = r.recognize_google(
                        audio, language=self.speech_lang)
                    my_text = my_text.lower()

                    # turn the recognized text into an action
                    self.yes_minion(my_command=my_text)
                    my_text = ""

            except sr.RequestError as e:
                print(f"Could not request results; {e}")
            except sr.UnknownValueError:
                # the speech could not be understood; keep listening
                print("Could not understand the audio.")
            except Exception as e:
                print(f"Error: '{e}' occurred!")

    def yes_minion(self, my_command: str = None):
        if my_command is None:
            return

        system_call = ['hello', 'hallo']
        if my_command in system_call:
            sorry = "Yes Sir?"
            self.SpeakText(sorry, voice_id=self.voice_id,
                           driver_name=self.driver_name)
            return

        shutdown = ['shut down', 'shutdown']
        if my_command in shutdown:
            talk = "Shutting down in 3.....2.....1"
            self.SpeakText(talk, voice_id=self.voice_id,
                           driver_name=self.driver_name)
            # stop the program; this is why sys is imported
            sys.exit()

        # no known command matched
        sorry = "Sorry sir, I do not understand you. Could you rephrase that question?"
        self.SpeakText(sorry, voice_id=self.voice_id,
                       driver_name=self.driver_name)

if __name__ == "__main__":
    # call the class
    stm = SpeakToMe()

    # start python to listen to you!
    stm.listen_to_me()

Any questions? Leave them in the comments below. Like my code? Please buy me a coffee!

