Creating JARVIS - Python Voice Virtual Assistant (ChatGPT, ElevenLabs, Deepgram, Taipy)

Check out the GitHub repository here:

0:00 Talking to JARVIS
0:58 Intro
1:52 How JARVIS works
3:12 How to set up JARVIS
4:05 Getting API keys
5:05 Installing JARVIS
6:49 Running JARVIS
7:44 Talking to JARVIS
9:18 How to mod JARVIS for your use case
10:45 Recording audio using Pyaudio
12:25 Transcribing to text using Deepgram
12:45 Sending prompts to OpenAI GPT
13:14 Changing JARVIS' personality (context)
14:10 Generating voice using ElevenLabs
14:50 Playing audio using Pygame
15:15 Displaying the convo in a webpage with Taipy
16:40 Use cases and limitations
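The chapters above boil down to a record → transcribe (Deepgram) → respond (GPT) → synthesize (ElevenLabs) → play (Pygame) loop. A minimal sketch of that orchestration, with every service stubbed out (the real calls need API keys and the SDK code from the video's repository):

```python
# One conversational turn: each stage is passed in as a callable so the
# control flow is visible without any API keys or audio hardware.
def run_turn(record, transcribe, chat, speak, play) -> str:
    audio = record()            # e.g. PyAudio capturing microphone input
    prompt = transcribe(audio)  # e.g. Deepgram speech-to-text
    reply = chat(prompt)        # e.g. OpenAI GPT chat completion
    play(speak(reply))          # e.g. ElevenLabs TTS, played via Pygame
    return reply

# Stubbed demo of the control flow only:
reply = run_turn(
    record=lambda: b"raw-pcm",
    transcribe=lambda audio: "hello jarvis",
    chat=lambda prompt: f"You said: {prompt}",
    speak=lambda text: b"mp3-bytes",
    play=lambda audio: None,
)
print(reply)  # → You said: hello jarvis
```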
Comments

Fantastic project. Love how you connected these services and packages together. Thanks for going over the project, posting this video, etc. I learned quite a bit.

joeternasky

Impressive! One key bit of the UX of the ChatGPT mobile app is the "clicks" that indicate when the model has (1) stopped listening and (2) stopped talking. A very small touch that makes a world of difference.

iandanforth

Many thanks for this super helpful tutorial! My next step is voice ID, so the AI knows it's me!

grtbigtreehugger

I did the same a few months ago, but I made it all work through a real phone number, so you can actually call a number and an assistant will pick up the call and talk to you about the shop's services or clinic procedures, etc. Pretty nice lab.

rodrigodifederico

This is actually really incredible, thanks for the video.

isagiyoichi

Bro this is sick as hell! Thanks for posting a video about it.

dwilson

Hey Alex, I'm using a Linux device running Python 3.11 in a venv. When I try to run main.py I get the error "No module named pyaudio". I tried the simple command `pip install pyaudio`, but that fails with "could not build wheels for PyAudio, which is required to install pyproject.toml-based projects". I was hoping you could share some insight into why this may be happening. Great video btw, I await your speedy response :)
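That wheel-build failure usually means the PortAudio C headers are missing, so pip cannot compile PyAudio's extension module. A small sketch of the fix (the helper is hypothetical; the package names are the conventional ones per platform, not from the video):

```python
# PyAudio is a thin wrapper over the PortAudio C library; pip needs the
# development headers to build the wheel. Map a platform name to the
# usual package to install first (hypothetical helper).
PORTAUDIO_HINTS = {
    "debian": "sudo apt install portaudio19-dev python3-dev",
    "fedora": "sudo dnf install portaudio-devel",
    "macos": "brew install portaudio",
}

def pyaudio_build_hint(platform: str) -> str:
    setup = PORTAUDIO_HINTS.get(platform, "install the PortAudio development headers")
    return setup + " && pip install pyaudio"

print(pyaudio_build_hint("debian"))
# → sudo apt install portaudio19-dev python3-dev && pip install pyaudio
```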

Threecommaaclub

Some operating systems ship free text-to-speech APIs that respond instantly, without routing data across the internet to a central service that might get bogged down under heavy usage. I have noticed that if you become dependent on a single provider, a monopoly situation can result, and you end up paying again and again for things your local PC could have done for free with no network traffic. Often the remote service has a better-sounding voice and mispronounces fewer words, but soon you are outsourcing so much that you become too dependent on outside entities.

If a set of 10 or so words is known to be mispronounced by the local speech API on your PC, is there a way to handle those exception words with specialized processing, a syllable at a time per word, to save you from relying on an API key that the third-party provider can revoke at the flick of a switch?
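One common local workaround (a sketch, not from the video) is to respell the known trouble words phonetically before handing the text to the local TTS engine; the respellings below are purely illustrative:

```python
import re

# Illustrative exception table: words the local engine mispronounces,
# mapped to phonetic respellings it reads correctly.
EXCEPTIONS = {
    "cache": "cash",
    "epitome": "eh-pit-oh-mee",
    "SQL": "sequel",
}

def respell(text: str) -> str:
    """Replace each exception word (whole words only) with its respelling."""
    for word, phonetic in EXCEPTIONS.items():
        text = re.sub(rf"\b{re.escape(word)}\b", phonetic, text)
    return text

print(respell("Clear the cache before running SQL."))
# → Clear the cash before running sequel.
```

A local engine such as pyttsx3 would then speak `respell(text)` instead of the raw text, with no network or API key involved.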

oldspammer

This really interested me. I modified it a bit to add a listen button to the UI so it only listens when you press it, which is easier than a "wake word".

Then I thought: integration. I use macOS.

I built a folder called modules, added a second step that parses the text through GPT again to match it against a dictionary, and then GPT decides which function in the dictionary matches and runs it.

It worked great for checking calendar events etc., and if no match was found it defaulted to a GPT chat response, but the extra layer added more latency and just isn't scalable.

edbayliss

🎯 Key Takeaways for quick navigation:

01:02 *🚀 Overview of Voice Virtual Assistant Development*
- Explanation of building a voice virtual assistant similar to Jarvis from Iron Man.
- Overview of the backend workflow involving voice input, transcription, response generation, and audio output.
- Introduction to third-party services like Deepgram, OpenAI, ElevenLabs, and Taipy used in the development process.
03:21 *🔧 Installation Instructions for the Voice Virtual Assistant*
- Cloning the GitHub repository and installing necessary requirements.
- Setting up API keys for Deepgram, OpenAI, and ElevenLabs.
- Creating an environment file to store API keys securely.
- Executing installation commands and waiting for requirements to install.
08:33 *🛠️ Running the Voice Virtual Assistant*
- Instructions for running the display interface (`display.py`) and the main script (`main.py`).
- Description of how the assistant listens, transcribes, generates responses, and displays conversations.
- Example interaction demonstrating the assistant's response to user input.
09:28 *💡 Customization and Modification of the Voice Virtual Assistant*
- Guidance on modifying the assistant for specific use cases.
- Suggestions for changing context, models, and voices for customization.
- Discussion of potential improvements, such as integrating news, adding memory, and overcoming latency limitations.
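The "environment file" step mentioned above is just a small `.env` file next to the scripts; the variable names here are illustrative and may differ from the ones the repo expects:

```
# .env — keep this file out of version control
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
```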

Made with HARPA AI

taylorsmith

Fantastic work and video, thank you!!

chrsl

LETS GOOO NEW ALEXANDRE SAJUS VIDEO I CLICK LIKE I SUBSCRIBEEE

painperdu

I need help: when I do `pip install -requirements.txt` it says there is no such file or directory, even though I can see the file.

nightmare

Sir, I want to do this too. Is there any free API available? If OpenAI isn't free, please suggest some other AI APIs for these tasks!

FantasyDark-ubxh

Incredible material! Thanks bro, your tutorials are super helpful for those learning to code. I'm trying to follow along.

Not sure if you've taken any subscriber requests, but I've really wanted to find a tutorial on building a machine learning model in Python that can figure out its own strategy for successfully trading forex and integrating it with MQL4 or 5.

It's definitely possible, but I've noticed there are next to no tutorials on this anywhere.

DalazG

Awesome video! Thanks for sharing, but I've got a question. How can I implement a pre-trained OpenAI assistant into Taipy?

crprp

Aye, this is so cool, but there's no wake-up key or end key. Still, this is the greatest, and I know you know it.

olakunleogunseye

This is awesome! I'm wondering, though, how much is this project costing you in API calls (if you were to use it daily and pretty often)? I'm planning to build a home assistant that can control all of my home gadgets and perform actions on my computer, but I'm trying to decide whether I should use all local models (Whisper, Coqui, and Mistral) instead of the paid online services. The quality and speed are a bit lower locally, but it's free, so I'm weighing the tradeoff. Please let me know what you think, thanks!
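The cost question above is straightforward to estimate once you plug in current prices. A back-of-the-envelope sketch; the prices used in the example call are placeholders, not real provider rates (check each provider's pricing page):

```python
def monthly_cost(turns_per_day: int,
                 stt_min_per_turn: float, stt_price_per_min: float,
                 gpt_tokens_per_turn: int, gpt_price_per_1k: float,
                 tts_chars_per_turn: int, tts_price_per_1k_chars: float) -> float:
    """Estimate monthly spend: speech-to-text + chat tokens + text-to-speech."""
    per_turn = (stt_min_per_turn * stt_price_per_min
                + gpt_tokens_per_turn / 1000 * gpt_price_per_1k
                + tts_chars_per_turn / 1000 * tts_price_per_1k_chars)
    return round(per_turn * turns_per_day * 30, 2)

# 50 turns/day with made-up prices, purely to show the shape of the math:
print(monthly_cost(50, 0.2, 0.004, 800, 0.002, 300, 0.18))
# ≈ 84.6 with these illustrative numbers — TTS characters dominate here
```

With realistic rates plugged in, a model like this makes the local-vs-cloud tradeoff concrete: the local stack's cost is fixed (hardware and electricity), while the cloud stack scales linearly with turns.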

PenguinjitsuX

What alternative can be used for ElevenLabs?

omjondhalefyco-

I love it, thank you for sharing. Please keep sharing your magic with us!

marouane