Speak Any Language With AI - Realtime Speech-to-Speech Translation & Voice Synthesis (w/Code)

In this video we dive into real-time speech-to-speech translation: speaking in one language and having your own voice speak in a different language!

Resources -

Chapters:
00:00 - Intro & Demonstration
00:46 - High Level Overview
01:06 - AssemblyAI For Speech to Text Streaming
02:30 - How to Use STT Streaming Output
03:48 - Using OpenAI as a Translation Service
04:51 - STT Streaming With Translation
05:51 - ElevenLabs Voice Cloning
07:01 - ElevenLabs Python Voice Synthesis
08:38 - Putting it All Together
09:00 - Outro
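
The chapters above describe a three-stage pipeline: streaming speech-to-text, translation, then voice synthesis. The following is only a rough sketch of how such stages could be chained, not the code from the video. `Transcript`, `translate_stream`, and the `translate` / `synthesize` callables are hypothetical names introduced here for illustration; in a real setup they would wrap the AssemblyAI, OpenAI, and ElevenLabs clients. It also assumes the STT stage emits partial results before each final one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Transcript:
    """Hypothetical container for one streaming STT result."""
    text: str
    is_final: bool  # assumption: partial results arrive before a final one


def translate_stream(
    transcripts: Iterable[Transcript],
    translate: Callable[[str], str],      # hypothetical wrapper around a translation call
    synthesize: Callable[[str], bytes],   # hypothetical wrapper around a TTS call
) -> Iterator[bytes]:
    """Yield synthesized audio for each final, non-empty transcript."""
    for transcript in transcripts:
        # Skip partial results so each sentence is translated only once.
        if not transcript.is_final or not transcript.text.strip():
            continue
        yield synthesize(translate(transcript.text))


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any API keys.
    events = [
        Transcript("hel", False),
        Transcript("hello", True),
        Transcript("bye", True),
    ]
    for audio in translate_stream(events, str.upper, lambda s: s.encode("utf-8")):
        print(audio)
```

Passing the stages in as plain callables keeps the loop independent of any one vendor, so a different translation or synthesis service could be swapped in without touching it.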
Comments

As a personal study project, this is great to share, but AI-enabled phones (iOS or Android) will soon integrate similar functions for real-time calls (phone calls or online meetings). Of course, privacy protection will be a constraint.

xiaodongdong-lx

The problem is that there is no support for the East African (Ethiopian) Amharic language.

simont

Thank you for the video. I am currently living in Osaka, Japan, and I am very interested in instant translation with AI models. However, what I understand by "instant translation" is not: "I say a sentence, the model translates it after a few seconds and I can hear it, I say another sentence, the model translates it after a few seconds and I can hear it..." What I understand by instant translation is: "You are talking in Japanese and, while you are talking (with a delay of a few seconds), I listen to your speech in Spanish, no matter how long the speech is. Maybe the Japanese speech is 10 minutes long; I can begin listening to it in Spanish after 5 seconds, and it will end 5 seconds after the Japanese finishes." Basically, it is like having an interpreter by your side who doesn't have to wait until the end of the speech to begin translating. That way, the conversation is more fluid.
I know this is not an easy task, as there are SOV and SVO languages. However, I think the SeamlessM4T model is able to take this into account as well.
Do you think it is possible to implement such a thing with this model?

MisionJapon

Hi @Adam! I just messaged you on LinkedIn! Would love to chat.

CarasGFTK

Awesome project! Is it possible to use a translation service other than ChatGPT, one that doesn't require a subscription?

alejandroGTES

Hi, I am very interested in your script, but I can't seem to get it running. I don't understand where to input the API keys for each program, as there is no such section in your script. I am encountering a lot of errors. I really need your help.

deintez

God, the day we have this in real time with low latency for livestreams will be amazing. I understand English perfectly well but I don't feel confident streaming in another language lol.

JohnMaverick-wc

Hi Adam, thank you so much for sharing this video! This is exactly what I've been searching for. I'm actually looking for an AI developer to help me create an MVP app for my startup business in Japan in the beauty industry. Would you be open to discussing potential work opportunities, or is this more of a hobby for you?

ploylovespeach

It has huge latency… who said it's real time?

smilebig

Hey Adam, is there a way to book a 1-on-1 to see if you can help me with this? I just need to get GPT + AssemblyAI working for the project I want.

dcleinad

Really impressive, the combination of these 3. But to have a perfect loop, how do you deal with incoming audio (voice) in real time before you start speaking to respond? And another question: could the final generated audio be routed so that it emulates your microphone?

nilamara

And can it be used to communicate on Discord?

vvnter__