Talk to AI with enhanced speech recognition | Gemini

Talk to AI Voice to Voice with Enhanced Speech Recognition on Gemini — Google’s newest and most capable AI model. Watch Google DeepMind Research Scientist Adrià Recasens Continente demonstrate Gemini’s abilities to understand audio in different languages, from multiple speakers and to combine vision, audio and text to offer a helping hand while cooking in the kitchen.

Comments
Author

These Gemini videos are incredible. What a time to be alive

chrisg
Author

Not losing nuances by not converting to text is a big improvement indeed. I guess theoretically one could still encode the tone in the text, e.g. "Hi there [sad]", but direct seems more nuanced.

tristanwegner
Author

Well, I am not sure how to use automatic voice recognition. For example, when I say something, it would automatically stop and then speak. On the website I can press listen and then it speaks, but here I hear automatic speaking. First off, there is a sound when pressing the microphone, which I didn't hear on the website. A really bad thing is that here I hear text-to-speech with a bit of emotion, but on the website I hear the normal Google Text-to-Speech, so the sound is different. I'm not sure if that's a problem of Gemini Ultra and Advanced, but I really want to check that out, just to see if it's one audio send and receive.

psyraproductions
Author

Is this currently available to developers via the Gemini Pro API?

If it is, how good is it at handling non-native English accents, particularly from South and East Asia?

sbh
Author

We want all the features of Gemini's vision to be added to Bard 👁️

ilyass-alami
Author

Where is the documentation for using the speak-with-audio feature?

areebmianoor
Author

This is amazing! Can we please get an update for Pixel phones, as voice-to-text is very bad?

anf
Author

They’ll completely works documents before sending 3:45

YusriCassim
Author

they should add an audio decoder to get the audio generating capabilities we need.

danylaley
Author

Great, now give it hands to do the actual cooking too.

ShpanMan
Author

Are these capabilities native to Nano?

jaswan
Author

No live demonstration? Means it's not reliable yet

y.
Author

In the future we may see a plugin or built-in feature to listen to any podcast in every language.

jaswanthna
Author

Hello Google, can you make a video on how to add Gemini to no-code pages like Bubble?

JoseBonilla-ukgf
Author

How can Gemini assist in refining and enhancing a monolingual Haitian dictionary that already contains 160,000 words? How can I get access to it and training on it?

tvtimoun
Author

Is there any way to try this? Because I can't find any API or even Playground for this...

WhoamI-tldi
Author

When will multimodal input to speech be added to the Android Gemini API?

George-nxzu
Author

Is Gemini capable of understanding "any" raw sound (like the sound of the Rain for example) or just speech?

LucaCrisciOfficial
Author

Real-time translation is a $1 trillion company. Think about it, guys.

kaio
Author

Adrià, do you think Gemini will understand Catalan? Google Assistant doesn't. 😉

NikitaRemez