Talk to AI with enhanced speech recognition | Gemini

Talk to AI voice-to-voice with enhanced speech recognition on Gemini, Google’s newest and most capable AI model. Watch Google DeepMind Research Scientist Adrià Recasens Continente demonstrate Gemini’s ability to understand audio in different languages and from multiple speakers, and to combine vision, audio and text to offer a helping hand while cooking in the kitchen.

Comments

These Gemini videos are incredible. What a time to be alive

chrisg

Not losing nuances by not converting to text is a big improvement indeed. I guess theoretically one could still encode the tone in the text, e.g. "Hi there [sad]", but direct seems more nuanced.

tristanwegner

Well, I am not sure how to use automatic voice recognition. For example, when I say something, it would automatically stop and then speak. On the website I can press listen and then it can speak, but here I hear automatic speaking. First off, there is a sound when pressing the microphone which I didn't hear on the website. A really bad thing is that here I hear text-to-speech with a bit of emotion, but on the website I hear the normal Google Text-to-Speech, so the sound is different. I'm not sure if that's a problem of Gemini Ultra and Advanced, but I really want to check that out, just to see if it's one audio send and receive.

psyraproductions

Is this currently available to developers via the Gemini Pro API?

If it is, how good is it at handling non-native English accents, particularly from South and East Asia?

sbh

We want all the features of Gemini's vision to be added to Bard 👁️

ilyass-alami

Where is the documentation on how to use the speak-with-audio feature?

areebmianoor

This is amazing! Can we please get an update for Pixel phones, as voice-to-text is very bad?

anf

They’ll completely works documents before sending 3:45

YusriCassim

they should add an audio decoder to get the audio generating capabilities we need.

danylaley

Great, now give it hands to do the actual cooking too.

ShpanMan

Are these capabilities native to Nano?

jaswan

No live demonstration? Means it's not reliable yet

y.

In the future we may see a plugin or built-in feature to listen to any podcast in every language

jaswanthna

Hello Google, can you make a video on how to add Gemini to no-code pages like Bubble?

JoseBonilla-ukgf

How can Gemini assist in refining and enhancing a monolingual Haitian dictionary that already contains 160,000 words? How can I get access to it and training on it?

tvtimoun