Local Real Time AI Speech to Image | Stable Diffusion, Faster-whisper, Python, ComfyUI ++

👊 Become a member and get access to GitHub:

Get a FREE 45+ ChatGPT Prompts PDF here:
📧 Join the newsletter:

🌐 My website:

Faster-Whisper:

ComfyUI:

ComfyUI-to-python:

I created a real-time, local speech-to-image system that generates images from voice input as you speak and displays them in a Flask web app. You will be able to try it soon by becoming a member of the channel!
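The flow described above (microphone audio in, transcribed text out, an image generated from that text, then displayed) can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the code from the video: it assumes the `faster-whisper` package's `WhisperModel.transcribe`, audio arriving as 16 kHz mono float32 chunks, and hypothetical `generate_image` / `publish` callables standing in for the ComfyUI-generated script and the Flask display. The names `build_prompt`, `run_pipeline`, `MAX_PROMPT_WORDS` and `STYLE_SUFFIX` are illustrative only.

```python
from typing import Callable, Iterable, Optional

MAX_PROMPT_WORDS = 40  # hypothetical cap: very long transcripts dilute the prompt
STYLE_SUFFIX = "highly detailed, cinematic lighting"  # hypothetical style tags


def load_whisper(size: str = "small"):
    """Load a faster-whisper model (imported lazily so the rest runs without it)."""
    from faster_whisper import WhisperModel
    return WhisperModel(size, device="cpu", compute_type="int8")


def make_transcriber(model) -> Callable[[object], str]:
    """Wrap a WhisperModel so it maps one audio chunk to plain text.

    faster-whisper accepts a file path or a 16 kHz mono float32 waveform.
    """
    def transcribe(chunk) -> str:
        segments, _info = model.transcribe(chunk, language="en", vad_filter=True)
        # segments is a generator; iterating it is what runs the decoding
        return " ".join(seg.text.strip() for seg in segments).strip()
    return transcribe


def build_prompt(text: str, max_words: int = MAX_PROMPT_WORDS) -> Optional[str]:
    """Keep only the most recent words of the transcript and append style tags."""
    words = text.split()
    if not words:
        return None  # silence or noise: nothing worth rendering
    return " ".join(words[-max_words:]) + ", " + STYLE_SUFFIX


def run_pipeline(chunks: Iterable[object],
                 transcribe: Callable[[object], str],
                 generate_image: Callable[[str], bytes],
                 publish: Callable[[bytes], None]) -> int:
    """Speech -> text -> prompt -> image -> display, one audio chunk at a time.

    Returns the number of images generated.
    """
    count = 0
    for chunk in chunks:
        prompt = build_prompt(transcribe(chunk))
        if prompt is None:
            continue  # skip chunks that produced no text
        publish(generate_image(prompt))
        count += 1
    return count
```

In a live setup, `chunks` would be fixed-length slices of microphone audio (captured with a library such as `sounddevice`, which is an assumption here), `transcribe` would be `make_transcriber(load_whisper())`, and `generate_image` would call the script exported from ComfyUI. Running the stages one after another keeps the sketch simple; a lower-latency version would probably move image generation onto a worker thread so transcription never waits on the GPU.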

00:00 Speech to Image Intro
00:28 Speech to Image Flowchart
01:16 Speech to Image Setup / Python Code
07:21 Joe Rogan Podcast Test
08:50 Anime Bedtime Story Test
10:33 Taylor Swift Music Video Test
11:42 MrBeast Video Test

Related videos

Comments

Well done! You're one of the few channels actually moving this forward with real examples and use cases.

adventurelens

Totally love it; I've been hacking together a real-time STT -> LLM + RAG system, and it's pretty amazing that we can do so much with off-the-shelf stuff. The image generation is an interesting curiosity, but I think we could get some real value if all the text were saved with timestamps to a database; then, when certain phrases are detected, we could trigger an LLM to answer a question or even perform a task with something like CrewAI. So cool!! Please keep making these!

JonathanYankovich
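The idea in the comment above (save every transcribed line with a timestamp, then react when certain phrases show up) could be prototyped with Python's standard-library `sqlite3` module. A minimal sketch, not part of the video's code: the table layout, the trigger phrases, and the `on_trigger` callbacks (where an LLM or CrewAI call would go) are all hypothetical, and matching is a simple case-insensitive substring check.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table of transcript segments with timestamps."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript ("
        " id INTEGER PRIMARY KEY, ts REAL NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, text: str,
           triggers: Dict[str, Callable[[str], None]],
           ts: Optional[float] = None) -> List[str]:
    """Save one transcribed segment, then fire callbacks for matching phrases.

    Returns the trigger phrases that matched.
    """
    ts = time.time() if ts is None else ts
    with conn:  # the context manager commits the insert
        conn.execute("INSERT INTO transcript (ts, text) VALUES (?, ?)", (ts, text))
    fired = []
    lowered = text.lower()
    for phrase, on_trigger in triggers.items():
        if phrase.lower() in lowered:
            on_trigger(text)  # hypothetical hook: ask an LLM, start a task, ...
            fired.append(phrase)
    return fired


def since(conn: sqlite3.Connection, start_ts: float) -> List[str]:
    """Return transcript text recorded at or after start_ts, oldest first."""
    rows = conn.execute(
        "SELECT text FROM transcript WHERE ts >= ? ORDER BY ts", (start_ts,)
    )
    return [row[0] for row in rows]
```

Storing the raw text first and matching afterwards means a trigger handler can pull recent context with `since(...)` before it calls a model.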

Omg. This is great. Could easily take this and add some logic where a person could create blog articles simply by talking.

brando

Amazing as always man! Wonder what ideas will come to reality next...

kawsarahmad

That's awesome! So much you could do with this!!

music_anarchy

You are at the tip of the spear, thank you for sharing this.

ryanjames

I love you exploring with this kind of stuff.

RyanSmith-rbch

🎯 Key Takeaways for quick navigation:

00:00 🎙️ *Introduction to Speech to Image App*
- Demonstration of the speech to image app.
- Initial test with voice commands to generate images.
- Introduction to combining speech with YouTube audio.
02:15 🔄 *Components of Low Latency Speech to Image*
- Overview of the components involved in low-latency speech to image.
- Flowchart showing the microphone, Faster Whisper, the ComfyUI Python extension, and the Stable Diffusion model.
- Mention of the need for a separate tutorial for detailed setup.
03:41 🖱️ *ComfyUI and Python Extension*
- Introduction to ComfyUI for building Stable Diffusion model workflows.
- The role of the ComfyUI Python extension in converting the workflow into Python code.
- The simplicity of setting up ComfyUI for desired workflows.
05:49 🎛️ *Setting Up Faster Whisper for Audio*
- Explanation of setting up Faster Whisper for audio transcription.
- Reference to a previous tutorial on configuring Faster Whisper.
- Availability of Faster Whisper on the community GitHub.
07:12 🐍 *Python Code Overview for Speech to Image App*
- Walkthrough of the Python code implementing the speech to image app.
- Explanation of functions and nodes in the code.
- Customization options for parameters like prompt length and image size.
09:22 🌐 *Selecting Stable Diffusion Model and Flask App*
- Choosing the Stable Diffusion model from Civitai.
- Creating a Flask app to display the generated images in real-time.
- Brief overview of the back-end and front-end functionalities.
11:54 🎬 *Testing Different Use Cases*
- Testing the app with a YouTube video from the Joe Rogan podcast.
- Additional tests with a bedtime story, Taylor Swift music video, and a MrBeast video.
- Impressions and reactions to the results of each test.
13:05 🚀 *Conclusion and Future Development*
- Expressing enjoyment in building and testing the app.
- Plans for future development and improvements.
- Encouragement to become a member for access to the GitHub and further content.

Made with HARPA AI

-Evil-Genius-
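The takeaways above mention a Flask app that displays each generated image in real time. One simple way to do that, sketched here as an assumption rather than the video's actual front end, is a thread-safe holder for the newest PNG plus a page that polls a small version endpoint and reloads the image only when it changes. The `LatestImage` class, the `PAGE` markup, and the `/`, `/version` and `/latest.png` routes are hypothetical; Flask is imported inside `create_app` so the store works on its own.

```python
import threading
from typing import Optional, Tuple

PAGE = """<!doctype html>
<html><body style="margin:0;background:#000">
<img id="img" style="width:100vw;height:100vh;object-fit:contain">
<script>
  let last = -1;
  // Poll twice a second; only reload the image when the version changes.
  setInterval(async () => {
    const r = await fetch('/version');
    const d = await r.json();
    if (d.version > 0 && d.version !== last) {
      last = d.version;
      document.getElementById('img').src = '/latest.png?v=' + d.version;
    }
  }, 500);
</script>
</body></html>"""


class LatestImage:
    """Holds only the newest image; older frames are simply overwritten."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._png: Optional[bytes] = None

    def update(self, png: bytes) -> int:
        """Store a new PNG and return its version number."""
        with self._lock:
            self._png = png
            self._version += 1
            return self._version

    def get(self) -> Tuple[int, Optional[bytes]]:
        """Return (version, png); png is None until the first update."""
        with self._lock:
            return self._version, self._png


def create_app(store: LatestImage):
    from flask import Flask, Response, abort, jsonify

    app = Flask(__name__)

    @app.route("/")
    def index():
        return PAGE

    @app.route("/version")
    def version():
        return jsonify(version=store.get()[0])

    @app.route("/latest.png")
    def latest():
        _version, png = store.get()
        if png is None:
            abort(404)  # nothing generated yet
        return Response(png, mimetype="image/png")

    return app
```

The generation loop would call `store.update(png_bytes)` after each image while `create_app(store).run()` serves the page. Polling is the simplest option to wire up; server-sent events or WebSockets would be alternatives if display latency mattered more.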

Really great stuff. Hats off, mister...

gregas

These are golden guides. I appreciate your content and am considering becoming a member if I can afford it once my paycheck has covered the basics.

Keep 'em coming!

around

Application: this could replace sign language. It could be refined and used to communicate with deaf people.

samuelsamuel

Subscribed!
All your topics are amazing!
Unfortunately I'm not a member, for obvious reasons.
Please share some material for non-members; you are the best user of AI I have seen on the net,
in the spirit I love: offline and open-source tools.

My English is not so good; I have to watch again and again to catch the spirit of your videos.
Some of your experiments with transcription point toward breaking down the language barrier,
and more generally, toward universal communication.

Thank you very much for your fascinating demonstrations!

FBHearty

People have been so terrified of AI taking over the world. For me, this is the most exciting and fun time in development history, since the dawn of the internet! AI has made everything so much more streamlined, time efficient and productive. What a great time!

Regular_Folk

Very cool! I would like to see a full tutorial and review the code too. How large were the model and sensor downloads?

patrickctaylor

This is amazing. If this goes really well, I would love to try it and would even be willing to pay for it.

nrixxking

In the cheap seats here, i.e., not a member, but I would love to see the full version of this. I think it would go crazy viral and do your channel a great service by getting you tons of views... But that's just my thought, in case you do release the full version. :)

sirrobinofloxley

Please make a full tutorial and instructions for GitHub members.

rapidreplay

This rocks! Yes, tutorial please. What membership level do I need for access? What hardware specs does it take to run this? A Linux server? Windows? Thanks.

musumo

This would be great for converting audiobooks into comics or movies.
Persistent characters would also be good.
This is amazing, please develop it further!

FSK

I have access to the GitHub, but I don't see this repo.

tonywhite

SUPER Fast AI Real Time Speech to Text Transcription - Faster Whisper / Python

I Was FLOORED. Realtime AI Translation & Voice Cloning!

I Built a Personal Speech Recognition System for my AI Assistant

100% Local AI Speech to Speech with RAG - Low Latency | Mistral 7B, Faster Whisper ++

Low latency AI voice talk in 60 lines of code using faster_whisper and elevenlabs input streaming.

Transcribe and Translate in Real Time NO INTERNET REQUIRED!

World’s Fastest Talking AI: Deepgram + Groq

Romance Scams: Falling in Love With a Fraud

Best FREE Speech to Text AI - Whisper AI

AI Enhanced Audio is INSANE

TEXT TO SPEECH | Piper TTS on Windows 🚀 AI voice 10x faster Realtime!

RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE!

Real-Time Speech Recognition With Your Microphone [Beginner Tutorial With Full Code]

Realtime AI Voice Changer Using RVC (Retrieval-based Voice Conversion w./ w-okada)

CLONE ANY AI Voices for FREE LOCALLY in 1 CLICK! JUST INSANE!

Updated AI Voice Cloning with RVC Inference - Tortoise with RVC Local Installation

ElevenLabs Alternative - Text To Speech AI free (XTTS2 Local Voice Cloning)

The Top 10 Best AI Voice Generators 2024

How to Install & Use Whisper AI Voice to Text

Bark: FREE Opensource Text-To-Speech Ai Tool - Realistic Humanlike Voices

Build A Talking AI with LLAMA 3 (Python tutorial)

FREE AI Voice Tool: Text-to-Speech (TTS) & Voice Cloning - MetaVoice

Build an AI Voice Assistant App using Multimodal LLM 'Llava' and Whisper