Local Real Time AI Speech to Image | Stable Diffusion, Faster-whisper, Python, ComfyUI ++


👊 Become a member and get access to GitHub:

Get a FREE 45+ ChatGPT Prompts PDF here:
📧 Join the newsletter:

🌐 My website:

Faster-Whisper:

ComfyUI:

ComfyUI-to-python:

I created a local, real-time speech-to-image system that generates images from voice input and displays them in a Flask web app. You can try it soon by becoming a member of the channel!
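The flow described above (audio in, transcript out, prompt in, image out) can be sketched roughly as follows. This is a minimal illustration and not the code from the video: `build_prompt`, `speech_to_image_loop`, and `make_faster_whisper_transcriber` are hypothetical names, the image generator is passed in as a plain callable standing in for the workflow exported from ComfyUI, and the `base` model size with CPU/int8 settings is an assumption.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def build_prompt(transcript: str, max_words: int = 20) -> str:
    """Keep only the first `max_words` words so the prompt stays short."""
    return " ".join(transcript.split()[:max_words])


def speech_to_image_loop(
    audio_chunks: Iterable[Any],
    transcribe: Callable[[Any], str],
    generate_image: Callable[[str], Any],
    max_words: int = 20,
) -> Iterator[Tuple[str, Any]]:
    """Turn each audio chunk into a (prompt, image) pair, skipping silence."""
    for chunk in audio_chunks:
        text = transcribe(chunk).strip()
        if not text:
            continue  # nothing was said in this chunk
        prompt = build_prompt(text, max_words)
        yield prompt, generate_image(prompt)


def make_faster_whisper_transcriber(model_size: str = "base") -> Callable[[str], str]:
    """Wrap faster-whisper as a transcribe(audio_path) callable.

    Imported lazily so the rest of the sketch runs without the package.
    Model size, device, and compute type are assumptions, not the
    settings used in the video.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(audio_path: str) -> str:
        segments, _info = model.transcribe(audio_path)
        return " ".join(segment.text.strip() for segment in segments)

    return transcribe
```

Passing the transcriber and the generator in as callables keeps the loop testable with stand-ins; in a real setup `generate_image` would call whatever function the ComfyUI workflow was exported to.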

00:00 Speech to Image Intro
00:28 Speech to Image Flowchart
01:16 Speech to Image Setup / Python Code
07:21 Joe Rogan Podcast Test
08:50 Anime Bedtime Story Test
10:33 Taylor Swift Music Video Test
11:42 Mr.Beast Video Test
Comments

Well done! You're one of the few channels actually moving this forward with real examples and use cases.

adventurelens

Totally love it; I've been hacking together a realtime STT -> LLM + RAG system, pretty amazing that we can do so much with off-the-shelf stuff. The image generation is an interesting sort of curiosity, but I think we could get some real value if all the text was saved with timestamps to a database, then when certain phrases are detected, we could trigger an LLM to answer a question or even perform a task with something like CrewAI. So cool!! please keep making!

JonathanYankovich
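The idea in the comment above, saving every transcript line with a timestamp and reacting when certain phrases appear, could look roughly like this. It is only a sketch using the standard-library `sqlite3` module: the table layout, the function names `open_log` and `log_and_dispatch`, and the plain substring match are all assumptions, and the handler is a placeholder for whatever LLM or agent call would be triggered.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the transcript log; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript (ts REAL NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def log_and_dispatch(
    conn: sqlite3.Connection,
    text: str,
    triggers: Dict[str, Callable[[str], None]],
    ts: Optional[float] = None,
) -> List[str]:
    """Store one transcript line, then call the handler of every phrase it contains.

    Returns the list of trigger phrases that fired.
    """
    ts = time.time() if ts is None else ts
    conn.execute("INSERT INTO transcript (ts, text) VALUES (?, ?)", (ts, text))
    conn.commit()

    lowered = text.lower()
    fired = []
    for phrase, handler in triggers.items():
        if phrase.lower() in lowered:  # simple case-insensitive substring match
            handler(text)
            fired.append(phrase)
    return fired
```

A handler here could be as simple as a function that forwards the line to an LLM; the substring match is the crudest possible detector and would likely need replacing for anything beyond a demo.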

Omg. This is great. Could easily take this and add some logic where a person could create blog articles simply by talking.

brando

Amazing as always man! Wonder what ideas will come to reality next...

kawsarahmad

That's awesome! So much you could do with this!!

music_anarchy

You are at the tip of the spear, thank you for sharing this.

ryanjames

I love you exploring with this kind of stuff.

RyanSmith-rbch

🎯 Key Takeaways for quick navigation:

00:00 🎙️ *Introduction to Speech to Image App*
- Demonstration of the speech to image app.
- Initial test with voice commands to generate images.
- Introduction to combining speech with YouTube audio.
02:15 🔄 *Components of Low Latency Speech to Image*
- Overview of the components involved in low-latency speech to image.
- Flowchart showing the microphone, Faster Whisper, ComfyUI Python extension, and Stable Diffusion model.
- Mention of the need for a separate tutorial for detailed setup.
03:41 🖱️ *ComfyUI and Python Extension*
- Introduction to ComfyUI for the Stable Diffusion model workflow.
- The role of the ComfyUI Python extension in converting the workflow into Python code.
- The simplicity of setting up ComfyUI for desired workflows.
05:49 🎛️ *Setting Up Faster Whisper for Audio*
- Explanation of setting up Faster Whisper for audio transcription.
- Reference to a previous tutorial on configuring Faster Whisper.
- Availability of Faster Whisper on the community GitHub.
07:12 🐍 *Python Code Overview for Speech to Image App*
- Walkthrough of the Python code implementing the speech to image app.
- Explanation of functions and nodes in the code.
- Customization options for parameters like prompt length and image size.
09:22 🌐 *Selecting Stable Diffusion Model and Flask App*
- Choosing the Stable Diffusion model using Civitai.
- Creating a Flask app to display the generated images in real-time.
- Brief overview of the back-end and front-end functionalities.
11:54 🎬 *Testing Different Use Cases*
- Testing the app with a YouTube video from The Joe Rogan podcast.
- Additional tests with a bedtime story, Taylor Swift music video, and a MrBeast video.
- Impressions and reactions to the results of each test.
13:05 🚀 *Conclusion and Future Development*
- Expressing enjoyment in building and testing the app.
- Plans for future development and improvements.
- Encouragement to become a member for access to the GitHub and further content.

Made with HARPA AI

-Evil-Genius-
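For the Flask part mentioned in the summary above, one plausible arrangement is a small thread-safe holder that the generation loop writes into and a Flask app that serves whatever image is newest. This is a sketch under assumptions, not the app from the video: `LatestImage`, `create_app`, the route names, and the 500 ms polling interval are all made up for illustration, and images are assumed to arrive as PNG bytes.

```python
import threading
from typing import Optional, Tuple

# Minimal page: poll /version and reload the image when it changes.
PAGE = """<!doctype html>
<img id="img" src="/latest.png" alt="latest image">
<script>
let seen = 0;
setInterval(async () => {
  const v = (await (await fetch("/version")).json()).version;
  if (v !== seen) { seen = v; document.getElementById("img").src = "/latest.png?v=" + v; }
}, 500);
</script>"""


class LatestImage:
    """Thread-safe holder for the most recently generated image."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Optional[bytes] = None
        self._version = 0

    def update(self, png_bytes: bytes) -> int:
        """Store a new image and return its version number."""
        with self._lock:
            self._data = png_bytes
            self._version += 1
            return self._version

    def snapshot(self) -> Tuple[int, Optional[bytes]]:
        """Return (version, image bytes); bytes are None before the first update."""
        with self._lock:
            return self._version, self._data


def create_app(store: LatestImage):
    """Build the Flask app; Flask is imported lazily so the holder works without it."""
    from flask import Flask, Response, jsonify

    app = Flask(__name__)

    @app.route("/")
    def index():
        return PAGE

    @app.route("/version")
    def version():
        return jsonify(version=store.snapshot()[0])

    @app.route("/latest.png")
    def latest():
        _version, data = store.snapshot()
        if data is None:
            return Response(status=404)  # nothing generated yet
        return Response(data, mimetype="image/png")

    return app
```

The generation loop would call `store.update(...)` from its own thread while `create_app(store).run()` serves the page; polling is the simplest option, and a push mechanism could replace it if latency matters.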

Really great stuff. Hats off, mister...

gregas

These are golden guides. I appreciate your content and am considering becoming a member, if I can afford it after the paycheck is smashed just to survive.

Keep 'em coming!

around

Application: This can replace sign language. This could be refined and used to communicate with the deaf

samuelsamuel

Subscribed!
All subjects are amazing!
Unfortunately not a member, for some obvious reasons.
Please share some stuff for non-members. You are the best user of AI I have seen on the net,
in the spirit I love: offline and open-source tools.

My English is not so good; I have to watch again and again to catch the spirit of your videos.
Some of your experiments with transcription provide an approach to breaking down the language barrier
and, more generally, to universal communication.

Thank you very much for your fascinating demonstrations!

FBHearty

People have been so terrified of AI taking over the world. For me, this is the most exciting and fun time in development history, since the dawn of the internet! AI has made everything so much more streamlined, time efficient and productive. What a great time!

Regular_Folk

Very cool! I would like to see a full tutorial, and review the code too.. How large were the model and sensor downloads?

patrickctaylor

This is amazing. If this goes really well, I would love to try it and would even be willing to pay for it.

nrixxking

In the cheap seats here, ie not a member, but I would love to see the full version of this, and I think it would go crazy viral and do your channel a great, great service by getting you tons of views... But, that's just my thought if you are to release the full version. : )

sirrobinofloxley

Please make a full tutorial and instructions on the members' GitHub.

rapidreplay

This rocks! Yes, tutorial please. What level of membership is needed to get access? What hardware specs does it take to run this: a Linux server? Windows? Thanks.

musumo

This would be great for converting audiobooks into comics or movies.
Persistent characters would also be good.
This is amazing, please develop it more!!!

FSK

I have access to github but I don't see this repo

tonywhite