LangChain: Run Language Models Locally - Hugging Face Models

In this video, I will show you how to use Hugging Face large language models locally using the LangChain platform. We will also explore how to use the Hugging Face Hub API for the same models. We will cover encoder-decoder as well as decoder-only models (text2text-generation and text-generation models). Come and explore the amazing world of large language models with us.
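For reference, here is a minimal sketch of the two approaches described above (a local pipeline versus the Hugging Face Hub API), based on the classic LangChain interfaces from the time of the video; the model ID and generation parameters are illustrative, not necessarily the exact ones used on screen:

```python
# pip install langchain transformers huggingface_hub
import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline, HuggingFaceHub

prompt = PromptTemplate(
    template="Question: {question}\nAnswer:",
    input_variables=["question"],
)

# Option 1: run the model locally. The weights are downloaded once and executed
# on your own machine (an encoder-decoder, text2text-generation model here).
local_llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-base",
    task="text2text-generation",
    model_kwargs={"max_length": 128},
)

# Option 2: call the same model through the Hugging Face Hub Inference API.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # your own token
hub_llm = HuggingFaceHub(
    repo_id="google/flan-t5-base",
    model_kwargs={"max_length": 128, "temperature": 0.1},
)

chain = LLMChain(prompt=prompt, llm=local_llm)  # or llm=hub_llm
print(chain.run("What is the capital of France?"))
```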

▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
LINKS:

-------------------------------------------------
-------------------------------------------------
All Interesting Videos:

Comments

I was looking for a video that shows "how to use Hugging Face models locally" for a long time and finally found it. Thanks so much, bro!

ramzan

Colab: It looks like the Colab link points to SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation.

Also, thank you for all the great content and resources, very much appreciated

myhificloud

Thank you so much for the well-structured video and accompanying Google Colab! Other YouTubers often assume the viewer is experienced, but you are patient enough to explain the basic terms and ideas.

polinalee

I've found it nearly impossible to find info on the memory requirements for using any model. If I want to load a model out of the box locally (for example, the flan-t5 model in your video), how can you determine this from the model's parameter count, assuming no quantization, no fine-tuning, and inference only? Also, what is actually getting loaded into memory as soon as you load the model?

pp
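On the memory question above, a rough rule of thumb, assuming inference only and no quantization: the weights take roughly parameters x bytes-per-parameter, and that is essentially what gets loaded as soon as you instantiate the model (activations and any generation cache are allocated later, during inference). A small sketch with approximate parameter counts:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1024**3

# fp32 = 4 bytes per parameter, fp16 = 2; parameter counts below are approximate.
print(weight_memory_gb(250e6))      # flan-t5-base,  ~250M params -> ~0.9 GB in fp32
print(weight_memory_gb(780e6))      # flan-t5-large, ~780M params -> ~2.9 GB in fp32
print(weight_memory_gb(3e9, 2))     # flan-t5-xl,    ~3B params   -> ~5.6 GB in fp16
```

On top of the weights, leave some headroom for activations, the tokenizer, and framework overhead.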

The problem here is that you use the word "locally", which is easily read as "offline". If I can run something locally, I would want to be able to run it offline as well. Your solution here requires an online connection to that other service; effectively, you've moved one online dependency to a different one. I'm only looking for offline local chat, like oobabooga.

mygamecomputer

Great and informative video again! One thing to add: if you are developing a chatbot and doing vector search, encoder-decoder LLMs perform better, while decoder-only LLMs are more suitable for generating human-like responses.

bakistas

Thank you for providing and sharing a simple workflow using both self-hosted and cloud-hosted options. This is pure gold.

ShaunPrince

How can I create and train my own model, based on the rules of a business, and use it as explained in the video? Excellent content, thank you!

luizaugusto

When I run this step, Google Colab can't execute the line. How long does it take for you?

toukoum

Can we run the Vicuna model like this? The 4-bit model available on Hugging Face?

sauravmukherjeecom
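On the 4-bit question above, one caveat: pre-quantized GPTQ checkpoints on the Hub need a GPTQ-aware loader, whereas the sketch below loads full-precision weights and quantizes them to 4-bit at load time with bitsandbytes (it assumes a recent transformers, accelerate, and bitsandbytes, plus a GPU; the model ID is illustrative):

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "lmsys/vicuna-7b-v1.5"  # illustrative repo with full-precision weights

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit while loading
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16 for speed and stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers on the available GPU(s)
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```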

And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, and some are 130 GB+. Any thoughts?

botondvasvari
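For checkpoints bigger than your GPU or RAM, one option (a sketch, assuming transformers plus accelerate; the model ID and offload path are illustrative) is to let accelerate spread the layers across GPU, CPU RAM, and disk, accepting that offloaded layers run much more slowly; combining this with the 8-bit or 4-bit loading shown above shrinks the footprint further:

```python
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # illustrative; roughly 40 GB of fp16 weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # halves the weight memory relative to fp32
    device_map="auto",             # fill the GPU(s) first, then CPU RAM
    offload_folder="offload",      # spill whatever still does not fit to disk
    low_cpu_mem_usage=True,        # load shard by shard instead of materializing twice
)
```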

I have spent the last 3 days trying to learn all this through the LangChain documentation. You made everything so much simpler and clearer to understand. Thank you so much for your work! Unfortunately, I have failed multiple times to run StableLM 3B locally in Google Colab because it crashes the session (RAM shortage). I've watched your other video about 8-bit quantization and have tried it, yet it still crashes the session. I've found useful articles about instantiating large models in Hugging Face, but I can't quite understand what I'm reading. Any ideas on what I should try?

mirohernz

Which open-source stacks would you use to build an AI that handles text-to-speech and speech-to-text, call-center style, at an institution? We are considering using 52 x 8 GB RX 570 graphics cards, currently sitting idle as an Ethereum rig, for this. Which open-source builds do you think would be appropriate? The main target is inbound support calls, or survey calls.

ressamendy

I've tried the first approach and, after over 4 minutes of waiting for a response, the API reported "out of time". I tried virtual environments and a Docker Python image, installing the proper ROCm for the AMD card, but no results :( I suppose it is the AMD card and its incompatibilities with PyTorch.

TheMacister

ERROR: Could not find a version that satisfies the requirement InstrcutorEmbedding (from versions: none)

MrGargmay

For the local version of these models, it seems you're still using the Hugging Face ID. Could you please explain how to download them, and what exactly we need to download, in order to run them locally without invoking external APIs?

AndyBarbosa
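On the question above: passing a Hub ID to the local pipeline downloads the weights, config, and tokenizer into the local cache once, and inference then runs entirely on your machine without calling an external API. A sketch of making the download explicit and running fully offline afterwards (the local directory name is illustrative):

```python
# pip install huggingface_hub transformers langchain
import os
from huggingface_hub import snapshot_download
from langchain.llms import HuggingFacePipeline

# 1) While online, download everything (weights, config, tokenizer) to a known folder.
local_dir = snapshot_download(repo_id="google/flan-t5-base", local_dir="./flan-t5-base")

# 2) Later, force offline mode (or set the variable in the shell before launching)
#    and load straight from disk; a local path works wherever a Hub ID does.
os.environ["HF_HUB_OFFLINE"] = "1"
llm = HuggingFacePipeline.from_model_id(
    model_id=local_dir,
    task="text2text-generation",
)
print(llm("Translate English to German: Good morning"))
```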

My OpenAI API key has expired. Does that mean I can't use LangChain to build apps?

ifeanyiidiaye

I want to query my own library of PDFs, without sending anything to OpenAI et al. Will you have a video for that soon? (please!)
There are lots of examples of loading your own content that focus on 'prompt stuffing', which presumably does not scale well, whereas I have thousands of PDFs to 'load', so I really need a different solution. Your insights would be greatly appreciated, thank you!

beacon
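On the PDF question above, the usual alternative to prompt stuffing is local retrieval: embed the documents once with a local embedding model, index the vectors, and pass only the retrieved chunks to a local LLM, so nothing is sent to OpenAI. A minimal sketch with classic LangChain components (the file name and model choices are illustrative; in practice you would loop over the whole PDF folder and persist the index):

```python
# pip install langchain transformers sentence-transformers faiss-cpu pypdf
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline

# 1) Load and chunk one PDF (repeat for every file in your library).
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2) Embed the chunks locally and build a searchable index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# 3) Answer questions with a local LLM over only the retrieved chunks.
llm = HuggingFacePipeline.from_model_id(model_id="google/flan-t5-base", task="text2text-generation")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever())
print(qa.run("What does the document say about pricing?"))
```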

Can you create a video on downloading an LLM from Hugging Face and running models offline, without an API key?

narenkumar

Running the model on the Google Colab GPU takes too much time, which leads to a connection timeout. Is it because of the free APIs?

vinaysamant