Llama 2 in LangChain — FIRST Open Source Conversational Agent!

Llama 2 is the best-performing open-source Large Language Model (LLM) to date. In this video, we discover how to use the 70B parameter model fine-tuned for chat (Llama 2 70B Chat) using Hugging Face transformers and LangChain. We will see how to apply Llama 2 as a conversational agent within LangChain.
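
For orientation, here is a minimal sketch of the overall flow, assuming access to the gated meta-llama/Llama-2-70b-chat-hf weights on Hugging Face; the model ID and generation parameters are illustrative, not necessarily the video's exact values:

```python
import transformers
from langchain.llms import HuggingFacePipeline
from langchain.chains import ConversationChain

model_id = "meta-llama/Llama-2-70b-chat-hf"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
# In practice the 70B model needs quantization to fit in GPU memory;
# see the sketch under the chapter list below.
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread layers across the available GPUs
)

generate = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,
    temperature=0.1,
    max_new_tokens=512,
    repetition_penalty=1.1,
)

# Wrap the pipeline so LangChain chains and agents can call it like any LLM
llm = HuggingFacePipeline(pipeline=generate)
chain = ConversationChain(llm=llm)
print(chain.run("Explain quantization in one sentence."))
```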

📌 Code Notebook

🌲 Subscribe for Latest Articles and Videos:

👋🏼 AI Consulting:

👾 Discord:

00:00 Llama 2 Model
02:55 Getting Access to Llama 2
06:12 Initializing Llama 2 70B with Hugging Face
08:17 Quantization and GPU Memory Requirements
11:14 Loading Llama 2
13:05 Stopping Criteria
15:17 Initializing Text Generation Pipeline
16:25 Loading Llama 2 in LangChain
17:08 Creating Llama 2 Conversational Agent
19:46 Prompt Engineering with Llama 2 Chat
22:16 Llama 2 Conversational Agent
24:14 Future of Open Source LLMs
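
The 08:17 and 13:05 chapters are the fiddly parts, so here is a rough sketch of both; the stop strings are illustrative and the exact config in the video may differ. On memory: 70B parameters at 2 bytes each (fp16) is roughly 140 GB of weights, while 4-bit quantization cuts that to roughly 35 GB.

```python
import torch
import transformers

model_id = "meta-llama/Llama-2-70b-chat-hf"

# 4-bit quantization via bitsandbytes
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normalized-float 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for the matmuls
)

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Custom stopping criteria: halt generation when the model emits a string
# that marks the end of its turn (these stop strings are illustrative).
stop_list = ["\nHuman:", "\nUser:"]
stop_token_ids = [
    torch.LongTensor(tokenizer(s, add_special_tokens=False)["input_ids"]).to(model.device)
    for s in stop_list
]

class StopOnTokens(transformers.StoppingCriteria):
    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # True if the most recent tokens match any of the stop sequences
        return any(
            torch.eq(input_ids[0, -len(ids):], ids).all()
            for ids in stop_token_ids
        )

stopping = transformers.StoppingCriteriaList([StopOnTokens()])
# Pass to the text-generation pipeline via stopping_criteria=stopping
```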

#artificialintelligence #nlp #opensource #huggingface #langchain
Comments

Amazing work! I like how you break down all the nuances, including memory usage. Great job, James!

CMAZZONI

Playing with the 13B-Chat version, I've found that with careful prompting it reliably outputs useful JSON. I haven't had time to stress-test it, but I'm super impressed compared to all the other models I've tried. Nothing else has come close to showing this much usefulness out of the box.
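
For anyone curious what that kind of prompting looks like, here is a hypothetical example using Llama 2's [INST]/<<SYS>> chat format; the system prompt and JSON schema are made up for illustration, not taken from the comment:

```python
# Hypothetical prompt nudging Llama-2-13B-Chat toward strict JSON output.
prompt = """<s>[INST] <<SYS>>
You are an assistant that replies ONLY with a single JSON object, no prose.
Use exactly these keys: "answer" (string) and "confidence" (number 0-1).
<</SYS>>

What is the capital of France? [/INST]"""

# Expected completion if the prompting holds:
# {"answer": "Paris", "confidence": 0.98}
```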

jolieriskin
Автор

I tried it, and it works with the 13B model too, which runs on the free Colab tier. The prompt engineering is great!

LeoAr

I love how informative you are about memory utilization and the various parameters. It's great to have your perspective on the engineering trade-offs.

goobtron

James, your content is fantastic! I'd love to see a video implementing FlashAttention-2 (with Llama or another model) to get a larger usable context window!

tfgidhr

Cheers, James. Nice vid. I’ve really been enjoying this model recently. The future is looking exciting!

bigpickles

Halfway through the video I subscribed to this channel. I love the way you simplified the details.

BestowTechs

You, sir, are a lifesaver. I almost gave up on LLMs because I couldn't find a single coherent tutorial about interfacing an LLM with an external environment that wasn't marketing BS. And then the YouTube gods put this in my recommendations.

This is amazing. I was trying to figure out how to do this from scratch for about a week straight, and I managed to teach my "character" to say certain tags like [[time]] instead of replying with an arbitrary made-up date/time. I got stuck at that point, thinking there was something special about how commercial AIs do it, but you have just confirmed my intuition and restored my hopes :)

Correct me if I got it wrong, but it seems LangChain is basically just a library with a JSON interface, vaguely related to AI, and the actual "connection" between the model and LangChain is done by teaching the model to "speak" in JSON and intercepting/redirecting the output?
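
That intuition is roughly right for the conversational agent shown in the video: the model is prompted to reply with a JSON blob naming an "action" and an "action_input", and the framework parses that blob and routes it to a tool or back to the user. A stripped-down illustration of the loop, not LangChain's actual implementation:

```python
import json

# Stripped-down illustration of the "speak JSON, intercept, redirect" idea;
# LangChain's real agent executor is more involved, but this is the gist.
def run_agent_step(model_output: str, tools: dict) -> str:
    blob = json.loads(model_output)       # the model was prompted to emit JSON
    if blob["action"] == "Final Answer":  # the agent replies to the user directly
        return blob["action_input"]
    # otherwise redirect the output to the named tool
    return tools[blob["action"]](blob["action_input"])

tools = {"Calculator": lambda expr: str(eval(expr))}  # toy tool for the demo
print(run_agent_step('{"action": "Calculator", "action_input": "2 + 2"}', tools))
# -> 4; the tool result is then fed back to the model as an observation
```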

staviq

Awesome, thanks. Great explanation too.

Sulayman.

Thanks, man. I was considering trying out Llama for an agent use case, and you pushed me over the edge. Cheers!

traviskassab

Rock-solid goodness right there! James, thanks for taking the time to spread the knowledge.

creatorsgear

Thanks, James. I deeply appreciate your tutorials! Keep them coming.

gkennedy_aiforsocialbenefit

The first thing I try is a simple coding task: finding duplicate files (by their content) under a directory. So far only GPT-4 can complete this task. GPT-3.5, Claude 2, and WizardLM-WizardCoder-1.0-GGML (8-bit) come close. All the other LLMs produce useless output; not even Bard can do it. It looks like the only open-source model that can do coding and technical reasoning is still WizardCoder. I'm waiting for the next WizardCoder version to come out, or for a Llama 2 fine-tuned for coding and technical reasoning tasks. That would be a really good open-source LLM, finally usable for offline work.
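
For reference, the benchmark task itself is a short script; a minimal sketch in Python that groups files under a directory by a hash of their contents:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> list[list[Path]]:
    """Group files under `root` whose contents hash identically."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

for group in find_duplicates("."):
    print(" == ".join(map(str, group)))
```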

ViktorFerenczi

Thanks! It was very informative and on point.

spongebobsquarepants

Thanks for explaining the quantization; I understood the logic, but it was great to see it put into code with an example ^^

TeamUpWithAI

Would be great if you could make a video about hosting Llama 2 13B and 70B for production. It's easy to use the free Inference API from HF, but there are very few resources out there covering the actual costs of running this, the trade-offs of VMs with different specs (e.g. generation speed in tokens per second), etc. Great videos, James, thank you!

dylanramirez

Nice work, and a great example for me to build an application from! Thanks so much.

Bamboo_gong

I found this video really compelling. I believe it would be incredibly fascinating to leverage a CSV connection to answer data-specific questions. It reminds me of an article I read titled 'Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit'.

simonmoyajimenez

Just subscribed. I enjoyed the content a lot; I liked that you clearly understand where people will have questions, and you answer them just as clearly.

scottmiller

Love your content, James. I used MPT-30B for RetrievalQA. I just have one question: how did you get the “Explain Text” field when you select certain text? It seems very convenient and resourceful.

kunal