How To Install LLaMA 2 Locally + Full Test (13b Better Than 70b??)

In this video, I'll show you how to install LLaMA 2 locally. We will install LLaMA 2 chat 13b fp16, but you can install ANY LLaMA 2 model after watching this video. I also put LLaMA 2 chat 13b fp16 through an extensive test. Does it do better than LLaMA 2 70b? Let's find out!

Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? ✅

Rent a GPU (MassedCompute) 🚀
USE CODE "MatthewBerman" for 50% discount

My Links 🔗

Media/Sponsorship Inquiries 📈

Chapters:
0:00 - Intro
0:23 - Install Guide
3:40 - Testing LLaMA 2 13b fp16

Links:
Comments:

Should I add these prompts to my LLM rubric going forward:

* Should I fight 100 duck-sized horses or 1 horse-sized duck? Explain your reasoning. (Fun)
* Describe a paradox in quantum physics in layman's terms. (Ability to explain in simple terms)
* A ball is put into a normal cup and placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? (Logic & Reasoning)

matthew_berman
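Those rubric prompts could be run in bulk against the locally hosted model. text-generation-webui can expose an OpenAI-compatible chat endpoint, so a minimal sketch only needs the request payloads shaped correctly; note the model name and parameter values below are my assumptions, not something stated in the video.

```python
# Hypothetical rubric runner for a locally served model.
# The model name and max_tokens value are assumptions, not from the video.
RUBRIC = [
    ("Should I fight 100 duck-sized horses or 1 horse-sized duck? "
     "Explain your reasoning.", "fun"),
    ("Describe a paradox in quantum physics in layman's terms.",
     "ability to explain in simple terms"),
    ("A ball is put into a normal cup and placed upside down on a table. "
     "Someone then takes the cup and puts it inside the microwave. "
     "Where is the ball now?", "logic & reasoning"),
]

def build_request(prompt, model="llama-2-13b-chat", max_tokens=512):
    """Build an OpenAI-style chat-completion payload for one rubric prompt."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payloads = [build_request(prompt) for prompt, _skill in RUBRIC]
print(len(payloads))  # one payload per rubric prompt
```

Each payload would then be POSTed to whatever chat-completions endpoint the local server exposes; keeping the probed skill alongside each prompt makes it easy to tally results per category.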

This is the best video I have seen. I spent three days trying to get this to work; I am not an experienced programmer and have no experience working in the industry. This is so good and so easy, and the prompts he gives actually work the way he says they will. Thank you so much, man, you have made me so happy.

Tommarto

always love seeing how good the uncensored versions are!

TherangeCow

I really like the "duck-sized horses or horse-sized duck" prompt. It reveals a lot about how the model "thinks" and tests its ability to reason about multiple concepts simultaneously. For reference, I gave GPT-4 the exact same prompt and it gave me a pros/cons list for each choice, although it started with "Ah, the age-old question," so I wonder if it's had a little extra targeted training on prompts like these.

Also, I'd love to see more uncensored models. I get why safety is made a priority by big companies like Meta and OpenAI, but it's clear that uncensored models are going to be the most useful.

DisturbedNeo

"It almost got 16"?

The correct answer is 4 hours, isn't it? Each shirt takes four hours, regardless of how many other shirts you're drying simultaneously. There's enough sun to go around 😉

erikjansson
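The disagreement in the comment above comes down to a hidden assumption: if drying is fully parallel (unlimited rack space), the answer stays at 4 hours no matter how many shirts there are, while a model answering 16 is implicitly drying in limited-capacity batches. A quick sketch of both readings (function and parameter names are mine):

```python
import math

def drying_time(n_shirts, hours_per_batch=4, capacity=None):
    """Time to sun-dry shirts. With unlimited space (capacity=None),
    drying is fully parallel, so total time never depends on shirt count."""
    if capacity is None or n_shirts <= capacity:
        return hours_per_batch
    # Limited space forces sequential batches -- the "16 hours" reading.
    return math.ceil(n_shirts / capacity) * hours_per_batch

print(drying_time(20))              # parallel: 4
print(drying_time(20, capacity=5))  # 4 batches of 5 shirts: 16
```

A model that answers 16 isn't computing wrong so much as silently adopting the batched interpretation, which is why the prompt is a nice probe of stated reasoning.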

I would love to see a review of the uncensored versions of Llama 2. This looks really solid. Finally the open source community is catching up to closed models like ChatGPT. I hope by the end of the year they will be performing just as well if not better.

MakilHeru

Hey bud, you are killing it with the tutorials, keep on keeping on

ElderMillennialStuff

The poem was really great, with all lines rhyming; I've never seen that before from a local LLM, not even when I asked for rhymes! It's also impressive that it solved the math question right out of the box. I had to give mine a hint before it did, and then somebody called me a liar in the comments! I'd like to see the new questions in the tests; maybe exchange them for the easier questions that basically any not completely brain-dead model can answer correctly. Anyway, good luck in your battle against the horse-sized duck (or duck-sized horses), and let us know if you actually find the 65B LLaMA 2 model you mentioned at the end of the video. ;-)

testales

Please keep the duck-sized horse prompt. It's very entertaining!

greenockscatman

An interesting thing about these various models (no matter their current size) is their conception of words versus phrases. For example, the word "pun" or "puns" is treated as having a "word" relationship by the Airoboros model: it will reference specific words in an input prompt rather than a phrase or figure of speech.

jeffwads

Been through this video multiple times, step by step. Running the checker script after installing everything, it comes up with the version but says "False" for Torch being available, even though it's installed. Running the server script, it says no gradio is installed, but that's installed too, and verified. It provides the local server address anyway, and running that in my browser I can access the platform, load models, change settings, etc., but there is no response from the model when asking questions, presumably because it's not finding Torch and gradio. Thoughts on how to resolve this? How do I get it to recognize Torch and gradio?

BeAsYouAre
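A common cause of the symptom described above is that the server is being launched with a different Python interpreter than the one torch and gradio were installed into (for example, outside the conda environment). This is only a guess at the cause, but a small stdlib-only check, run with the same interpreter that starts the server, can confirm or rule it out:

```python
import importlib.util
import sys

def diagnose(packages=("torch", "gradio")):
    """Report which interpreter is running and whether each package
    resolves on *its* import path -- a mismatched env shows up as False."""
    report = {"interpreter": sys.executable}
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report

print(diagnose())
```

If a package shows False here while `pip show torch` succeeds in your shell, the installs went into a different environment; activating the right conda env, or installing with the exact interpreter shown above via `python -m pip install <package>`, usually resolves it.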

Thanks, great video. Question: what hardware are you running this model on? Do you need a good video card? Does it use a lot of CPU? Do you need a lot of RAM? Thanks in advance.

jpsolares

Spectacular video. Extremely well produced, delivered, and informative.
I had a bunch of "yak shaving" to do getting the correct CUDA lib on my Ubuntu EC2 instance -- very(!) much out of scope for this video -- and alas my 24 GB NVIDIA card was insufficient. I'll have to bump my EC2 instance higher (or go to RunPod, as you have suggested _numerous_ times!). Once I get there, I will try another iteration, but again -- really great video (as are all your tutorials).

danavirtual

New to this, but I was wondering why people aren't creating Docker files for these models. Wouldn't that be easier to install and update? Or am I missing something, like problems with GPU access?

mailtbltom

My man! The 'Berman-ator' strikes again!!! Thanks for this video Matthew. You are awesome.

geno

Is there a way to test or know what sort of GPU and PC specs we would need to run any of these models fully locally? What specs should I look at for that? For example, I'd run a smaller model on my PC locally if possible versus a newer, larger model.

Macrogasm
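As a rough back-of-the-envelope answer to the spec question above: the weights alone take parameter count times bytes per parameter, plus some headroom for activations and the KV cache. The 20% overhead factor below is my own assumption, not a measured figure, so treat this as a sizing sketch rather than a guarantee:

```python
def vram_estimate_gb(params_billion, bits_per_param=16, overhead=1.2):
    """Rough VRAM needed to load a model: weight bytes plus a fudge
    factor for activations/KV cache. fp16 = 16 bits, 4-bit quant = 4."""
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 1e9

print(round(vram_estimate_gb(13), 1))                    # 13B at fp16
print(round(vram_estimate_gb(13, bits_per_param=4), 1))  # 13B quantized to 4-bit
```

By this estimate, a 13B fp16 model wants roughly 30 GB, which is why quantized variants matter so much for consumer cards with 8-24 GB of VRAM.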

I managed to install the Llama 2 version you suggested. I had errors when loading the model in the browser, so I had to increase the VPU and GPU memory allocation. BUT... when I ask a question, the model takes forever to type something as mundane as "Certainly!". My setup is a Lenovo Legion with an 8 GB GPU, 32 GB of RAM, and a Ryzen CPU. Is there anything I need to tweak to increase the speed? Thank you!

jd

How do you fine-tune the model with a custom dataset? What I have are PDF documents. Thanks.

abramswee
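For the PDF question above: whichever route you take (fine-tuning or retrieval), a typical first step is extracting the text with a PDF library (e.g., pypdf — an assumption, not something used in the video) and splitting it into overlapping chunks so no passage is lost at a boundary. A minimal chunker sketch in plain Python:

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split extracted text into overlapping character chunks;
    the overlap keeps boundary sentences present in two chunks."""
    chunks = []
    step = max_chars - overlap  # advance less than a full chunk each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # [1000, 1000, 700]
```

Real pipelines usually chunk on sentence or token boundaries rather than raw characters, but the overlap idea is the same; the resulting chunks become either training examples or retrieval passages.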

When trying to download the model in the Text Generation Web UI, I get an error: IndexError: string index out of range. Tried with different models; always the same error.

toromanow

Can a PDF document be uploaded to this model so you can chat with it and ask questions specific to the PDF?

rajesh_rachamalla