How To Install LLaMA 2 Locally + Full Test (13b Better Than 70b??)

In this video, I'll show you how to install LLaMA 2 locally. We will install LLaMA 2 chat 13b fp16, but you can install ANY LLaMA 2 model after watching this video. I also put LLaMA 2 chat 13b fp16 through an extensive test. Does it do better than LLaMA 2 70b? Let's find out!

Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? ✅

Rent a GPU (MassedCompute) 🚀
USE CODE "MatthewBerman" for 50% discount

My Links 🔗

Media/Sponsorship Inquiries 📈

Chapters:
0:00 - Intro
0:23 - Install Guide
3:40 - Testing LLaMA 2 13b fp16

Links:
Comments:

Should I add these prompts to my LLM rubric going forward:

* Should I fight 100 duck-sized horses or 1 horse-sized duck? Explain your reasoning. (Fun)
* Describe a paradox in quantum physics in layman's terms. (Ability to explain in simple terms)
* A ball is put into a normal cup and placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? (Logic & Reasoning)

matthew_berman
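Those rubric prompts could be run in bulk against the locally hosted model. text-generation-webui can expose an OpenAI-compatible chat endpoint, so a minimal sketch only needs the request payloads shaped correctly; note the model name and parameter values below are my assumptions, not something stated in the video.

```python
# Hypothetical rubric runner for a locally served model.
# The model name and max_tokens value are assumptions, not from the video.
RUBRIC = [
    ("Should I fight 100 duck-sized horses or 1 horse-sized duck? "
     "Explain your reasoning.", "fun"),
    ("Describe a paradox in quantum physics in layman's terms.",
     "ability to explain in simple terms"),
    ("A ball is put into a normal cup and placed upside down on a table. "
     "Someone then takes the cup and puts it inside the microwave. "
     "Where is the ball now?", "logic & reasoning"),
]

def build_request(prompt, model="llama-2-13b-chat", max_tokens=512):
    """Build an OpenAI-style chat-completion payload for one rubric prompt."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payloads = [build_request(prompt) for prompt, _skill in RUBRIC]
print(len(payloads))  # one payload per rubric prompt
```

Each payload would then be POSTed to whatever chat-completions endpoint the local server exposes; keeping the probed skill alongside each prompt makes it easy to tally results per category.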

This is the best video I have seen. I spent three days trying to get this to work; I am not an experienced programmer and have no experience working in the industry. This is so good and so easy, and the prompts he gives actually work the way he says they will. Thank you so much, man, you have made me so happy.

Tommarto

always love seeing how good the uncensored versions are!

TherangeCow

I really like the "duck-sized horses or horse-sized duck" prompt. It reveals a lot about how the model "thinks" and tests its ability to reason about multiple concepts simultaneously. For reference, I gave GPT-4 the exact same prompt and it gave me a pros/cons list for each choice, although it started with "Ah, the age-old question," so I wonder if it's had a little extra targeted training on prompts like these.

Also, I'd love to see more uncensored models. I get why safety is made a priority by big companies like Meta and OpenAI, but it's clear that uncensored models are going to be the most useful.

DisturbedNeo

"It almost got 16"?

The correct answer is 4 hours, isn't it? Each shirt takes four hours, regardless of how many other shirts you're drying simultaneously. There's enough sun to go around 😉

erikjansson
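The disagreement in the comment above comes down to a hidden assumption: if drying is fully parallel (unlimited rack space), the answer stays at 4 hours no matter how many shirts there are, while a model answering 16 is implicitly drying in limited-capacity batches. A quick sketch of both readings (function and parameter names are mine):

```python
import math

def drying_time(n_shirts, hours_per_batch=4, capacity=None):
    """Time to sun-dry shirts. With unlimited space (capacity=None),
    drying is fully parallel, so total time never depends on shirt count."""
    if capacity is None or n_shirts <= capacity:
        return hours_per_batch
    # Limited space forces sequential batches -- the "16 hours" reading.
    return math.ceil(n_shirts / capacity) * hours_per_batch

print(drying_time(20))              # parallel: 4
print(drying_time(20, capacity=5))  # 4 batches of 5 shirts: 16
```

A model that answers 16 isn't computing wrong so much as silently adopting the batched interpretation, which is why the prompt is a nice probe of stated reasoning.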

I would love to see a review of the uncensored versions of Llama 2. This looks really solid. Finally the open source community is catching up to closed models like ChatGPT. I hope by the end of the year they will be performing just as well if not better.

MakilHeru

Hey bud, you are killing it with the tutorials, keep on keeping on

ElderMillennialStuff

The poem was really great, with all lines rhyming; I've never seen that before from a local LLM, not even when I asked for rhymes! It's also impressive that it solved the math question right out of the box. I had to give mine a hint before it did, and then somebody called me a liar in the comments! I'd like to see the new questions in the tests; maybe exchange them for the easier questions that basically any not completely brain-dead model can answer correctly. Anyway, good luck in your battle against the horse-sized duck (or duck-sized horses), and let us know if you actually find the 65B LLaMA 2 model you mentioned at the end of the video. ;-)

testales

Please keep the duck-sized horse prompt. It's very entertaining!

greenockscatman

An interesting thing about these various models (no matter their current size) is their conception of words versus phrases. For example, the word "pun" or "puns" is treated as having a "word" relationship by the Airoboros model: it will reference specific words in an input prompt rather than a phrase or figure of speech.

jeffwads

Been through this video multiple times, step by step. Running the checker script after installing everything, it comes up with the version but says "False" for Torch being available, even though it's installed. Running the server script, it says no gradio is installed, but that's installed too, and verified. It provides the local server address anyway, and running that in my browser I can access the platform, load models, change settings, etc., but there is no response from the model when asking questions, presumably because it's not finding Torch and gradio. Thoughts on how to resolve this? How do I get it to recognize Torch and gradio?

BeAsYouAre
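A common cause of the symptom described above is that the server is being launched with a different Python interpreter than the one torch and gradio were installed into (for example, outside the conda environment). This is only a guess at the cause, but a small stdlib-only check, run with the same interpreter that starts the server, can confirm or rule it out:

```python
import importlib.util
import sys

def diagnose(packages=("torch", "gradio")):
    """Report which interpreter is running and whether each package
    resolves on *its* import path -- a mismatched env shows up as False."""
    report = {"interpreter": sys.executable}
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report

print(diagnose())
```

If a package shows False here while `pip show torch` succeeds in your shell, the installs went into a different environment; activating the right conda env, or installing with the exact interpreter shown above via `python -m pip install <package>`, usually resolves it.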

Thanks, great video. Question: what hardware are you running this model on? Do you need a good video card? Does it use a lot of CPU? Do you need a lot of RAM? Thanks in advance.

jpsolares

Spectacular video. Extremely well produced, delivered, and informative.
I had a bunch of "yak shaving" to do getting the correct CUDA lib on my Ubuntu EC2 instance -- very(!) much out of scope for this video -- and alas my 24 GB NVIDIA card was insufficient. I'll have to bump my EC2 instance higher (or go to RunPod, as you have suggested _numerous_ times!). Once I get there, I will try another iteration, but again -- really great video (as are all your tutorials).

danavirtual

New to this, but I was wondering why people aren't creating Docker files for these models. Wouldn't that be easier to install and update? Or am I missing something, like problems with GPU access?

mailtbltom

My man! The 'Berman-ator' strikes again!!! Thanks for this video Matthew. You are awesome.

geno

Is there a way to test or know what sort of GPU and PC specs we would need to run any of these models fully locally? What specs should I look at for that? For example, I'd run a smaller model on my PC locally if possible versus a newer, larger model.

Macrogasm
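As a rough back-of-the-envelope answer to the spec question above: the weights alone take parameter count times bytes per parameter, plus some headroom for activations and the KV cache. The 20% overhead factor below is my own assumption, not a measured figure, so treat this as a sizing sketch rather than a guarantee:

```python
def vram_estimate_gb(params_billion, bits_per_param=16, overhead=1.2):
    """Rough VRAM needed to load a model: weight bytes plus a fudge
    factor for activations/KV cache. fp16 = 16 bits, 4-bit quant = 4."""
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 1e9

print(round(vram_estimate_gb(13), 1))                    # 13B at fp16
print(round(vram_estimate_gb(13, bits_per_param=4), 1))  # 13B quantized to 4-bit
```

By this estimate, a 13B fp16 model wants roughly 30 GB, which is why quantized variants matter so much for consumer cards with 8-24 GB of VRAM.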

I managed to install the Llama 2 version you suggested. I had errors when loading the model in the browser, so I had to increase the VPU and GPU memory allocation. BUT... when I ask a question, the model takes forever to type something as mundane as "Certainly!". My setup is a Lenovo Legion with an 8 GB GPU, 32 GB of RAM, and a Ryzen CPU. Is there anything I need to tweak to increase the speed? Thank you!

jd

How do you fine-tune the model with a custom dataset? What I have are PDF documents. Thanks.

abramswee
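For the PDF question above: whichever route you take (fine-tuning or retrieval), a typical first step is extracting the text with a PDF library (e.g., pypdf — an assumption, not something used in the video) and splitting it into overlapping chunks so no passage is lost at a boundary. A minimal chunker sketch in plain Python:

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split extracted text into overlapping character chunks;
    the overlap keeps boundary sentences present in two chunks."""
    chunks = []
    step = max_chars - overlap  # advance less than a full chunk each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # [1000, 1000, 700]
```

Real pipelines usually chunk on sentence or token boundaries rather than raw characters, but the overlap idea is the same; the resulting chunks become either training examples or retrieval passages.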

When trying to download the model in the Text Generation Web UI, I get an error: IndexError: string index out of range. Tried with different models; always the same error.

toromanow

Can a PDF document be uploaded to this model so you can chat with it and ask questions specific to the PDF?

rajesh_rachamalla