Updated Installation for Oobabooga Vicuna 13B And GGML! 4-Bit Quantization, CPU Near As Fast As GPU.



With our previous video now out of date, we promised to update the instructions! We walk you through the complete process of setting up Oobabooga's one-click installer for their text-generation-webui on your machine. We'll guide you on installing the models locally, discuss minimum system requirements, and even explore how to set this up in CPU mode using GGML. Learn how to install the necessary dependencies, choose between NVIDIA, AMD, or CPU-only options, and get the best performance out of your setup. Don't forget to like, subscribe, and let us know what other topics you'd like us to cover!
Comments

Your AI tutorials are some of the best on YouTube. Before this video I could not get Vicuna to run well in Oobabooga. I would love to see your channel focus on the latest and most advanced open-source chatbots, as well as how to make the most of Oobabooga and other AI tools. Thanks!

justwhatever

That was a very useful comparison, thank you. It's saved me a couple of hours.

logan

You sound like Pritchard from Deus Ex, haha. Nice guide.

yahifumeno

Hello, thanks for the video.
Thank you so much.

chicacryptoplanet

ERROR:No model is loaded! Select one in the Model tab.

ElCuartoRoj_

This made it sound much easier. I appreciate the time and effort you put into these instructive videos. I do have a question I would love to see addressed in the future: how much impact will AI have on the medical field? Will it make diagnosing and treating faster and more accurate, and will it be able to devise new treatments, medications, and diagnostic equipment?

kaymcneely

When you are running the GPU model, you can go to the Model tab and select how many layers you want to run on the GPU; the extra layers run on the CPU. You can download a 30B model, put 30 layers on the GPU (say, a GPU with 12GB VRAM) and it will run the extra layers on the CPU. It works, but the CPU usage is really low, around 10% utilization. That makes the model really slow, which doesn't happen when you run CPU only. Does anyone know how to configure it to use more CPU threads? With that, you could run a 30B model with good performance, splitting between GPU and CPU. You would be using your GPU's memory and system RAM at the same time, and could run bigger models with the best possible performance.

cparoli
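In llama.cpp-based loaders, this split is controlled by an n-gpu-layers-style setting, and the CPU side by a separate threads setting, both usually exposed in the webui's Model tab. As a back-of-the-envelope way to pick the layer count, here is a minimal sketch; the per-layer memory figure and the reserve for the KV cache are illustrative assumptions, not measured values:

```python
def layers_on_gpu(vram_gb: float, n_layers: int, layer_gb: float,
                  reserve_gb: float = 1.0) -> int:
    """Estimate how many transformer layers fit in VRAM.

    layer_gb is the approximate memory one layer needs (model- and
    quantization-dependent); reserve_gb leaves headroom for the
    context/KV cache and other buffers.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    fit = int(usable // layer_gb)       # whole layers that fit in the headroom
    return min(fit, n_layers)           # never offload more layers than exist

# Hypothetical numbers: a 30B 4-bit model with 60 layers at roughly
# 0.3 GB each on a 12 GB card suggests offloading about 36 layers.
print(layers_on_gpu(12, 60, 0.3))  # → 36
```

Anything this estimate leaves off the GPU runs on the CPU, where raising the thread count (if the loader exposes it) is what governs utilization.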

Thanks for the straightforward video. I tried this on a 3060 GPU with 6GB of VRAM and it was only able to operate at 2 tokens per second. Can you clarify the hardware you are using?

lethalburns

I get some error lines which read: 'Llama' object has no attribute 'ctx'. I get this error when I try to load vicuna-13b-4bit in text-generation-webui, though this model works fine with llama.cpp. What could be the solution?

valdesguefa

For the life of me I cannot figure this installer out. After following your video, the launcher decides to open the webui after installing what's needed for the GPU, but before giving me the option to install a model. Some errors pop up about building a wheel and llama? I'm left with a running site but no way to load a model. I'm somewhat new to this, so any help would be appreciated.

gwrjubd

Also, can you explain the prompt templates more? I am using text-generation-webui's API extension. Should I add some system-level description or roles plus "### Human:" before my main prompt? Will it then reply with the "### Assistant:" tag for its answer?

heejuneAhn
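Vicuna-style checkpoints are generally trained on conversations framed exactly that way: an optional system line, then alternating "### Human:" and "### Assistant:" turns, with the model completing after the final assistant tag. A minimal sketch of that convention (the default system line here is an assumption, not taken from the video):

```python
def build_vicuna_prompt(user_message: str,
                        system: str = ("A chat between a curious human and an "
                                       "artificial intelligence assistant.")) -> str:
    """Wrap a user message in the ### Human / ### Assistant turn format."""
    return f"{system}\n\n### Human: {user_message}\n### Assistant:"

prompt = build_vicuna_prompt("What is 4-bit quantization?")
# The prompt deliberately ends at "### Assistant:" so the model's
# completion becomes the assistant's reply.
print(prompt.endswith("### Assistant:"))  # → True
```

The exact tags vary between fine-tunes, so it is worth checking the model card for the template a given checkpoint expects.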

Hello, I followed the steps, but for some reason whenever I send a prompt it does not answer with anything and removes my prompt. Any idea how to fix it?

abdullahkratos

Hi. Is there a way to speed up the conversation, as there is often a long wait for responses?

SAVONASOTTERRANEASEGRETA

Not sure if this is a common thing, but when I try to start it, it successfully downloads everything and then doesn't give me the option to download any model. It just goes straight to the API key. Do you know why?

iame

When I use the installer, it doesn't let me download any model and says that I don't have quant-cuda.

ld

Not sure if you'll see this, but thanks for the post. It got me up and running... but after a couple of days I decided to hit the update file to see if there was any new goodness, and it just broke my little AI. Now whenever I ask a question it plays back what looks like training data, with replies that list human/assistant interactions. Any ideas? I feel like something got tweaked in the update, but this is all just barely at the edge of my ability to understand.

fidobarks

I know this is probably a very annoying and stupid question, but would you be able to help me with the API? For example, if I have another Python program, could I use it to feed Vicuna text and then have it send responses back, which I can retrieve in my Python app?

McVerdict
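The API extension mentioned earlier in the thread exposed an HTTP endpoint for exactly this. A sketch of building such a request, assuming the older blocking-API shape (a `/api/v1/generate` endpoint on port 5000 taking a JSON body with `prompt` and `max_new_tokens`); the endpoint path, port, and field names have changed across webui versions, so verify them against your install:

```python
import json

def make_generate_request(prompt: str, max_new_tokens: int = 200) -> tuple[str, bytes]:
    """Build the URL and JSON body for a text-generation-webui API call.

    The host, port, and field names below are assumptions based on the
    older API extension's defaults; check your version's documentation.
    """
    url = "http://127.0.0.1:5000/api/v1/generate"
    body = json.dumps({"prompt": prompt,
                       "max_new_tokens": max_new_tokens}).encode()
    return url, body

url, body = make_generate_request("### Human: Hello\n### Assistant:")
print(json.loads(body)["max_new_tokens"])  # → 200
```

From another program you would POST this body with any HTTP client (e.g. `requests.post(url, data=body, headers={"Content-Type": "application/json"})`) and read the generated text out of the JSON response; the exact response shape is also version-dependent.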

Do ggml models not need tokenizers or anything added to the folder that contains them?

MrArrmageddon

Great explanation of things, thank you. Unfortunately, I can't run it properly: after installation, when I run the start_windows.bat file, I receive an "ERROR:Failed to load GPTQ-for-LLaMa" message. Any ideas how to solve this?

YAH

Any update on Oobabooga's AMD GPU support on Windows? I heard that ROCm is on Windows now!

ave-