Quantized Llama 2 GPTQ Model with Ooga Booga (284x faster than original?)

Trying out TheBloke's GPTQ 7B Llama 2 model and comparing it with the original Llama 2 7B model. In my one test, it was apparently about 284 times faster.
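A rough sense of why the GPTQ build is so much lighter on VRAM: 4-bit weights take a quarter of the memory of fp16. A back-of-the-envelope sketch (weights only, ignoring GPTQ's small per-group scale/zero-point overhead and activation memory):

```python
# Rough VRAM estimate for the raw weights of a 7B-parameter model.
params = 7_000_000_000

fp16_gb = params * 2 / 1024**3     # fp16: 2 bytes per weight
gptq4_gb = params * 0.5 / 1024**3  # 4-bit GPTQ: 0.5 bytes per weight

print(f"fp16:  {fp16_gb:.1f} GB")   # ~13.0 GB
print(f"4-bit: {gptq4_gb:.1f} GB")  # ~3.3 GB
```

This is why the fp16 model swaps or spills to CPU on consumer GPUs while the 4-bit version fits comfortably, which can account for a dramatic (if hardware-dependent) speed gap.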

Voice created using Eleven Labs.
Comments

Another AI person losing his mind, haha. Welcome to the club! Thanks for your videos; some of the most useful out there.
Your quick and concise format is great.
PS - Only the initiated would know the "any" key. By the time you know it, your descent into madness will be complete.

javiermarti_author

Great format. Fast and to the point. Plus the funny voice. Thanks!

gil

Great video and awesome presentation! Is this quantized version currently available as an API from any of the cloud providers?

ramachandrang

Awesome, no bollocks or bits missed out! Long shot, but any ideas on how to get Code Llama working with Open Interpreter?

fuzzyorangetv

Is there any coding LLM we can use, with an extension in place of GitHub Copilot for VSCode, under 12GB of VRAM?

MaxPayne_in

Also, would you please make a quick video on how to train on your own raw text data?

userrjlyjg

Can the text generation web UI model work in the background? For example, telling it to do an accurate search without being there to give input? Or telling it: write me a message at a precise time an hour from now?

SAVONASOTTERRANEASEGRETA

The AI voice sounds like he's yelling, lmao

Legnog

The one-click installers are not there for me.

DemonClaJon

I use TheBloke; he has good stuff. Thanks for the update fix, it actually worked this time. Thank you!

the_one_and_carpool

You forgot to clear the history. In the second test, the model had to process the history as well.
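Good catch: leftover chat history becomes part of the prompt, so the second run pays for extra tokens. A fair timing comparison resets the context before every run. A minimal sketch, where `generate` stands in for whichever backend is being timed (a hypothetical callable, not a real API):

```python
import time

def benchmark(generate, prompt, runs=3):
    """Time `generate` with a fresh, empty history each run so no
    run pays for tokens left over from a previous one."""
    timings = []
    for _ in range(runs):
        history = []  # reset chat context between runs
        start = time.perf_counter()
        generate(prompt, history)
        timings.append(time.perf_counter() - start)
    return min(timings)  # best-of-N reduces warmup/jitter noise
```

Taking the minimum of several runs also smooths out first-run warmup (model load, CUDA kernel compilation), which otherwise skews one-shot comparisons.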

MrAlsBundy