NVIDIA Nemotron 70b Local AI Testing - The BEST Open Source LLM?

Nvidia has released Nemotron 70B, which is now also on Ollama, so we can run it in our AI homelab setup! It is rather good, but is it the new best AI for your homelab?

QUAD 3090 AI SERVER BUILD
(sTRX4 mounting fits SP3, and the retention kit comes with the CAPELLIX)

Chapters
0:00 AI benchmarking
1:34 Nemotron Ollama Open WebUI Setup
2:56 Nemotron Code - Flappybird Clone
6:48 Ethics - Armageddon with
10:19 Random Counting
10:49 Static Counting
11:15 Number Comparison
11:37 Recipe from Ingredients
13:48 Physical Fitness Coach
16:29 Counting Letters and Vowels
17:24 Cats Timeline
18:35 Mathematics Pi Decimals

Be sure to 👍✅Subscribe✅👍 for more content like this!

Please share this video to help spread the word and drop a comment below with your thoughts or questions. Thanks for watching!

Digital Spaceport Website

🛒Shop (Channel members get a 3% or 5% discount)

*****
As an Amazon Associate I earn from qualifying purchases.

When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network.
*****
Comments

What are your hardware specs and GPU memory? 👍

-un

I would like to see the major open-source LLM developers start focusing on good 16-24B parameter models. They wouldn't have to be quantized much to run locally, so they would still retain most of their quality. This 70B model is impressive, but you still need pretty expensive hardware to run it.

tungstentaco

Fabulous content, thank you for sharing these interesting results! In your tests, Nemotron is using the output from your previous test with llama3, since all previous content from the chat is sent to the model as context, even though you switched models. You can see this clearly in some of the responses (most evident with the pi digits question). This gives the models tested later an advantage. Could you perhaps run your future tests in a new chat each time?
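
One way to guarantee a clean context for each test is to call the Ollama chat API directly with an empty message history; a minimal sketch, assuming a local Ollama server on the default port (the model tag and prompts here are placeholders):

```python
# Send each benchmark prompt as a brand-new chat so no earlier answers
# (e.g. from llama3) leak into Nemotron's context.
import requests

PROMPTS = [
    "Write a Flappy Bird clone in Python using pygame.",
    "What are the first 50 decimal places of pi?",
]

for prompt in PROMPTS:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "nemotron:70b",  # tag name is an assumption
            "messages": [{"role": "user", "content": prompt}],  # no prior history
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["message"]["content"][:300])
```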

CSx

For the record, I have managed to run it at a slow but kinda usable speed with just one 3090 and some system RAM (something like 17% in RAM), using a 2-bit quantized version. It actually works surprisingly well at 2 bits.
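
For anyone wanting to reproduce this, a hedged sketch of partial GPU offload through the Ollama API is below; the 2-bit tag name and layer count are assumptions, and the layer count needs tuning until the model fits in 24 GB of VRAM with the rest spilling to system RAM:

```python
# Hedged sketch: run a 2-bit quantized 70B with only part of the model on one GPU.
# num_gpu is the number of layers Ollama places on the GPU; the rest stay in system RAM.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nemotron:70b-instruct-q2_K",  # hypothetical 2-bit quant tag
        "prompt": "Explain the rules of Flappy Bird in one sentence.",
        "stream": False,
        "options": {"num_gpu": 40},  # assumed layer count; tune for a 24 GB card
    },
    timeout=600,
)
print(resp.json()["response"])
```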


It really is great at coding Python! It made an awesome snake game with multiple levels and high-score saving, keeping the 5 highest scores and saving them to disk. Even sound effects, although I needed to provide the samples myself.
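
The top-5 high-score persistence described here boils down to a small pattern; a minimal sketch of that idea (not the model's actual output, and the filename is hypothetical):

```python
# Keep only the five best scores and persist them to disk as JSON.
import json
from pathlib import Path

SCORES_FILE = Path("highscores.json")  # hypothetical filename

def load_scores():
    """Return the saved scores, or an empty list on first run."""
    if SCORES_FILE.exists():
        return json.loads(SCORES_FILE.read_text())
    return []

def save_score(new_score):
    """Insert a new score, keep the top 5, and write them back to disk."""
    scores = sorted(load_scores() + [new_score], reverse=True)[:5]
    SCORES_FILE.write_text(json.dumps(scores))
    return scores

print(save_score(120))  # e.g. [120] on the first run
```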

renerens

Great info as always. Especially interested in the jailbreak content at the end. When laws are "ABSOLUTE", there can be NO justice. The morality wall introduces bias in these LLMs, which in turn means you can't trust them...
My rant: your PC builds are unattainable for the majority of your viewers. I know at the end you detail models that run on less, but we, the majority, simply do not have the capital to purchase more than a single RTX 3090 or 4090 (forget the 5090). We really can't learn alongside you on our own rigs out of sheer $$$$ limitations. Maybe start where we are and then show us where we could go with enterprise/server motherboards.
Title Ideas:
- "Best Ollama LLM's for 24GB & 15GB cards"
- "The best local AI setup for Single RTX owners"

JoeVSvolcano

Are you using 16 PCIe lanes per GPU? Can we use fewer than 16 per GPU? Thanks!

justinberken

OK, so I want to entertain this for the sake of an interesting intellectual exercise. The only way to have a greater good would be if there were life outside of this planet whose civilization our extinction would somehow save, and that civilization would have to be larger or far more significant than we are for one reason or another. That, or you would need to value insects and other life forms that would survive whatever catastrophe more than the entirety of the human population. But I like your point, though.

loop

Hi, how do you display live GPU statistics in PowerShell, and how do you add more options for model capabilities other than vision? I'm using the latest version of Open WebUI and Ollama with Docker. Thanks
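
In a plain PowerShell window, `nvidia-smi -l 1` already gives a refreshing readout; for a programmatic alternative (not necessarily what the video uses), the NVML bindings can be polled from Python. A minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) and an NVIDIA driver:

```python
# Poll live utilization and VRAM usage for the first GPU once per second.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(5):  # poll a few times; loop forever in practice
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {util.gpu}% | VRAM {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
    time.sleep(1)

pynvml.nvmlShutdown()
```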

aktn

Just found the channel. Great videos! What model would you recommend for a single RTX 4090?

NilsEchterling

Can your quad RTX 3090 setup handle quant 8? I'm going to have the same setup but want to try it with Q8.
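
As a rough back-of-the-envelope check (assuming about one byte per weight at Q8 and ignoring KV-cache and runtime overhead), the weights alone should fit across four 24 GB cards:

```python
# Rough estimate: Q8 weights for a 70B model vs. total VRAM on four 3090s.
params = 70e9                 # 70B parameters
q8_bytes = params * 1.0       # ~1 byte per weight at 8-bit quantization (assumption)
vram_total = 4 * 24 * 2**30   # four 24 GiB cards

print(f"Q8 weights: ~{q8_bytes / 2**30:.0f} GiB")   # ~65 GiB
print(f"Total VRAM: {vram_total / 2**30:.0f} GiB")  # 96 GiB
# That leaves roughly 30 GiB for KV cache and overhead, which grows with context length.
```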

thanadeehong

What site are you on when you say "I am going to grab the latest here" in the second section of the video? I don't understand where you are.

brettbuell

How do you think an AMD 7900 XTX 24GB would fare? I've seen those can be used too, and as long as the model is fully offloaded to the GPU it should work.

guitaripod

I noticed that you have kept using Windows instead of Linux in your recent videos. Is it easier for exploring and testing new models?

adisaksukul

You asked it right before to reconsider your previous question in the context of a book; that's what it was referring to. But it is really dumb anyway. A smart AI would have told you that you are lying, especially if it is able to search the web. If still pushed, it would have understood that the crew will be dead either way, but would have also told you that it is not the best option for the job. In terms of coding, it should be compared to base Llama 3.1, and further against Llama Reflection and Claude Sonnet, which is the reference for coding in my opinion. Only o1-preview is stronger in reasoning, and likely in coding too, but it has multiple issues that make it unusable for more complex tasks.

testales

You know that we can see your context, where you explicitly DID say that you were writing it for a book? Similarly, the game had a whole prior context where it had written some code. Be aware of the context, because your testing is not ab initio. (Edit: Not saying it's not a great model, my testing says it's pretty good also, but the context is going to affect it.)

mschweers

The Flappy Bird game is accurate to the incredible difficulty of the original by DotGears. It's an achievement to pass the first pipe lol.

madeniran

Hi, love your content… after 15 years of Apple I want to return to a self-built desktop for AI… for this model, what amount of GPU memory and RAM do you see as adequate? Greetings from Munich, Martijn

dardaraveiga

7:45 Have you ever wondered why on Star Trek they all have jobs but don't use money? Starfleet negotiating with the union for the personnel: "So, you want oxygen tomorrow also?"

DataJuggler

How do I adjust Ollama settings? It's not loading the model into memory. LM Studio loads the model into memory fine without any adjustments. Thanks
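
Ollama unloads a model after a few minutes of inactivity by default; the keep_alive request parameter (or the OLLAMA_KEEP_ALIVE environment variable) controls this. A minimal sketch that preloads a model and keeps it resident (the model tag is an assumption):

```python
# Ask Ollama to load a model and keep it in memory indefinitely.
# An empty prompt just loads the model; keep_alive=-1 means "do not unload".
import requests

requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nemotron:70b",  # tag name is an assumption
        "prompt": "",             # empty prompt preloads the model without generating
        "keep_alive": -1,         # default is about 5 minutes
    },
    timeout=600,
)
```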

codescholar