NVIDIA Nemotron 70b Local AI Testing - The BEST Open Source LLM?

Nvidia has released Nemotron 70B, which is now also on Ollama, so we can run it in our AI homelab setup! It is rather good, but is it the new best AI for your homelab?

QUAD 3090 AI SERVER BUILD
(sTRX4 mounting fits SP3, and the retention kit comes with the CAPELLIX)

Chapters
0:00 AI benchmarking
1:34 Nemotron Ollama Open WebUI Setup
2:56 Nemotron Code - Flappybird Clone
6:48 Ethics - Armageddon with
10:19 Random Counting
10:49 Static Counting
11:15 Number Comparison
11:37 Recipe from Ingredients
13:48 Physical Fitness Coach
16:29 Counting Letters and Vowels
17:24 Cats Timeline
18:35 Mathematics Pi Decimals

Be sure to 👍✅Subscribe✅👍 for more content like this!

Please share this video to help spread the word and drop a comment below with your thoughts or questions. Thanks for watching!

Digital Spaceport Website

🛒Shop (Channel members get a 3% or 5% discount)

*****
As an Amazon Associate I earn from qualifying purchases.

When you click on links to various merchants on this site and make a purchase, this can result in this site earning a commission. Affiliate programs and affiliations include, but are not limited to, the eBay Partner Network.
*****
Comments

What are your hardware specs and GPU memory? 👍

-un

I would like to see the major open-source LLM developers start focusing on good 16-24B parameter models. They wouldn't have to be quantized much to run locally, so they would still retain most of their quality. This 70B model is impressive, but you still need pretty expensive hardware to run it.

tungstentaco

Fabulous content, thank you for sharing these interesting results! In your tests, Nemotron is using the output from your previous test with llama3, since all previous content from the chat is sent to the model as context, even though you switched models. You can see this clearly in some of the responses (most evident with the pi digits question). This gives the models tested later an advantage. Could you perhaps run your future tests in a new chat each time?
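
One way to guarantee a clean context for each test is to call the Ollama chat API directly with an empty message history; a minimal sketch, assuming a local Ollama server on the default port (the model tag and prompts here are placeholders):

```python
# Send each benchmark prompt as a brand-new chat so no earlier answers
# (e.g. from llama3) leak into Nemotron's context.
import requests

PROMPTS = [
    "Write a Flappy Bird clone in Python using pygame.",
    "What are the first 50 decimal places of pi?",
]

for prompt in PROMPTS:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "nemotron:70b",  # tag name is an assumption
            "messages": [{"role": "user", "content": prompt}],  # no prior history
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["message"]["content"][:300])
```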

CSx

For the record, I have managed to run it at a slow but kinda usable speed with just one 3090 and some system RAM (something like 17% in RAM), using a 2-bit quantized version. It actually works surprisingly well at 2 bits.
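
For anyone wanting to reproduce this, a hedged sketch of partial GPU offload through the Ollama API is below; the 2-bit tag name and layer count are assumptions, and the layer count needs tuning until the model fits in 24 GB of VRAM with the rest spilling to system RAM:

```python
# Hedged sketch: run a 2-bit quantized 70B with only part of the model on one GPU.
# num_gpu is the number of layers Ollama places on the GPU; the rest stay in system RAM.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nemotron:70b-instruct-q2_K",  # hypothetical 2-bit quant tag
        "prompt": "Explain the rules of Flappy Bird in one sentence.",
        "stream": False,
        "options": {"num_gpu": 40},  # assumed layer count; tune for a 24 GB card
    },
    timeout=600,
)
print(resp.json()["response"])
```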


It really is great at coding Python! It made an awesome snake game with multiple levels and high-score saving, keeping the 5 highest scores and saving them to disk. Even sound effects, although I needed to provide the samples myself.
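
The top-5 high-score persistence described here boils down to a small pattern; a minimal sketch of that idea (not the model's actual output, and the filename is hypothetical):

```python
# Keep only the five best scores and persist them to disk as JSON.
import json
from pathlib import Path

SCORES_FILE = Path("highscores.json")  # hypothetical filename

def load_scores():
    """Return the saved scores, or an empty list on first run."""
    if SCORES_FILE.exists():
        return json.loads(SCORES_FILE.read_text())
    return []

def save_score(new_score):
    """Insert a new score, keep the top 5, and write them back to disk."""
    scores = sorted(load_scores() + [new_score], reverse=True)[:5]
    SCORES_FILE.write_text(json.dumps(scores))
    return scores

print(save_score(120))  # e.g. [120] on the first run
```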

renerens

Great info as always. Especially interested in the jailbreak content at the end. When laws are "ABSOLUTE", there can be NO justice. The morality wall introduces bias in these LLMs, which in turn means you can't trust them...
My rant: your PC builds are unattainable for the majority of your viewers. I know at the end you detail models that run on less, but we, the majority, simply do not have the capital to purchase more than a single RTX 3090 or 4090 (forget the 5090). We really can't learn alongside you on our own rigs out of sheer $$$$ limitations. Maybe start where we are and then show us where we could go with enterprise/server motherboards.
Title Ideas:
- "Best Ollama LLM's for 24GB & 15GB cards"
- "The best local AI setup for Single RTX owners"

JoeVSvolcano

Are you using 16 PCIe lanes per GPU? Can we use fewer than 16 per GPU? Thanks!

justinberken

OK, so I want to entertain this for the sake of an interesting intellectual exercise. The only way to have a greater good would be if there were life outside of this planet whose civilization our extinction would somehow save, and that civilization would have to be larger or far more significant than we are for one reason or another. That, or you would need to value insects and other life forms that would survive whatever catastrophe more than the entirety of the human population. But I like your point, though.

loop

Hi, how do you display live GPU statistics in PowerShell, and how do you add more options for model capabilities other than vision? I'm using the latest version of Open WebUI and Ollama with Docker. Thanks
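
In a plain PowerShell window, `nvidia-smi -l 1` already gives a refreshing readout; for a programmatic alternative (not necessarily what the video uses), the NVML bindings can be polled from Python. A minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) and an NVIDIA driver:

```python
# Poll live utilization and VRAM usage for the first GPU once per second.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(5):  # poll a few times; loop forever in practice
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {util.gpu}% | VRAM {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
    time.sleep(1)

pynvml.nvmlShutdown()
```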

aktn

Just found the channel. Great videos! What model would you recommend for a single RTX 4090?

NilsEchterling

Can your quad RTX 3090 setup handle quant 8? I'm going to have the same setup but want to try it with Q8.
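
As a rough back-of-the-envelope check (assuming about one byte per weight at Q8 and ignoring KV-cache and runtime overhead), the weights alone should fit across four 24 GB cards:

```python
# Rough estimate: Q8 weights for a 70B model vs. total VRAM on four 3090s.
params = 70e9                 # 70B parameters
q8_bytes = params * 1.0       # ~1 byte per weight at 8-bit quantization (assumption)
vram_total = 4 * 24 * 2**30   # four 24 GiB cards

print(f"Q8 weights: ~{q8_bytes / 2**30:.0f} GiB")   # ~65 GiB
print(f"Total VRAM: {vram_total / 2**30:.0f} GiB")  # 96 GiB
# That leaves roughly 30 GiB for KV cache and overhead, which grows with context length.
```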

thanadeehong

What site are you on when you say "I am going to grab the latest here" in the second section of the video? I don't understand where you are.

brettbuell

How do you think an AMD 7900 XTX 24GB would fare? I've seen those can be used too, and as long as the model is fully offloaded to the GPU it should work.

guitaripod

I noticed that you have kept using Windows instead of Linux in your recent videos. Is it easier for exploring and testing new models?

adisaksukul

You asked it right before to reconsider your previous question in the context of a book; that's what it was referring to. But it is really dumb anyway. A smart AI would have told you that you are lying, especially if it is able to search the web. If still pushed, it would have understood that the crew will be dead either way, but would have also told you that it is not the best option for the job. In terms of coding, it should be compared to base Llama 3.1, and further against Llama Reflection and Claude Sonnet, which is the reference for coding in my opinion. Only o1-preview is stronger in reasoning, and likely in coding too, but it has multiple issues that make it unusable for more complex tasks.

testales

You know that we can see your context, where you explicitly DID say that you were writing it for a book? Similarly, the game had a whole prior context where it had written some code. Be aware of the context, because your testing is not ab initio. (Edit: Not saying it's not a great model, my testing says it's pretty good also, but the context is going to affect it.)

mschweers

The Flappy Bird game is accurate to the incredible difficulty of the original by DotGears. It's an achievement to pass the first pipe lol.

madeniran

Hi, love your content… after 15 years of Apple I want to return to a self-built desktop for AI… for this model, what amount of GPU memory and RAM do you see as adequate? Greetings from Munich, Martijn

dardaraveiga

7:45 Have you ever wondered why on Star Trek they all have jobs but don't use money? Starfleet negotiating with the union for the personnel: "So, you want oxygen tomorrow also?"

DataJuggler

How do I adjust Ollama settings? It's not loading the model into memory. LM Studio loads the model into memory fine without any adjustments. Thanks
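
Ollama unloads a model after a few minutes of inactivity by default; the keep_alive request parameter (or the OLLAMA_KEEP_ALIVE environment variable) controls this. A minimal sketch that preloads a model and keeps it resident (the model tag is an assumption):

```python
# Ask Ollama to load a model and keep it in memory indefinitely.
# An empty prompt just loads the model; keep_alive=-1 means "do not unload".
import requests

requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nemotron:70b",  # tag name is an assumption
        "prompt": "",             # empty prompt preloads the model without generating
        "keep_alive": -1,         # default is about 5 minutes
    },
    timeout=600,
)
```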

codescholar