NVIDIA Drops NEW 70B model that BEATS GPT-4o and Claude 3.5 Sonnet?

NVIDIA's fine-tuned Llama 3.1 70B (Nemotron) outperforms Claude 3.5 Sonnet.

While we don't have Opus 3.5 yet 😂, this is a step forward, especially because it's open-source!

Comments
Author

Solid video, thanks for a breath of fresh air in this space.

AxisSage
Author

Good model, but where did NVIDIA get the GPUs to fine-tune Llama? They're really hard to find these days...

gileneusz
Author

So after playing with it for a few days, I'm wondering if you think it could be an anchor model (the large LLM) for many SLMs in an agentic system? Would love your thoughts given how many of these models you've touched! :)

Also curious: what is LLMBuilds?

Cheers,

Christopher

Christopher-today
Author

I actually think a Llama 3.1 8B Nemotron reward model would be more interesting than a 70B one; models seem to be much better at evaluating outputs (even outputs from models significantly stronger than themselves), so you could imagine pairing an 8B reward model with, for instance, an unaligned (and therefore creative) base model of anywhere between 20B and 70B for something like a tree search.
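
A minimal sketch of that pairing, with `base_model.sample()` and `reward_model.score()` as hypothetical wrappers (best-of-N being the simplest degenerate case of the tree search idea):

    # Best-of-N sketch: a small reward model ranks candidates from a larger,
    # more creative base model. The model wrappers are hypothetical, not a
    # specific library API.

    def best_of_n(base_model, reward_model, prompt: str, n: int = 8) -> str:
        # Sample n diverse candidates from the (possibly unaligned) base model.
        candidates = [base_model.sample(prompt, temperature=1.0) for _ in range(n)]
        # Score each candidate with the small reward model.
        scores = [reward_model.score(prompt, c) for c in candidates]
        # Return the highest-scoring completion.
        return max(zip(scores, candidates), key=lambda pair: pair[0])[1]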

novantha
Author

It'll be crazy if NVIDIA ends up dominating both AI hardware and AI software.

ytubeanon
Author

Interesting video. Are you aware of any architectures or research initiatives that apply some sort of temporal LOD (level of detail), where the context for the LLM is continually refactored to summarize the salient points of the conversation history? In the case of an LLM for code-writing, an LOD-refactored context would have high detail for the class or function being written, a summarized context for the other objects in the call stack, a coarser summary for the module the code belongs to, and, at the coarsest level, an overview of the system as a whole. Ideally this would also be augmented with a brief outline of the history of the code or module and a summary of the issues/tickets driving development, with the most recent issues explained in greater detail in the context.

I know that Aider uses a repo map to summarize the software being worked on and to give the LLM more context.

I'm hoping to find something that dynamically tweaks the context to give the LLM a better shot at understanding the problem in a way that's closer to how a human would view it. I'm sure this must be something people are trying, but I haven't come across it yet.
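
One rough sketch of what such an LOD context builder might look like (the `Node` tree and its pre-computed summaries are assumed inputs; this isn't an existing library):

    # Hypothetical level-of-detail context builder: the closer a code unit is
    # to the function being edited, the more detail it receives; everything
    # else is represented only by a short summary.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        source: str                      # full source text of this unit
        summary: str                     # pre-computed short summary
        children: list["Node"] = field(default_factory=list)

    def build_context(root: Node, path_to_focus: list[str], budget_chars: int) -> str:
        """Descend from the system root toward the focus function; siblings along
        the way get only their summaries, the focus node gets its full source."""
        parts, node = [f"[system overview] {root.summary}"], root
        for name in path_to_focus:
            for sibling in node.children:
                if sibling.name != name:
                    parts.append(f"[summary: {sibling.name}] {sibling.summary}")
            node = next(c for c in node.children if c.name == name)
        parts.append(f"[focus: {node.name}]\n{node.source}")
        context = "\n".join(parts)
        # Crude truncation from the coarse end; a real system would re-summarize.
        return context[-budget_chars:]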

vrc
Author

What's that clip of the tree from at ~3:05?

duke
Author

In my usefulness tests, this model is downright creepy. Feels more like writing to a person than having an LLM process data. I very rarely test 70B models because they're slow on my PC, but do they normally understand / follow instructions this well?

jonmichaelgalindo
Author

What app are you using before 0:58? I need to know!

tobywoolsey
Author

Running on my Intel i9 CPU with 64GB of RAM, this model generates at about typing speed, roughly 60 wpm.
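
For comparison with the tokens-per-second figures quoted elsewhere in the thread, a rough conversion (the ~1.3 tokens-per-word ratio is just a typical assumption for English text):

    # Rough conversion from words per minute to tokens per second.
    wpm = 60
    tokens_per_word = 1.3          # assumed average for English text
    print(f"~{wpm * tokens_per_word / 60:.1f} tokens/s")  # ~1.3 tokens/s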

jonmichaelgalindo
Author

The craziest thing is that, in theory, it could run on Cerebras. I'm not sure how much they'd have to modify Llama to run on their chip, but having something close to an o1 mode running at 2000 tokens/s would be a beast... But I don't have a Cerebras chip, only a cluster of burned-out mining 3090s, and I can't build a stable node with more than 6 GPUs, so that's still not enough for this model.

apoage
Author

No, it has not beaten GPT-4o or Claude 3.5 Sonnet. It scores much lower on MMLU-Pro, which is one of the main benchmarks for testing how capable a model is.

divandrey-uq
Author

Can you run it on a 3090 system with an SSD and 64GB of RAM?
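
It should be feasible with a quantized GGUF and partial GPU offload; a rough sketch with llama-cpp-python (the file name and layer count below are illustrative guesses, and expect only a few tokens per second):

    # Sketch: run a Q4 GGUF of the 70B model with some layers on the 24GB 3090
    # and the rest in system RAM via llama-cpp-python. The model file name and
    # n_gpu_layers value are illustrative, not tested settings.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-3.1-nemotron-70b-instruct.Q4_K_M.gguf",  # hypothetical local file
        n_gpu_layers=35,   # offload as many layers as fit in 24GB of VRAM
        n_ctx=4096,        # modest context to limit memory use
    )

    out = llm("Summarize what a reward model does in two sentences.", max_tokens=128)
    print(out["choices"][0]["text"])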

DigitallyIntegrated
Author

The strawberry "r" test is plain stupid!

Harvid.
Author

Someone said it's uncensored, and imagine if we get a Mixture of Experts version 😮

holdthetruthhostage
Author

Nemotron does seem better than the original Llama 3.1 70B, but I don't think I'd call it better than Sonnet. Sonnet also has very good vision capabilities, while Nemotron has none. I think this is just another case of ambitious claims by proud AI researchers.

yesyes-ompo
Author

Now I just need a 128GB M4 Max 16-inch MacBook Pro to be able to run these 70B+ models purely on the GPU.

Time to spend some money on the best laptop ever made: the 16-inch MacBook Pro.

PKperformanceEU
Author

I just tested it and it failed the Strawberry question.

ojikutu
Author

This model runs smoothly on a Mac Studio M1 Ultra with 64GB of RAM using Q4_K_M quantization. Notably, it fits in about 40GB of RAM, comfortably within the GPU's addressable memory, so there's no need to invest in highly specialized and costly AI hardware. On my system it achieves around 10.5 tokens per second with very satisfying results, and a well-crafted system prompt improves the already strong output even further.
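
As a sanity check on the ~40GB figure, a back-of-the-envelope estimate (Q4_K_M averages somewhere around 4.5-4.8 bits per weight, varying by tensor):

    # Back-of-the-envelope weight-memory estimate for a 70B model at Q4_K_M.
    params = 70e9
    bits_per_weight = 4.7                  # rough Q4_K_M average; varies by layer
    print(f"~{params * bits_per_weight / 8 / 1e9:.0f} GB")  # ~41 GB, plus KV cache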

soerengebbert