DeepSeek-V2.5: The Best Open-Source Model GOT BETTER! (Beats Claude, GPT-4o?)


In this video, I'll be fully testing the new DeepSeek-V2.5 to check if it's really good. I'll also be trying to find out if it can really beat Llama-3.1, Claude 3.5 Sonnet, GPT-4o & Qwen-2 in general & coding tests. The DeepSeek-V2.5 model is fully open source and can be used for FREE. DeepSeek-V2.5 is even better at coding tasks and is also really good at text-to-application, text-to-frontend and other things as well. I'll be testing it to find out if it can really beat other LLMs, and I'll also be telling you how you can use it.

-----
Key Takeaways:

🔍 DeepSeek V2.5 released: The latest DeepSeek model blends coding and general use, combining both DeepSeek Coder and DeepSeek General models.

🧠 Powerful AI model: DeepSeek V2.5 excels in natural language processing and coding tasks with enhanced instruction-following and writing capabilities.

💡 Impressive benchmarks: It scores higher than previous DeepSeek models in multiple AI benchmarks, making it a top choice for both developers and general users.

💻 Open-source access: You can explore DeepSeek V2.5’s open weights on HuggingFace or Ollama, and even try it for free on the DeepSeek Chat platform.
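A minimal local-inference sketch under stated assumptions: the Hugging Face repo id `deepseek-ai/DeepSeek-V2.5` exists, but the full-precision checkpoint is hundreds of GB, so the heavy load is guarded behind `__main__` and most people will prefer Ollama's quantized `deepseek-v2.5` tag instead. Verify model ids and hardware requirements before running.

```python
# Sketch only: loading DeepSeek-V2.5 with Hugging Face Transformers.
# Assumption: repo id "deepseek-ai/DeepSeek-V2.5"; the BF16 weights need
# multi-GPU hardware, so the load is guarded and untested here.

def chat_messages(user_prompt: str) -> list[dict]:
    """Build the messages list expected by tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import

    model_id = "deepseek-ai/DeepSeek-V2.5"
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
    )
    inputs = tok.apply_chat_template(
        chat_messages("Write a Python hello world"),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=128)
    print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```

For the lighter route, `ollama run deepseek-v2.5` pulls a quantized build, and the DeepSeek Chat platform runs it for free in the browser.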

⚙️ Advanced architecture: With 236 billion total parameters, of which only 21 billion are active per token, this Mixture-of-Experts model delivers high performance for various AI tasks.
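A toy illustration of what those two numbers mean (not the real routing code): in a Mixture-of-Experts model, all parameters must be stored, but only the routed experts compute on each token, so per-token compute scales with the active count, not the total.

```python
# Toy sketch: total vs. active parameters in a Mixture-of-Experts model.
TOTAL_PARAMS = 236e9   # reported total parameter count (storage cost)
ACTIVE_PARAMS = 21e9   # reported parameters activated per token (compute cost)

def active_fraction(total: float, active: float) -> float:
    """Share of the model that actually computes on each token."""
    return active / total

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{frac:.1%} of parameters are active per token")  # about 8.9%
```

That roughly 11x gap between storage and compute is why an MoE this large can still serve tokens cheaply.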

💸 Affordable AI: DeepSeek V2.5 offers high-quality AI model performance at a low cost—just 30 cents per million tokens—making it ideal for those on a budget.
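Back-of-envelope cost math for the rate quoted above. The 30-cents figure is the video's flat number; real API pricing typically splits input and output tokens, so treat this as a sketch, not a billing calculator.

```python
def api_cost_usd(tokens: int, price_per_million_usd: float = 0.30) -> float:
    """Cost of a request at a flat per-million-token rate (sketch only)."""
    return tokens / 1_000_000 * price_per_million_usd

# e.g. a 50k-token coding session at the quoted rate
print(f"${api_cost_usd(50_000):.3f}")  # $0.015, i.e. 1.5 cents
```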

🎯 All-in-one AI solution: DeepSeek V2.5 eliminates the need for separate models for coding and language tasks, making it an all-in-one AI powerhouse.

----
Timestamps:

00:00 - Introduction
00:07 - About DeepSeek-V2.5
02:06 - Testing
06:39 - Conclusion
07:43 - Ending
Comments

I haven't seen a single model get the hex problem right. That was the best butterfly so far, at least of the models I've seen you test. Thanks for all the reviews and keep up the good work.

jimlynch

I bought API usage since it was the cheapest and best option for me.

Its programming capabilities are really good.

Good to know it was upgraded.

SudeeptoDutta

If it can draw a butterfly in SVG, it could also do it in Blender as a 3D object. It can also compose music via MIDI notes :>

mjkht

My go-to model for aider. Best value for money by far, and it even supports caching, which is insane because the price is already so low.

lydedreamoz

DeepSeek is cheap, and it works extremely well in aider: large context and large outputs. Aider with DeepSeek, to me, is the best option other than the paid ones, and the reasoning quality is very good, sometimes better than Sonnet. I never see that rate-limit stuff. It handles 300-500 lines of context from the web with no problem; try pasting some and see. It can generate whole long answers and very rarely cuts the output or prints half a file and stops.
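A minimal sketch of the setup these comments describe. The model alias, environment variable, and caching flag are assumptions from aider's documentation; verify them against your aider version (`aider --models deepseek`) before relying on this.

```shell
# Assumed aider + DeepSeek setup (flags unverified; check your aider version)
export DEEPSEEK_API_KEY="sk-..."              # key from the DeepSeek platform

# Start an editing session against DeepSeek's API
aider --model deepseek/deepseek-chat app.py

# The caching mentioned above is opt-in via a flag
aider --model deepseek/deepseek-chat --cache-prompts app.py
```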

saabirmohamed

I'm confused. I cannot understand your relatively positive rating for a model of that size with so many fails. Could you please compare the results of the current model to the former models? It all feels so random. Btw: a single zero-shot run does not say anything about consistency.

MeinDeutschkurs

DeepSeek to me is the best, but since they are from China, EU and US companies don't like mentioning them or using their models. Once they release a multimodal model, it's going to have an impact.

DeepSeek is made for coding, and I think that's the best approach: instead of making a big model that's good at everything, you focus on making your model good at one thing, and that's what they're doing with DeepSeek. It's a coding model first.

Yes, I sound like a fanboy, but I think you should also improve your prompt, because the more detail you give, the better the result. How about not doing only zero-shot benchmarks? They also mentioned in the tweet at 0:55 that you should update the system prompt and temperature, and this is something that nobody does when testing models. A 200B+ model shouldn't fail at the first 2 questions in your benchmark. It's like using a text-to-image model like Midjourney or Stable Diffusion without changing the sref or seed values.

You should adapt the system prompt and temperature based on the question type. Question 11 about generating the SVG code for a butterfly sounds like a coding question, but it's more of an artistic question first that needs a different system prompt. I saw a huge improvement with my local models when using Claude 3.5 Sonnet's system prompt.

MacSn

Are the results of this test also available for Claude 3.5 Sonnet and GPT-4o? Thanks.

pudochu

Why don't you use Coder 2.5 for the coding questions?

yuyutsurao

If we write in Chinese, do you think it could pass the language task?

sinapxiagency

I find DeepSeek's response time to be the slowest among LLMs, but I like its cheap price.

TomaszStochmal

Can DeepSeek V2.5 be used with VS Code for free through inference with Hugging Face's Transformers?

mrfresshness

I actually have four boxes of pencils by the way

johngoad

Chat with DeepSeek works well, but autocomplete is very slow; don't know why.

minhhieple

I love how cheap Deepseek is, and it's very impressive for open source, but man is it slow.

TheBuzzati

It doesn't seem to work in Claude Dev. Can you make a tutorial for that? Thanks.

mallardlane

7:42 so is DeepSeek V2 Coder better than DeepSeek V2.5?

Tomosw-xx