ORCA 2 | Microsoft's BREAKTHROUGH in Open Source LLMs - AI training the next gen AI 🔥

Get on my daily AI newsletter 🔥
[News, Research and Tutorials on AI]

See more at:

My AI Playlist:
Comments

The problem is that AGI will most likely be created in a country using the craziest date notation (MM/DD/YY).

MateuszZwierzycki

The fact that you have to put the date of recording as an overlay shows how crazy things are.

MikeyDavis

Excellent work. Nobody goes through the white paper like that, and it's really more meaningful to see it done like this! Thank you!

TheDisillusionist

I'm a trooper! Thanks Wes. You're my autonomous agent for AI research rn, keep up the great work! What a time to be alive eh?

tonytravels

One big benefit of this style of training is the prospect of running an LLM well on a local computer, thanks to the model's increased efficiency.

ethanlewis

I almost forgot other AI companies existed since the OpenAI debacle. Thanks for keeping your eye on the ball.

middle-agedmacdonald

It is a testament to how far along we are on the exponential curve to the singularity that I can see a video on the subject of AI that is two weeks old and consider it obsolete.

panpiper

🎯 Key Takeaways for quick navigation:

00:00 🔍 *Microsoft's Orca 2 builds on Orca 1, emphasizing data quality over sheer model size for effective open source models.*
01:50 📊 *Orca 2 highlights the importance of data quality and explores the potential of AI-generated synthetic data for training.*
03:16 🌐 *Orca 2 suggests AI progress isn't just about larger models but also about making models more task-specific, transferring knowledge, and improving through AI-generated data.*
06:16 🐋 *Orca 1 learned from rich signals, while Orca 2 focuses on enhanced training signals for smaller language models (LMs) to improve reasoning abilities.*
08:46 🧠 *Orca 2 aims to teach smaller models various reasoning techniques and determine the most effective strategy for different tasks, outperforming larger models on benchmarks.*
09:30 🚀 *Despite its smaller size, Orca 2 achieves performance levels similar to models 5-10 times larger, excelling at complex reasoning tasks in zero-shot settings.*
11:05 🧠 *Orca 2 demonstrates reasoning abilities by answering theory of mind questions, although some model versions display flawed reasoning in certain scenarios.*
13:27 💡 *AI models like Orca 2 demonstrate reasoning through text completion, showing improvements in multi-step problem-solving and tackling previously challenging tasks.*
16:15 🔄 *Orca 2 employs techniques like prompt erasure, training smaller models on larger models' behaviors without directly exposing them to the original prompts (see the sketch after this list).*
19:04 🧭 *Orca 2's cautious reasoning technique selects nuanced behaviors from larger models, enabling smaller models to strategically approach tasks rather than merely imitating larger models.*
20:16 🧠 *Instruction-tuned models are limited by pre-trained knowledge, highlighting the importance of smaller language models as reasoning engines rather than repositories of pre-existing knowledge.*
20:45 🤯 *AI models function as reasoning engines, offering broader reasoning capabilities beyond specific tasks, enabling software to reason about various contexts and situations.*
21:55 🤖 *Explanation tuning emphasizes that stylistically correct outputs may still be incorrect, stressing the significance of nuanced instructions for accurate reasoning in AI models.*
22:09 📚 *"Cautious Reasoner" emerges as a new term in AI, showcasing the variance in AI model responses based on system instructions provided, influencing their reasoning and problem-solving abilities.*
23:21 🧩 *GPT-4's responses are influenced by the given instructions; tailored prompts produce better results, indicating the significance of specific instructions for better AI reasoning.*
24:32 💡 *Training smaller language models like Orca 2 on tailored synthetic data significantly enhances their reasoning capabilities, achieving performance levels comparable to larger models, especially in zero-shot reasoning tasks.*
25:13 🌐 *Synthetic data allows for the creation of smaller, specialized, and accessible open-source models, enabling diverse applications and reducing reliance on larger models for various tasks.*
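
Prompt erasure is concrete enough to sketch in code. Below is a minimal, illustrative data-preparation step, not the paper's actual pipeline: `teacher_answer` is a hypothetical stand-in for a call to a large teacher model such as GPT-4.

```python
# Minimal sketch of prompt erasure as a data-preparation step (illustrative).
# The teacher answers each task under a detailed, strategy-specific system
# prompt; the student is then trained on the same answer but with that prompt
# replaced by a generic one, so it must internalize the strategy itself.

GENERIC_PROMPT = "You are a helpful assistant. Answer the question."

def teacher_answer(detailed_prompt: str, task: str) -> str:
    """Hypothetical placeholder for a call to a large teacher model."""
    return f"<answer produced under: {detailed_prompt!r}>"

def build_training_example(detailed_prompt: str, task: str) -> dict:
    answer = teacher_answer(detailed_prompt, task)  # teacher sees the strategy
    return {
        "system": GENERIC_PROMPT,                   # the student does not
        "user": task,
        "target": answer,
    }

example = build_training_example(
    "Think step by step, list the facts you know, then answer.",
    "If Tom has 3 apples and gives away 2, how many remain?",
)
print(example)
```

The key point is the asymmetry: the teacher reasons under the detailed prompt, while the student is trained to reproduce the answer given only the generic one.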

Made with HARPA AI

MarkyR-AIBeats

I am really grateful for your videos. I sometimes wonder why there are so few in-depth content creators. That makes your videos even more valuable. Thank you.

ChronozOdP

I would guess that not only the quality of the data but also the ORDER in which it is used to train a model matters a lot. Sort of like how one needs to learn the basics of math before learning more advanced topics, or how, if you know some things, other things can be learned more efficiently via analogy to the earlier information.
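
To ground that intuition, here is a toy sketch of curriculum-style ordering. The difficulty heuristic (answer length) is purely an assumption for illustration; real curricula use signals such as loss, teacher confidence, or hand-designed stages.

```python
# Toy sketch of curriculum ordering: sort training examples from easy to
# hard before batching, so the model sees the "basics" first.

examples = [
    {"q": "Prove that sqrt(2) is irrational.", "a": "Assume sqrt(2) = p/q ..."},
    {"q": "2 + 2 = ?", "a": "4"},
    {"q": "What is 7 * 8?", "a": "56"},
]

def difficulty(ex: dict) -> int:
    # Crude proxy, assumed for illustration: longer answers ~ harder.
    return len(ex["a"])

curriculum = sorted(examples, key=difficulty)
for ex in curriculum:
    print(ex["q"])
```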

tomcraver

About time a content creator put a date stamp on the video.

atypocrat

I wonder if stronger models could be used to generate bespoke tokenizers for smaller models. BPE seems like a very general solution to go for, but if we’re bringing LLMs into neural network architecture design they might guide us to better tokenizers for smaller models. I liked that the Bloomberg GPT (which they keep internally only, and they used that name even though OpenAI wasn’t involved) had tokens that could represent multiple words. I imagine there could be an encoder making the first pass, handling polysemy and translating into the richer embeddings designed by the larger LM.
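
For reference, vanilla BPE merge learning looks roughly like the sketch below; this is the standard textbook algorithm, not anything from the Orca 2 paper. It greedily merges the most frequent adjacent symbol pair, which is exactly the generic, frequency-driven behavior the comment suggests a stronger model might improve on.

```python
# Minimal sketch of learning BPE merges from a toy corpus.
from collections import Counter

def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    corpus = Counter(tuple(w) for w in words)  # words as tuples of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq  # count adjacent pairs, corpus-weighted
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # greedily pick the top pair
        merges.append(best)
        new_corpus = Counter()
        for word, freq in corpus.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges

print(learn_bpe(["low", "lower", "lowest", "newest", "widest"], 5))
```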

mshonle

I like the picture with the orca playing the guitar.

iwsutw

When people talk about self-improving AI, and the “intelligence explosion” that could happen as a result, it’s almost always in reference to AIs modifying their own code (for example, their own architecture, weights, or the code used to train them). But I rarely hear anyone talking about data as the way this could happen. Between the Orca paper and recent leaks from OpenAI, it sounds like they’ve determined that text generated by the current generation of LLMs can be used to train the next generation of even smarter LLMs. I see no reason that process couldn’t continue recursively, such that the models could keep getting smarter simply by training on better and better data. It’s a very interesting thought, this sort of “bootstrapping” of superintelligence.
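
The loop described here can be sketched at the level of pseudocode-in-Python. Every function below is a hypothetical placeholder; neither the Orca paper nor any OpenAI leak specifies such a pipeline, so treat this as the shape of the idea only.

```python
# Speculative sketch of data-driven "bootstrapping": each generation writes
# synthetic training text that is filtered and used to train its successor.
# All three functions are hypothetical placeholders, not real APIs.

def generate_synthetic_data(model: str, n: int) -> list[str]:
    return [f"sample-{i}-from-{model}" for i in range(n)]  # placeholder

def filter_for_quality(samples: list[str]) -> list[str]:
    return samples  # placeholder: real pipelines would score and deduplicate

def train(data: list[str]) -> str:
    return f"model-trained-on-{len(data)}-samples"  # placeholder

model = "gen-0"
for generation in range(1, 4):
    data = filter_for_quality(generate_synthetic_data(model, 1000))
    model = train(data)
    print(f"generation {generation}: {model}")
```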

therainman

I think the problem with the ball vs. basket question is that the model probably assumes that, even though the second person was already in the room when the ball was moved, they may not have been looking or noticing, and that's what “they didn't see it” means.

rezakn

Thank you for your succinct summary of this paper and its key points, as well as your thoughts on how it impacts AI technology and applications. Now to find the paper and read it in detail. BTW, YOU, Wes, are the TROOPER!

fredcox

I’m excited to learn about open source. Thanks Wes. 👏💥

CM-zljw

Neural-net-like swarms are a long-used approach in “reasoning” machines, and it seems quite useful to swing back to them. The basic idea of distributed computing or programming (back in the 70s) can, CAN, allow for both more efficient growth and (hopefully) less dominance by a single, all-encompassing system.

juliapardieutroyer

Bot swarms and orchestration are going to be huge in 2024. Foundation models like Llama, Claude and GPT-4 will be the backbone of narrow bots that have specific functions.

GENthetik

With regards to Mark and John, you did a very human thing: your eyes skipped over some of the second sentence while speed reading. (I got this right on the first shot thanks to some dyslexia in my world; I read slowly.) The beginning of the second sentence says, "While John is away, Mark puts...." So Mark's change to the ball's position was not seen by John when it happened. One presumes a sequence: John and Mark are in the room; John moves the ball to the box; John leaves the room; Mark (mischievously?) moves the ball to the basket. If John either stays away or cannot easily see at a glance that the ball is in the basket, the answer derived is spot on. So a proper answer requires missing data, with "assumptions" to fill it in. Orca-2-13B made the same "assumption" I did, which seems to be intended by the structure of the question.

I have a strong suspicion that in our race to synthetic beings that are smarter than humans, we may find they are more like some autistic people than neurotypical people: very literalistic.

{^_-}

Wizardess