Phi-1: A 'Textbook' Model

After a conversation with one of the 'Textbooks Are All You Need' authors, I can now bring you insights from the new phi-1 tiny language model. See if you agree with me that it tells us much more than how to do good coding: it affects AGI timelines by telling us whether data will be a bottleneck.

I cover 5 other papers, including WizardCoder, Data Constraints (how more epochs could be used), TinyStories, and more, to give context to the results and end with what I think timelines might be and how public messaging could be targeted.

With extracts from Sarah Constantin in Asterisk and Carl Shulman on Dwarkesh Patel, Andrej Karpathy and Jack Clark (co-founder of Anthropic), as well as the Textbooks and TinyStories co-author himself, Ronen Eldan, I hope you get something from this one. And yes, the title of the paper isn't the best.

Comments

God damn it dude.... again you completely change my understanding of how far along we are

Theonlyrealcornpop

Speaking of biological risks, Kurzgesagt just posted a video about that and I'm glad you brought up that paper from Oxford. You have my thanks!

seanmurphy

Awesome. People started asking me last year about scale and parameters and I predicted this but HOLY MACKEREL one billion parameters is crazy.

DaveShap

The "a grammatical error makes the model perform worse" part makes me think... Instead of training a model from scratch, shouldn't we be training a "base" model that already knows English? Exactly as with humans, you wouldn't send a 3-year-old to blindly read Python textbooks, you would make it learn English first

hidroman

You really add a lot of value in understanding what is happening in the field by bringing together all these different papers and interviews in a coherent narrative. Bravo, excellent work.

Diabloto

A new video from you is always the highlight of the day. =) Your hard work in keeping up with the news and drawing all the connections is much appreciated. Considering growing investments in compute recently: I'm extremely excited for what the 80,000-H100 cluster someone is rumored to be building (cf. Emad) will bring about! And considering Nvidia's output of 200,000 units per quarter, there's still so much headroom for the near-term future.

autingo

I absolutely LOVE seeing that you uploaded a video, seeing that it was only 4 minutes ago is just a cherry on top!

waterbot

I think a Mixture of Experts (MoE) approach could be the key towards AGI instead of monolithic giant models. According to some experts, GPT-4 is an MoE of 8 sub-models. We know the brain is also distributed: we have roughly 150,000 cortical columns that specialize in overlapping but non-redundant areas of expertise and then communicate to reach consensus (see A Thousand Brains by Jeff Hawkins). In that regard this paper is great news. It would also be much easier to implement online learning (continuously learning models) with smaller models, and you could periodically focus your training hardware on different sub-models. Combine that with a tree-of-thoughts framework (which was still using only one generalist model), self-reflection and self-consistency, and apply it in a loop (similar to Auto-GPT) with the different sub-modules discussing things out from their respective areas of expertise, plus their ability to permanently learn (potentially from each other if there is overlapping domain relevance of information, which is similar to synthetic data), and this may bring us a lot closer to AGI.

ct
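
A minimal, hypothetical sketch in Python of the "specialists discussing in a loop" idea from ct's comment above. The `Specialist` class, its `answer` method, and the `discuss` loop are illustrative stand-ins for small domain-tuned models, not any real library or API.

```python
from dataclasses import dataclass


@dataclass
class Specialist:
    """Stand-in for a small model fine-tuned on one domain."""
    domain: str

    def answer(self, question: str, context: list[str]) -> str:
        # A real implementation would prompt a domain-tuned model with the
        # question plus the running transcript; here we return a placeholder.
        return f"[{self.domain} view on: {question}]"


def discuss(question: str, specialists: list[Specialist], rounds: int = 2) -> list[str]:
    """Let each specialist respond in turn, feeding prior answers back as context."""
    transcript: list[str] = []
    for _ in range(rounds):
        for s in specialists:
            transcript.append(s.answer(question, transcript))
    return transcript


if __name__ == "__main__":
    experts = [Specialist("python"), Specialist("security"), Specialist("math")]
    for turn in discuss("How should we validate user input?", experts):
        print(turn)
```

In a real system each specialist could be updated independently (continual learning on its own domain), with an orchestrator model deciding which specialists to consult and when consensus has been reached.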

I am a big fan of the approach of using lots of specialized small models running in parallel with a high-level model to orchestrate them. I think it could be more performant, more energy efficient, and easier to certify safety for each of the "domain specialists". I guess the downside might be cross-cutting emergent properties popping up less often, but that remains to be seen. A hierarchical approach to composition also just seems plain satisfying. Anyway, really cool model and super duper video. Looking forward to being able to boot some of these on my phone. Thanks as always for such high quality content 🥳🔥

jumpstar

Personalization imo is the future. Small models working on personal data collectively to achieve a final goal.

adarshas

Great video. I've been thinking all along that it's not about having the biggest and brightest AI; it's the capability to produce lots of task-specific or job-specific AIs that can run on smaller devices that will truly introduce this tech into the mainstream.

johnblack

It's always struck me how high the quality of the data really needs to be to have these models work more efficiently, but turning it into synthetic data that is then curated is an amazing idea IMHO.
Thanks again for keeping us all in the loop as you continue to consume unbelievable amounts of information. Hmmm, are you an AI @AI Explained? ;)

mikemcaulay

The idea of highly focused models always brings me back to the "council of AIs" idea. Having something like ChatGPT, or even less capable AIs, orchestrating which of the specialized models should respond, and even doing some of the advanced question forming and picking the best of 5 results, for example, could be profoundly more capable than anything out there right now. Amazing time to be alive and be involved in this historic step up.
Of course that immediately makes me think of safety. I wonder how well we could train one of these focused models on security issues alone so that it acts as the gateway in and out of these systems, essentially a smart filter. That seems preferable to polluting the primary AIs with security measures, just in terms of quality of output.

mikemcaulay
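
A minimal, hypothetical sketch of the "smart filter" gateway mikemcaulay describes above: a small, safety-focused model screens both the incoming prompt and the primary model's draft response. `safety_model` and `primary_model` are assumed callables standing in for real models; no actual API is referenced.

```python
def gated_chat(prompt: str, safety_model, primary_model) -> str:
    """Screen the prompt, call the primary model, then screen its output."""
    if safety_model(f"Is this request unsafe? {prompt}").strip().lower().startswith("yes"):
        return "Request declined by the safety gateway."

    draft = primary_model(prompt)

    if safety_model(f"Is this response unsafe? {draft}").strip().lower().startswith("yes"):
        return "Response withheld by the safety gateway."
    return draft


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    safety = lambda q: "no"
    primary = lambda p: f"Answer to: {p}"
    print(gated_chat("Explain list comprehensions.", safety, primary))
```

The appeal of this design is exactly what the comment suggests: the safety logic lives in a small dedicated model at the boundary, so the primary model's outputs aren't degraded by bolted-on guardrails.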

Always excited when you upload a new video! I have been using Bing Chat for most of my "AI needs" (read: lack of programming skills but great interest), but if there will soon be models that can run on much cheaper hardware while being better at a specific topic, that would really be a step in the right direction, I think. So damn excited for this technology.

Birne_TM

I find this incredibly fascinating. With GPT-3 or GPT-4, we can focus the training data on more useful examples. Furthermore, we could potentially train the model to learn about ethics and identify potentially dangerous knowledge, like bioweapons, to safeguard future model training. These models could play a vital role in aligning and improving future models!

michaelmccoubrey
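
A hypothetical sketch of the data-curation idea in the comment above (and in the spirit of phi-1's filtered and synthetic data): use an existing LLM as a judge that keeps only educational, safe training examples. The `judge` callable stands in for a call to a model such as GPT-4; the prompt wording and `curate` helper are illustrative only.

```python
def curate(examples: list[str], judge) -> list[str]:
    """Keep only examples the judge model rates as educational and safe."""
    kept = []
    for text in examples:
        verdict = judge(
            "Rate this training example. Reply KEEP if it is educational "
            "and contains no dangerous instructions, otherwise DROP:\n" + text
        )
        if verdict.strip().upper().startswith("KEEP"):
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Toy judge so the sketch runs: keep anything that looks like code.
    toy_judge = lambda q: "KEEP" if "def " in q else "DROP"
    data = ["def add(a, b):\n    return a + b", "random web spam"]
    print(curate(data, toy_judge))
```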

Insightful, your analysis of the Phi-1 model is, hmmmm. Small in size it may be, yet perform at high levels it does. Synthetic data and the future of language models, fascinating it is to ponder. Await with eagerness, we do, to see how the future of AI, these developments will shape. Continue your excellent work, you must, hmmm...

DrHanes

As always, very informative. Thank you. This is in line with my business strategy. Rather than providing customers with a lengthy manual, we offer them a highly specialized LLM trained in the use of the product or software.

toddnedd

I love the list of sources in the info panel - thank you for providing it! 🤩

Doug

I realized early on with the Open Assistant project that bad data would end up limiting its capabilities in small, unnecessary ways during the training process. Clearly, over time we'll optimize each parameter as part of the design to produce the most capable models. That will include using only curated, optimal data; sizing targets for various destination platforms (phone vs. gaming PC vs. data center); focusing data so the model specializes in a desired area of expertise; and finding the optimal number of passes through the training data, among other things.

I would think that optimizing everything we've already learned in order to build out the best models possible, plus the further opportunities for improvement likely to be realized along the way, may lead us to somewhere around 2-3x the capability per given size, which will be amazing.

I agree with Dave Shapiro that Agency and Dependency are the important yardsticks going forward, and it makes sense we can maximize our progress near term by perfecting as above on a number of specialized models able to work together.

brianmi

Something that is maybe underrated/understated here: AI Explained puts ALL of the relevant links in the description. Absolutely fukn legit dude =] stay awesome

CYIERPUNK