So... What Is The Right Way To Train AGI?


Check out my newsletter:

Emergent properties with repeated examples

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Intelligence at the Edge of Chaos

Game of Life

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, Ben Shaener, Chris LeDoux, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Marcelo Ferreira, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz, jiye, Anushka

[Music] massobeats - lush
[Video Editor] Silas
Comments


(I reuploaded this video cuz there was a pretty big mistake at 6:13, sorry notifications!)

bycloudAI

Such beautiful knowledge, I will not use it anywhere and will not talk about it with anybody.

weirdo

I fell asleep while listening to this video and dropped my phone on my wife's head and now she's mad.

Ikbeneengeit

These videos are so nice for someone like me with no technical background in machine learning. Thank you and please keep making more!

heysth

Bro, cover “Fourier Heads” or the belief state transformer. Fourier head research is interesting; I see a lot of value in integrating Gaussian mixture model principles into LMs to better handle complex distributions.

To be honest, one of my core principles is disentanglement. There's a reason we don't see the expected performance gains with multimodal data and reasoning in general: the model treats it all as a single continuous sequence. The solution I've been working on is multivariate next-token prediction, where each modality is considered separately, and yes, everything can be treated as a distinct modality, even reasoning via structured reasoning tokens. Instead of a single sequence of length T, it becomes N x T, where N is the modality count, almost like a time-series problem. It obviously increases memory for the sequence, but I've seen clear benefits and think it's the future. That's why I don't expect legit breakthroughs from any of the top players: no new ideas, or rather no divergent ideas. AGI will be created by divergent thinkers. Someone already released something called entropix, I believe, which recreates o1-preview-style outputs lol; it just needs DPO to really get that juice out. We need to fund our divergent thinkers.

zandrrlife
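A minimal sketch of the multivariate "N x T" next-token idea from the comment above, assuming a shared transformer backbone with one embedding table and one prediction head per modality. All module names, shapes, and the summed-embedding fusion are illustrative assumptions, not the commenter's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of multivariate next-token prediction: N aligned modality
# streams share one backbone; each modality gets its own embedding table and head.
class MultivariateNextToken(nn.Module):
    def __init__(self, vocab_sizes, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(v, d_model) for v in vocab_sizes])
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.heads = nn.ModuleList([nn.Linear(d_model, v) for v in vocab_sizes])

    def forward(self, tokens):
        # tokens: (batch, N_modalities, T), one stream per modality, time-aligned.
        # Fuse per-modality embeddings at each timestep (positional encodings
        # omitted for brevity) instead of flattening everything into one N*T sequence.
        h = sum(emb(tokens[:, i]) for i, emb in enumerate(self.embeds))
        T = h.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(h, mask=causal)
        # One set of next-token logits per modality, predicted jointly at each step.
        return [head(h) for head in self.heads]

model = MultivariateNextToken(vocab_sizes=[1000, 500])   # e.g. text + reasoning tokens
logits = model(torch.randint(0, 500, (2, 2, 16)))         # (batch=2, N=2, T=16)
print([tuple(l.shape) for l in logits])                   # [(2, 16, 1000), (2, 16, 500)]
```

Summing the per-modality embeddings keeps the sequence length at T while still letting each head predict its own stream; it is one of several possible ways to avoid treating all modalities as a single flattened sequence.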

This feels somehow similar to how physics is based on a seemingly simple set of rules, yet creates impossibly complex situations/states.
There must be a limited set of core rules a base model needs to learn to become an effective reasoner.

OperationDarkside

very carefully, and with compassion and wisdom

CYIERPUNK

Multi-step prediction has been known for a while to perform poorly. It's best to either predict probabilities and sample or predict a single timestep, recursing for more. LLMs are doing both.

poipoi
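To illustrate the "predict probabilities, sample, and recurse one step at a time" decoding the comment above describes, here is a toy sketch; the tiny stand-in model and all names here are assumptions for illustration, not anything from the video.

```python
import torch
import torch.nn as nn

VOCAB = 100

class TinyLM(nn.Module):
    """Stand-in language model that returns next-token logits at every position."""
    def __init__(self, d=32):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):               # tokens: (batch, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                  # (batch, T, VOCAB)

def generate(model, tokens, n_steps, temperature=1.0):
    """Single-step prediction, recursed: sample one token, append it, repeat."""
    for _ in range(n_steps):
        logits = model(tokens)[:, -1] / temperature      # distribution over next token
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)    # sample instead of argmax
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens

print(generate(TinyLM(), torch.zeros(1, 1, dtype=torch.long), n_steps=8))
```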

2:31 Lack of words like "skibidi"

DeniSaputta

You did not mention that Rule 110 is Turing complete. It may not be because of the edge of chaos, but because of Turing completeness.
All Turing-complete systems behave broadly similarly to what they define as the edge of chaos, although you can construct some that hide this under apparent noise.

adamrak
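Since Rule 110 comes up in the comment above (and the description links the Game of Life), here is a tiny, self-contained elementary cellular automaton sketch; the helper names and the periodic boundary choice are my own assumptions.

```python
# Elementary cellular automaton, Rule 110: each cell's next state is looked up
# from its 3-cell neighborhood (left, center, right).
RULE = 110
rule_table = {(a, b, c): (RULE >> (a * 4 + b * 2 + c)) & 1
              for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def step(cells):
    """Advance one generation with wrap-around (periodic) boundaries."""
    n = len(cells)
    return [rule_table[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

# Start from a single live cell and print a few generations.
state = [0] * 40
state[-1] = 1
for _ in range(12):
    print("".join("█" if c else " " for c in state))
    state = step(state)
```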

It seems almost obvious that chasing complexity horizons will also lead to increasingly complex output potentials, but seeing how this can be done in practice, and related back to the OG cellular automata, is very cool.

tisfu

The "long tail" really explains why AI slop is so mid - it is literally the middle of the distribution of language. And you can see it in most models, even if different wording is used.

..

What you mentioned reminds me of curriculum learning from RL. Start off training easy, then gradually make it harder.

cagedgandalf
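A toy illustration of the curriculum-learning idea mentioned above: start from the easiest examples and gradually widen the pool to include harder ones. The difficulty proxy and the linear schedule here are assumptions for illustration, not from the video.

```python
import random

def difficulty(example):
    return len(example)                        # proxy: longer sequence = harder

def curriculum_batches(dataset, n_steps, batch_size=32):
    """Yield batches drawn from an easy-to-hard growing pool of examples."""
    data = sorted(dataset, key=difficulty)     # easiest first
    for step in range(n_steps):
        # Fraction of the dataset made available grows linearly from 10% to 100%.
        frac = 0.1 + 0.9 * step / max(1, n_steps - 1)
        pool = data[: max(batch_size, int(frac * len(data)))]
        yield random.sample(pool, min(batch_size, len(pool)))

# Usage: plug each yielded batch into an ordinary training step.
toy_data = [[0] * random.randint(1, 50) for _ in range(1000)]
for batch in curriculum_batches(toy_data, n_steps=5, batch_size=4):
    pass  # train_step(model, batch)
```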

Great video! I first heard about the brain being on the edge of chaos from Artem Kirsanov, who has a great channel (of the same name) on computational neuroscience. I'm thinking those models that try to predict 5 steps at once might ultimately be better, but they would require much longer training (and maybe a larger size), and therefore more computational resources, to start learning some complex patterns. It could probably be tested with models that try predicting 2 steps ahead...

ConnoisseurOfExistence

very cool and interesting. I had some similar intuition and I'm glad you discussed this paper. Great work.

ryanengel

Stephen Wolfram's A New Kind of Science and its computational theory applied to training models is the way to go, methinks.

MathewRenfro

Great video as always, the edge of chaos is where understanding ends :)

Lexxxco

This is basically just information theory…

redthunder

Essentially, because the model learns to incorporate past states into its decision-making, it becomes capable of better reasoning. AKA, this is just another case where transfer learning is truly an important key. Transfer learning, aka generalization, is also the reason why sample efficiency improves with training.

GodbornNoven

why do i perfectly understand some of your videos, and at the same time get absolutely confused by others?? 😭😭

ginqus