AI can't cross this line and we don't know why.



Special thanks to Patrons: Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin

REFERENCES

Some papers that appear to pass the compute-efficient frontier

Leaked GPT-4 training info
COMMENTS

"What is the minimum theoretical dimensionality of natural language?"
... "42" o_o

HeavyMetalMouse

Every time I hear stuff like this I have to remind myself that the human brain fits all of its memory and processing power into a head-sized container and only uses a few tens of watts of energy to maintain continuous operation. Clearly the issue is with the method used. Our computers are already much faster and have more gates than a brain has neurons. And even simple creatures with tiny "bird brains" can do amazing things.

dougcox

I’m a physicist and I was like “so… it’s a gas”.

Statistical mechanics is more powerful than people think.

Spectacurl

These are Language Models, and it's well known that natural languages follow Zipf's Law, where word frequencies adhere to a power-law distribution. Because LLMs are trained to learn and predict patterns in language, it’s clear that they must also exhibit this behavior. In fact, this could explain why LLMs seem to hit an efficiency ceiling—they are constrained by the power-law nature of language itself. As the models improve, their gains become increasingly marginal, particularly when dealing with rare words and complex language structures.

AJGLenio
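
A minimal sketch of the intuition in the comment above (illustrative only, not from the video; the vocabulary size and Zipf exponent are arbitrary assumptions, and a unigram model ignores context): even the bare word-frequency distribution has a fixed entropy, and a large share of it sits in the rare tail, which is where a predictor keeps paying long after the common words are learned.

```python
import numpy as np

# Hypothetical Zipf-distributed vocabulary: P(rank r) proportional to 1/r^s.
V, s = 50_000, 1.1                      # arbitrary assumptions
ranks = np.arange(1, V + 1)
p = ranks.astype(float) ** -s
p /= p.sum()

# Entropy of the unigram distribution: the best possible loss, in bits,
# for a predictor that ignores context entirely.
entropy_bits = -(p * np.log2(p)).sum()

# Share of that entropy contributed by rare words (rank > 100).
tail = ranks > 100
tail_share = -(p[tail] * np.log2(p[tail])).sum() / entropy_bits

print(f"unigram entropy: {entropy_bits:.2f} bits/word")
print(f"contributed by words beyond rank 100: {tail_share:.0%}")
```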

Two 20 Watt human brains looking at a 20 million Watt supercomputer operating for 3 months costing $200 million.

“Look at what they need to mimic a fraction of our power”

JeffNeelzebub

"the intrinsic dimension of natural language is 42"

we all knew it

metadaat

22:25 I remember reading at least one compelling paper that argued that emergent abilities like this are more a property of the way we measure the model's performance than a step change in the model's underlying capability. You might want to look into that.

jcorey

Another fascinating point is how well our observations in neural biology follow similar power scaling laws. The human brain fits very nicely on the primate scaling curve and (not surprisingly) points to an adaptation within primates for superior cognitive scaling versus other mammals. There are obviously important distinctions between ML and our brains. Models like GPT-4 are highly specialized and are better compared to the sub-network of regions in our brains that processes language. Lastly, an area where we are significantly lagging in capability is the Abstraction and Reasoning Corpus (ARC). Human scores on ARC are in the 80% range whereas our best algorithms are around 30%, and of course all of the most interesting applications of AI/ML will lean heavily on abstraction and reasoning. We have LOTS of work left to do, so please don't fall into the trap of thinking we just need to throw more GPUs at this and we somehow get to the singularity... we are still missing very important stuff, but the progress we have achieved is also incredibly impressive.

John-zzfz

Possibly shouldn't be a surprising relationship:
Thermodynamic entropy and entropy in information theory are related, and that relationship tells us that each bit of information has a minimum cost in terms of energy.

When you plot cross entropy, you're plotting missing information. It would make sense to flip the y axis and consider that to be how much information was learned.

When you plot compute, you're also plotting energy, which is directly proportional to the information and therefore should produce a straight line.

Not all models/learning schemes are 100% efficient so they are constrained to one side of that line. The other side represents a thermodynamic impossibility. It would break the 2nd law because the entropy of the universe (increased by the heat output of your GPUs, decreased by your model learning) would decrease.

jameshogge
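
The physical bound behind the comment above is Landauer's principle, stated here for reference (standard thermodynamics, not something derived in the video): irreversibly recording or erasing one bit of information costs at least

```latex
E_{\min} = k_B T \ln 2 \approx 2.9 \times 10^{-21}\ \mathrm{J\ per\ bit} \quad (T = 300\ \mathrm{K}),
\qquad \text{bits learned} = \frac{\Delta(\text{cross-entropy in nats})}{\ln 2}
```

Real training hardware dissipates many orders of magnitude more than this per bit of cross-entropy reduction, which fits the comment's picture of practical models sitting far to the inefficient side of any thermodynamic line.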

Einstein's first name is not John.
Einstein's first name is approximately Alpert.
Einstein's first name is probably not Alphonso.
Einstein's first name is derived from the Germanic Adalbert.
Einstein's first name is Eduard. (Albert's second son.)

I don't think there's a single sentence segment in any human language that you can come up with that has only one correct solution for the next word.

Darxide
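
What the Einstein examples above point at is irreducible uncertainty: when several continuations are genuinely valid, the best achievable next-word loss is not zero. The standard information-theoretic identity (general, not specific to the video) makes this precise:

```latex
H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; H(p)
```

Cross-entropy against the true next-word distribution p can never fall below the entropy H(p); a perfect model only drives the KL term to zero.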

The quality of your videos is worth the wait

PrajwalDSouza

"Einstein's first name is …”
Einstein's first name is universally known.
Einstein's first name is known by most people.
Einstein's first name is not an example of name weirdness.
Many possible next words....

wafikiri_

Of course the answer is 42. Always was!

drhxa

So, FWIW, it is well known in science that log-log plots nearly always end up looking linear. That is partly a feature of log-log plots and partly a reflection of how unlikely it is for any system to be super-exponential.

ianollmann
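
To spell the observation above out: any power law is exactly a straight line on log-log axes, so straightness by itself is weak evidence:

```latex
y = a\,x^{-k} \quad\Longrightarrow\quad \log y = \log a - k \log x
```

a line with slope -k and intercept log a. The substantive part of the scaling-law results is that a single slope appears to hold across many orders of magnitude of compute, not merely that the plot looks linear.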

I am blown away by both the implication of those papers and also (and especially) by your ability to convey so much information in a 24 minute video that makes it understandable to amateurs in this field like me.

kurtmueller

It's not the AI model, it's a property of the dataset - that's the only commonality. The fact that it follows a power law is a significant indicator. Most statistical linguistics experts will be able to point to many such power laws that appear when we measure human languages. The most commonly known is the word-frequency power law, so well known that it has a name: Zipf's Law. Regardless of language, regardless of what collection of works, the top 100 words comprise approximately half the collection and the next 10,000 or so words comprise the remaining half. Power laws appear in a lot of AI datasets because most complex data exhibits these power-law properties, and folks generally only apply AI to complex problems.

intrinsical
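
The frequency claim in the comment above is easy to check on any plain-text corpus; a minimal sketch (the corpus path is a placeholder, and the "top 100 words ≈ half" figure is the commenter's, not verified here):

```python
from collections import Counter
import re

# Placeholder path -- substitute any large plain-text file.
with open("corpus.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(words)
total = sum(counts.values())
ranked = [c for _, c in counts.most_common()]

# Fraction of all word tokens covered by the 100 most frequent word types.
top100_share = sum(ranked[:100]) / total
print(f"{total:,} tokens, {len(ranked):,} distinct words")
print(f"top 100 words cover {top100_share:.0%} of the text")

# Zipf's Law predicts frequency ~ 1/rank: roughly a straight line of slope -1
# when log(frequency) is plotted against log(rank).
```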

42? But what is the question? ... ohhh!

brantwedel

My brain predicts one word after watching this - excellent.

Craznar

21:33 I'm not a fan of numerology, but it is funny how the dimension of natural language happens to be 42 (just like the "Answer to the Ultimate Question" from "The Hitchhiker's Guide to the Galaxy"). :)

spomytkin

When it crosses the line, AI will learn to say "I don't know" and stop hallucinating. The "I don't know" factor being absent from a vector being driven through higher-dimensional space mathematically seems like a hard limit without some sort of mock self-awareness strapped to that process.

dgn