GPT-4 Details 'UNOFFICIAL' Leaked!

Comments

Awesome! I wanted to see a more in-depth breakdown of this! I'll let you know my thoughts once I'm done watching!

Nick_With_A_Stick

Look out for the open source models coming out soon 😏

SloanMosley

MoE is exactly what I had in mind while designing a certain role prompt: being able to talk to different experts and have a discussion with them.

MODEST

Fantastic video, thank you for creating and sharing. Very interesting to see this is potentially how GPT-4 is built.

Serifinity

This was my theory, i.e. that there was more than one model. Microsoft does something similar with their form recognizers. I also think this is how AGI will be attained: smaller expert models put together.

jeffsteyn

That makes a lot of sense. I've seen that training a model on one type of task can reduce its effectiveness in other areas. So why try to balance everything at the cost of effectiveness?

SloanMosley

With this leak, the open-source community should come up with new things. Thanks for your update.

jayhu

GPT-4 is trained on 13T tokens… Here are the statistics for my language, French: about 67,000 books with an average length of 400 pages are published each year (26.8M pages/year), and 20 pages is roughly 13k tokens. That means about 17.42B tokens are published in French each year. So if you were only using books to train a French GPT-4, you would need approximately 750 years of all book publications at current rates… 🤯
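
A quick back-of-the-envelope check of that estimate, taking the commenter's per-page token density and the rumored 13T-token figure at face value:

```python
# Rough check of the "≈750 years of French books" estimate.
# All inputs are the commenter's figures, not verified statistics.
books_per_year  = 67_000
pages_per_book  = 400
tokens_per_page = 13_000 / 20           # ≈650 tokens per page
target_tokens   = 13e12                 # rumored GPT-4 training set size

tokens_per_year = books_per_year * pages_per_book * tokens_per_page
print(f"{tokens_per_year / 1e9:.2f}B tokens/year")            # ≈17.42B
print(f"{target_tokens / tokens_per_year:.0f} years needed")  # ≈746
```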

pn

Man, this happens over and over. I read about stuff, hear about its limitations, and I think to myself, I could do *this* to solve it. Then, a few months later, I see the news: "researchers use your idea to improve XYZ".

I've been thinking to myself that you've got to isolate parts of your model so that you only have to run the computation for the part that matters for a particular inference run. There's got to be so much wasted computation.

Yup, they thought of it too.
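
That intuition is roughly what a sparse mixture-of-experts layer does. Below is a minimal sketch in PyTorch; the sizes, router design, and top-k routing are illustrative assumptions, not GPT-4's actual configuration.

```python
# Minimal sketch of that idea: a sparse mixture-of-experts layer that
# routes each token to its top-k experts, so the weights of unselected
# experts are never touched on a given forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        gate_logits = self.router(x)                   # (tokens, n_experts)
        top_vals, top_idx = gate_logits.topk(self.top_k, dim=-1)
        gate = F.softmax(top_vals, dim=-1)             # weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e           # tokens routed to expert e
                if mask.any():                         # skip experts nobody chose
                    w = gate[mask, slot].unsqueeze(1)
                    out[mask] += w * expert(x[mask])
        return out

tokens = torch.randn(16, 512)                          # a batch of 16 token vectors
print(SparseMoE()(tokens).shape)                       # torch.Size([16, 512])
```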

DeruwynArchmage

Interesting video unravelling the inner workings of GPT-4. Thanks!

henkhbit

Great insights. Amazing. Keep up the good work

arunamalla

George Hotz says in his interview with Lex Fridman that the architecture is 220B parameters, a sixteen-way mixture model with eight sets of weights. Watch his recent interview for the quote (I am recalling from memory; I might have his wording a little wrong). He claims to have been told this by a reliable source (he doesn't specify the source).
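
Taking that recollection at face value (eight independent sets of 220B-parameter weights), the implied total is:

```python
# Implied total size under the quoted claim; both numbers come from
# the (unconfirmed) recollection above, not from OpenAI.
per_set = 220e9      # parameters per set of weights
n_sets  = 8
print(f"{per_set * n_sets / 1e12:.2f}T parameters total")   # 1.76T
```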

briancase

Not sure how good your understanding of neuroscience is, but it would be good to anchor your explanations of LLMs in first principles of neuroscience when discussing parameters and layers.

porroapp

George Hotz has more information on GPT-4.

IronZk

So… Falcon-180B is coming this summer. For those of us with servers at the crib/startups… it will be comparable to GPT-4. Cool beans. Falcon-180B is going to be a parent model for a lot of micro-LMs.

It also means we can use structural pruning and 2-bit quantization and get memory below 30GB (that would require retraining on a subset of the original dataset they open-source, to recover the perplexity lost from extreme model compression; the cost would be ~$100k, doable for most startups).

An open-source GPT-4-distilled parent model for all LM-based startups: that is the story that should be obsessed over. Academic research has been far more compelling than anything OpenAI has released; I don't get the hype. Who cares about MoE like that when we have LongNet now? In 2024 we're going to see a lot of AI-generated model architectures. I'm literally just waiting for Falcon-180B so I can fine-tune the hell out of it for ML automation and experimentation. MoE is the least exciting to me… MEGABYTE is, though. MEGABYTE + SPAE (the best method for multimodal LMs, and it gives the LM the ability to use in-context learning for image/video generation… weird that nobody discusses this; we've seen what an LM used for text guidance in image generation leads to {DeepFloyd}, so what will the quality and coherence look like with a GPT-4-scale LM providing the guidance? oh my lol) + dilated attention + landmark tokens + KGs + SequenceMatch and native tree-of-thought-style sampling… oh my 😂.
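
For the below-30GB claim, here is a rough memory estimate under assumed compression settings; the pruning ratio and quantization overhead are my assumptions, not figures from the comment or from any published Falcon-180B result.

```python
# Back-of-the-envelope memory estimate for the compression claim above.
params          = 180e9               # Falcon-180B parameter count
pruned          = params * (1 - 0.4)  # assume structural pruning removes ~40% of weights
bits_per_weight = 2                   # 2-bit quantization
overhead        = 1.1                 # assume ~10% for scales/zero-points and metadata

gib = pruned * bits_per_weight / 8 * overhead / 2**30
print(f"~{gib:.1f} GiB")              # ~27.7 GiB, under the ~30GB target
```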

zandrrlife

I've been using MoE for a while now without knowing its name.

thedoctor

Doesn't that make the best LLMs to date, which are not a bunch of experts grouped up, now much more impressive? Considering recent papers like Orca and "Textbooks Are All You Need", plus the different ways to increase the token limit, decrease model size, and decrease computational cost, this field is just going to keep evolving rapidly. That assumes OpenAI did not already implement all of this in GPT-4, which I think they did not, at least for the most part.

reinerheiner

11:45 Since Mojo is faster than Python, does this mean that the cost of training models will be lower using Mojo?

tsylpyfod

Does anyone know the ingredients of Coca-Cola?

shivamkumar-qpjm

The tweet has been deleted by the user due to copyright.

feverdelrey