Hella New AI Papers - Aug 9, 2024

Read or listen to the newsletter with all the papers I chose to keep here:

Support my learning journey either by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!

Discuss this stuff with other Tunadorks on Discord

All my other links

Timestamps:
00:00 Intro
53:23 Outro
Comments

first paper out of the gate sounds like a winner 🤯

GNARGNARHEAD

Damn, just downloaded like half that list. Love the curation you do.

kevon

Yo, I got addicted to your channel. I kinda binge-watched your latest vids. I just grab what I can and then guess at the concepts of the ones I don't fully understand.

jakeaustria

Very nice review. That first paper got my attention!

Jayc

11:10: Oh, cool, this sounds similar to something I was daydreaming about (except I was imagining clusters of a handful of tokens, not necessarily matching sentence boundaries, and I was imagining doing this recursively).
I imagine it works something like this: have an autoencoder that goes from a not-too-long sequence of tokens to a single higher-level token, and then the decoder part predicts the individual tokens given the previous higher-level tokens, the current higher-level token, and the base-level tokens already produced for the current higher-level token?

I suppose their tokens encoding entire sentences can’t be using a fixed discrete set for the higher-level tokens, so I guess they just have those be continuous?
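
For anyone who wants to picture that idea, here is a minimal, speculative PyTorch sketch of such a chunk autoencoder: a short span of base tokens is pooled into one continuous higher-level vector, and the decoder reconstructs the span conditioned on that vector and the tokens decoded so far. Every name and size here is made up; this illustrates the commenter's idea, not the paper's actual architecture.

```python
# Minimal sketch (not the paper's architecture): compress a short chunk of base
# tokens into one continuous "higher-level token", then decode the chunk back
# autoregressively, conditioned on that vector and the tokens produced so far.
import torch
import torch.nn as nn

class ChunkAutoencoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)   # chunk -> vector
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)   # vector -> chunk
        self.to_vocab = nn.Linear(d_model, vocab_size)

    def encode(self, chunk_ids):                      # chunk_ids: (batch, chunk_len)
        _, h = self.encoder(self.embed(chunk_ids))
        return h.squeeze(0)                           # one continuous higher-level token per chunk

    def decode_logits(self, chunk_ids, chunk_vec):
        # Teacher forcing: position t sees only tokens < t of the same chunk,
        # plus the chunk vector (used here as the decoder's initial hidden state).
        bos = torch.zeros_like(chunk_ids[:, :1])      # token id 0 as a stand-in BOS
        inputs = self.embed(torch.cat([bos, chunk_ids[:, :-1]], dim=1))
        out, _ = self.decoder(inputs, chunk_vec.unsqueeze(0))
        return self.to_vocab(out)

model = ChunkAutoencoder()
chunk = torch.randint(0, 1000, (4, 8))               # 4 chunks of 8 base tokens each
logits = model.decode_logits(chunk, model.encode(chunk))
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), chunk.reshape(-1))
```

A second, higher-level model over the sequence of chunk vectors would then handle the "previous higher-level tokens" part of the conditioning.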

(Aside: hm, if you used a standard decoder-only LLM, but instead of selecting a token with the probabilities it assigns, just took the average of the embedding vectors for each of those tokens, and let that iterate a dozen times, and then switched to picking specific tokens again, I wonder what kind of garbage output that would produce?
That thought probably seems pretty unrelated. It came to mind because I was thinking about how, when the “tokens” produced as outputs are continuous, you don’t get a probability distribution, so the only way to mix between options is to mix the actual options, rather than taking a probability mix over options.)
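
Here is a rough sketch of that aside about mixing embeddings instead of sampling, using GPT-2 from Hugging Face purely as a convenient stand-in model; nothing here comes from the papers in the video, it is just the experiment the comment describes.

```python
# Rough sketch: instead of sampling a token at each step, feed back the
# probability-weighted average of all token embeddings for a dozen "soft" steps,
# then take one ordinary greedy step at the end just to see what comes out.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
wte = model.transformer.wte.weight                    # (vocab, d_model) embedding matrix

ids = tok("The meaning of life is", return_tensors="pt").input_ids
embeds = model.transformer.wte(ids)                   # start from the prompt's embeddings

with torch.no_grad():
    for _ in range(12):                               # a dozen continuous steps
        logits = model(inputs_embeds=embeds).logits[:, -1, :]
        probs = logits.softmax(dim=-1)
        soft_token = probs @ wte                      # expected embedding under the distribution
        embeds = torch.cat([embeds, soft_token.unsqueeze(1)], dim=1)

    # Switch back to picking a specific token.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    print(tok.decode(logits.argmax(dim=-1)))
```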

Another idea I had in relation to this was that maybe the encoding for a cluster of tokens could have two parts: one which is only used when decoding to try to get the particular tokens back, and one which is used for that but also used when predicting the next higher-level token. The idea is that this might encourage it to separate the parts that matter significantly later in the text from irrelevant accidents of phrasing. Perhaps somewhat of a semantics-vs-phrasing distinction… but probably not quite, because the phrasing at one point probably helps predict the phrasing at a later point, due to stuff like different writing styles, etc., so probably not a clean split.
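
That two-part encoding could be as simple as splitting the chunk vector and only routing half of it to the next-chunk predictor. A purely illustrative sketch (every name here is hypothetical):

```python
# Hypothetical split of a chunk vector into a "semantics" half (also used to
# predict the next chunk) and a "phrasing" half (used only for reconstruction).
import torch
import torch.nn as nn

d = 128
chunk_vec = torch.randn(4, d)                         # output of a chunk encoder as above
semantics, phrasing = chunk_vec.split(d // 2, dim=-1)

decode_head = nn.Linear(d, d)                         # reconstruction sees both halves
predict_next = nn.Linear(d // 2, d // 2)              # next-chunk prediction sees only "semantics"

decoder_conditioning = decode_head(torch.cat([semantics, phrasing], dim=-1))
next_chunk_semantics_guess = predict_next(semantics)
```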

drdca

The Apple Intelligence paper isn’t too interesting, but have a look at section 5.1: something about adapting to the task at hand on the fly using LoRA. I don’t know of other literature related to this, but it sounds pretty interesting to me.
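
For context, here is a generic sketch of what "swapping LoRA adapters per task on the fly" can look like. It is just the standard LoRA formulation (output = Wx + (alpha/r)·BAx) with the low-rank pair chosen per request, not Apple's actual implementation; all names are invented.

```python
# Generic swappable-LoRA layer: the base weight stays frozen, and a tiny per-task
# low-rank A/B pair is selected at request time.
import torch
import torch.nn as nn

class SwappableLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)        # frozen base weight
        self.scale = alpha / r
        self.adapters = nn.ModuleDict()               # task name -> (A, B) low-rank pair
        self.active = None
        self.r, self.d_in, self.d_out = r, d_in, d_out

    def add_adapter(self, name):
        self.adapters[name] = nn.ParameterDict({
            "A": nn.Parameter(torch.randn(self.r, self.d_in) * 0.01),
            "B": nn.Parameter(torch.zeros(self.d_out, self.r)),
        })

    def set_adapter(self, name):                      # the "on the fly" part
        self.active = name

    def forward(self, x):
        out = self.base(x)
        if self.active is not None:
            a = self.adapters[self.active]
            out = out + self.scale * (x @ a["A"].T @ a["B"].T)
        return out

layer = SwappableLoRALinear(64, 64)
layer.add_adapter("summarization")
layer.add_adapter("mail_reply")
layer.set_adapter("summarization")                    # pick the adapter per request
y = layer(torch.randn(2, 64))
```

In practice a library like Hugging Face's peft handles this bookkeeping, but the idea is the same: the base weights never change, and only the small per-task A/B matrices get swapped in.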

sniperhawk

Really like the first paper shown. It's interesting how introducing self-modeling has the consequence of also simplifying the network. I mean, it makes sense that the model would want to be simpler in order to optimally compute itself. I do wonder what effect the self-modeling has besides that, though: is the primary effect the simplification of the network during training, or does the auxiliary task of predicting internal states assist with the primary task in a meaningful way? Judging from the paper, it seems accuracy on the task actually drops slightly (although MNIST is such a simple classification example that I'm not sure that says anything about performance anyway). Really interested to hear more about this strategy in larger models.
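
For anyone curious what such a setup might look like, here is a rough sketch of a classifier with a "self-modeling" auxiliary head that predicts the network's own earlier-layer activations. Layer sizes and the loss weight are guesses for illustration, not the paper's values.

```python
# Rough sketch of a self-modeling auxiliary objective: an MNIST-style classifier
# whose extra head predicts the network's own first-hidden-layer activations.
import torch
import torch.nn as nn

class SelfModelingNet(nn.Module):
    def __init__(self, h1=256, h2=128, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, h1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(h1, h2), nn.ReLU())
        self.classifier = nn.Linear(h2, num_classes)
        self.self_model = nn.Linear(h2, h1)           # auxiliary "self-model" head

    def forward(self, x):
        a1 = self.layer1(x)
        a2 = self.layer2(a1)
        return self.classifier(a2), self.self_model(a2), a1

net = SelfModelingNet()
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
logits, a1_pred, a1 = net(x)
loss = nn.functional.cross_entropy(logits, y) \
     + 0.1 * nn.functional.mse_loss(a1_pred, a1.detach())   # self-modeling term
```

The regularizing effect the comment mentions would come from that second term: the network is nudged toward internal states that are easy for it to predict.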

TendoNin

Seems like we could use synthetic data for the blind vision model problem. They could use Unreal or Unity, armed with a huge pile of game-dev-artist-created models and shaders, to set up millions of permutations of complex scenes from different angles, along with labels we could piece together as we assemble each scene, and train on that.
I have to assume Musk & Co are doing that sort of thing for their robot training.
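
A purely hypothetical sketch of that pipeline, with `render_scene` standing in for whatever engine call (Unreal, Unity, Blender, etc.) actually produces the image; the point is that the labels come for free because you placed the objects yourself.

```python
# Hypothetical synthetic-data loop: sweep over scene permutations and camera
# angles, render each one, and record the labels known at scene-assembly time.
import itertools
import json
import random

OBJECTS = ["chair", "mug", "robot_arm"]
BACKDROPS = ["kitchen", "warehouse"]
CAMERA_ANGLES = [0, 45, 90, 135]

def render_scene(objects, backdrop, angle):
    # Placeholder: call into the engine here and return a path to the rendered image.
    return f"renders/{backdrop}_{'_'.join(objects)}_{angle}.png"

dataset = []
for backdrop, angle in itertools.product(BACKDROPS, CAMERA_ANGLES):
    objects = random.sample(OBJECTS, k=2)
    image_path = render_scene(objects, backdrop, angle)
    # The label is known exactly, since we chose what to put in the scene.
    dataset.append({"image": image_path, "objects": objects,
                    "backdrop": backdrop, "camera_angle_deg": angle})

with open("synthetic_labels.json", "w") as f:
    json.dump(dataset, f, indent=2)
```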

andrewsilber

Skimming abstracts, I love it! Have some engagement.

tensiondriven

Do you ever take this information and rework it into multi-dimensional frameworks when you come across new information? I watch your videos, find the original source, and interpret it into my own AI frameworks across many different formats and sources of data. Was just wondering if anyone else does that? 😊

superfliping

Great video, would be better without the awful haircut though

porpoisin