Hella New AI Papers - Aug 9, 2024

Read or listen to the newsletter with all the papers I chose to keep here:

Support my learning journey either by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!

Discuss this stuff with other Tunadorks on Discord

All my other links

Timestamps:
00:00 Intro
53:23 Outro
Comments

first paper out of the gate sounds like a winner 🤯

GNARGNARHEAD

Damn, just downloaded like half that list. Love the curation you do.

kevon

Yo, I got addicted to your channel. I kinda binge-watched your latest vids. I just grab what I can and then guess at the concepts of the ones I don't fully understand.

jakeaustria

Very nice review. That first paper got my attention!

Jayc

11:10: Oh, cool, this sounds similar to something I was daydreaming about (except I was imagining clusters of a handful of tokens, not necessarily matching sentence boundaries, and I was imagining doing this recursively).
I imagine it works something like this: have an autoencoder that goes from a not-too-long sequence of tokens to a single higher-level token, and then the decoder part predicts the individual tokens given the previous higher-level tokens, the current higher-level token, and the base-level tokens already produced for the current higher-level token?

I suppose their tokens encoding entire sentences can’t be using a fixed discrete set for the higher-level tokens, so I guess they just have those be continuous?
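
For anyone who wants to picture that idea, here is a minimal, speculative PyTorch sketch of such a chunk autoencoder: a short span of base tokens is pooled into one continuous higher-level vector, and the decoder reconstructs the span conditioned on that vector and the tokens decoded so far. Every name and size here is made up; this illustrates the commenter's idea, not the paper's actual architecture.

```python
# Minimal sketch (not the paper's architecture): compress a short chunk of base
# tokens into one continuous "higher-level token", then decode the chunk back
# autoregressively, conditioned on that vector and the tokens produced so far.
import torch
import torch.nn as nn

class ChunkAutoencoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)   # chunk -> vector
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)   # vector -> chunk
        self.to_vocab = nn.Linear(d_model, vocab_size)

    def encode(self, chunk_ids):                      # chunk_ids: (batch, chunk_len)
        _, h = self.encoder(self.embed(chunk_ids))
        return h.squeeze(0)                           # one continuous higher-level token per chunk

    def decode_logits(self, chunk_ids, chunk_vec):
        # Teacher forcing: position t sees only tokens < t of the same chunk,
        # plus the chunk vector (used here as the decoder's initial hidden state).
        bos = torch.zeros_like(chunk_ids[:, :1])      # token id 0 as a stand-in BOS
        inputs = self.embed(torch.cat([bos, chunk_ids[:, :-1]], dim=1))
        out, _ = self.decoder(inputs, chunk_vec.unsqueeze(0))
        return self.to_vocab(out)

model = ChunkAutoencoder()
chunk = torch.randint(0, 1000, (4, 8))               # 4 chunks of 8 base tokens each
logits = model.decode_logits(chunk, model.encode(chunk))
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), chunk.reshape(-1))
```

A second, higher-level model over the sequence of chunk vectors would then handle the "previous higher-level tokens" part of the conditioning.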

(Aside: hm, if you used a standard decoder-only LLM, but instead of selecting a token with the probabilities it assigns, just took the average of the embedding vectors for each of those tokens, and let that iterate a dozen times, and then switched to picking specific tokens again, I wonder what kind of garbage output that would produce?
That thought probably seems pretty unrelated. It came to mind because I was thinking about how, when the “tokens” produced as outputs are continuous, you don’t get a probability distribution, so the only way to mix between options is to mix the actual options, rather than taking a probability mix over options.)
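
Here is a rough sketch of that aside about mixing embeddings instead of sampling, using GPT-2 from Hugging Face purely as a convenient stand-in model; nothing here comes from the papers in the video, it is just the experiment the comment describes.

```python
# Rough sketch: instead of sampling a token at each step, feed back the
# probability-weighted average of all token embeddings for a dozen "soft" steps,
# then take one ordinary greedy step at the end just to see what comes out.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
wte = model.transformer.wte.weight                    # (vocab, d_model) embedding matrix

ids = tok("The meaning of life is", return_tensors="pt").input_ids
embeds = model.transformer.wte(ids)                   # start from the prompt's embeddings

with torch.no_grad():
    for _ in range(12):                               # a dozen continuous steps
        logits = model(inputs_embeds=embeds).logits[:, -1, :]
        probs = logits.softmax(dim=-1)
        soft_token = probs @ wte                      # expected embedding under the distribution
        embeds = torch.cat([embeds, soft_token.unsqueeze(1)], dim=1)

    # Switch back to picking a specific token.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    print(tok.decode(logits.argmax(dim=-1)))
```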

Another idea I had in relation to this was that maybe the encoding for a cluster of tokens could have two parts: one which is only used when decoding to try to get the particular tokens back, and one which is used for that but also used when predicting the next higher-level token. The idea is that this might encourage it to separate the parts that matter significantly later in the text from irrelevant accidents of phrasing. Perhaps somewhat of a semantics-vs-phrasing distinction… but probably not quite, because the phrasing at one point probably helps predict the phrasing at a later point, due to stuff like different writing styles, etc., so probably not a clean split.
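
That two-part encoding could be as simple as splitting the chunk vector and only routing half of it to the next-chunk predictor. A purely illustrative sketch (every name here is hypothetical):

```python
# Hypothetical split of a chunk vector into a "semantics" half (also used to
# predict the next chunk) and a "phrasing" half (used only for reconstruction).
import torch
import torch.nn as nn

d = 128
chunk_vec = torch.randn(4, d)                         # output of a chunk encoder as above
semantics, phrasing = chunk_vec.split(d // 2, dim=-1)

decode_head = nn.Linear(d, d)                         # reconstruction sees both halves
predict_next = nn.Linear(d // 2, d // 2)              # next-chunk prediction sees only "semantics"

decoder_conditioning = decode_head(torch.cat([semantics, phrasing], dim=-1))
next_chunk_semantics_guess = predict_next(semantics)
```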

drdca

The Apple Intelligence paper isn’t too interesting, but have a look at section 5.1: something about adapting to the task at hand on the fly using LoRA. I don’t know of other literature related to this, but it sounds pretty interesting to me.
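
For context, here is a generic sketch of what "swapping LoRA adapters per task on the fly" can look like. It is just the standard LoRA formulation (output = Wx + (alpha/r)·BAx) with the low-rank pair chosen per request, not Apple's actual implementation; all names are invented.

```python
# Generic swappable-LoRA layer: the base weight stays frozen, and a tiny per-task
# low-rank A/B pair is selected at request time.
import torch
import torch.nn as nn

class SwappableLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)        # frozen base weight
        self.scale = alpha / r
        self.adapters = nn.ModuleDict()               # task name -> (A, B) low-rank pair
        self.active = None
        self.r, self.d_in, self.d_out = r, d_in, d_out

    def add_adapter(self, name):
        self.adapters[name] = nn.ParameterDict({
            "A": nn.Parameter(torch.randn(self.r, self.d_in) * 0.01),
            "B": nn.Parameter(torch.zeros(self.d_out, self.r)),
        })

    def set_adapter(self, name):                      # the "on the fly" part
        self.active = name

    def forward(self, x):
        out = self.base(x)
        if self.active is not None:
            a = self.adapters[self.active]
            out = out + self.scale * (x @ a["A"].T @ a["B"].T)
        return out

layer = SwappableLoRALinear(64, 64)
layer.add_adapter("summarization")
layer.add_adapter("mail_reply")
layer.set_adapter("summarization")                    # pick the adapter per request
y = layer(torch.randn(2, 64))
```

In practice a library like Hugging Face's peft handles this bookkeeping, but the idea is the same: the base weights never change, and only the small per-task A/B matrices get swapped in.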

sniperhawk

Really like the first paper shown. It's interesting how introducing self-modeling has the consequence of also simplifying the network. I mean, it makes sense that the model would want to be simpler in order to optimally compute itself. I do wonder what effect the self-modeling has besides that, though: is the primary effect the simplification of the network during training, or does the auxiliary task of predicting internal states assist with the primary task in a meaningful way? Judging from the paper, it seems accuracy on the task actually drops slightly (although MNIST is such a simple classification example that I'm not sure that says anything about performance anyway). Really interested to hear more about this strategy in larger models.
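
For anyone curious what such a setup might look like, here is a rough sketch of a classifier with a "self-modeling" auxiliary head that predicts the network's own earlier-layer activations. Layer sizes and the loss weight are guesses for illustration, not the paper's values.

```python
# Rough sketch of a self-modeling auxiliary objective: an MNIST-style classifier
# whose extra head predicts the network's own first-hidden-layer activations.
import torch
import torch.nn as nn

class SelfModelingNet(nn.Module):
    def __init__(self, h1=256, h2=128, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, h1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(h1, h2), nn.ReLU())
        self.classifier = nn.Linear(h2, num_classes)
        self.self_model = nn.Linear(h2, h1)           # auxiliary "self-model" head

    def forward(self, x):
        a1 = self.layer1(x)
        a2 = self.layer2(a1)
        return self.classifier(a2), self.self_model(a2), a1

net = SelfModelingNet()
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
logits, a1_pred, a1 = net(x)
loss = nn.functional.cross_entropy(logits, y) \
     + 0.1 * nn.functional.mse_loss(a1_pred, a1.detach())   # self-modeling term
```

The regularizing effect the comment mentions would come from that second term: the network is nudged toward internal states that are easy for it to predict.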

TendoNin

Seems like we could use synthetic data for the blind vision model problem. They could use Unreal or Unity, armed with a huge pile of game-dev-artist-created models and shaders, to set up millions of permutations of complex scenes from different angles, along with labels we could piece together as we assemble each scene, and train on that.
I have to assume Musk & Co are doing that sort of thing for their robot training.
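
A purely hypothetical sketch of that pipeline, with `render_scene` standing in for whatever engine call (Unreal, Unity, Blender, etc.) actually produces the image; the point is that the labels come for free because you placed the objects yourself.

```python
# Hypothetical synthetic-data loop: sweep over scene permutations and camera
# angles, render each one, and record the labels known at scene-assembly time.
import itertools
import json
import random

OBJECTS = ["chair", "mug", "robot_arm"]
BACKDROPS = ["kitchen", "warehouse"]
CAMERA_ANGLES = [0, 45, 90, 135]

def render_scene(objects, backdrop, angle):
    # Placeholder: call into the engine here and return a path to the rendered image.
    return f"renders/{backdrop}_{'_'.join(objects)}_{angle}.png"

dataset = []
for backdrop, angle in itertools.product(BACKDROPS, CAMERA_ANGLES):
    objects = random.sample(OBJECTS, k=2)
    image_path = render_scene(objects, backdrop, angle)
    # The label is known exactly, since we chose what to put in the scene.
    dataset.append({"image": image_path, "objects": objects,
                    "backdrop": backdrop, "camera_angle_deg": angle})

with open("synthetic_labels.json", "w") as f:
    json.dump(dataset, f, indent=2)
```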

andrewsilber

Skimming abstracts, I love it! Have some engagement.

tensiondriven

Do you ever take this information and rework it into multi-dimensional frameworks when you come across new information? I watch your videos, find the original source, and interpret it into my own AI frameworks across many different formats and sources of data. Was just wondering if anyone else does that? 😊

superfliping

Great video, would be better without the awful haircut though

porpoisin