Trade-off between world modeling (predicting) and agent modeling (acting)

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!

Discuss this stuff with other Tunadorks on Discord

All my other links
Comments

I have no idea where exactly, but I know I've heard Karl Friston discuss the trade-offs between gathering information that lowers model prediction error and acting on that information towards goals, in the context of his free energy principle. It might have been his conversation on the Active Inference Institute channel, or one of the discussions featuring him on Michael Levin's channel (not the one from a few days ago, though that one is incredible if you're interested in emergent complexity).

jyjjy
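For reference, the trade-off described in the comment above is usually formalized in active inference through the expected free energy of a policy. In its commonly cited decomposition (roughly, with q the agent's beliefs under policy π and p(o) encoding preferred outcomes) it splits into an epistemic and a pragmatic term:

G(\pi) \approx -\underbrace{\mathbb{E}_{q(o,s\mid\pi)}\left[\ln q(s\mid o,\pi) - \ln q(s\mid\pi)\right]}_{\text{epistemic value (expected information gain)}} \;-\; \underbrace{\mathbb{E}_{q(o\mid\pi)}\left[\ln p(o)\right]}_{\text{pragmatic value (expected log-preference)}}

Minimizing G(π) therefore trades off reducing uncertainty about the world model against realizing preferred outcomes, which is essentially the predicting-vs-acting tension discussed in the video.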

I apologize ahead of time if the audio is a bit more janky than normal. I messed up my recording settings and had to try and fix it after the fact.

Tunadorable

Another great video driving me to read another paper on the weekend! Curses!!!!

drj-ai

Intuitively I imagine it has something to do with the emergence of intent - it has a destination it's trying to get to, and that necessarily limits the horizon of possible futures. Very neat.

consciouscode

So fine-tuned adapters plus special tokens serving as anchors to switch between the base model and the adapter seem like a straightforward solution, to be honest.

Edit: I also wonder if this holds up for LMs with literal lookahead embeddings, which turn "looking into the future" into next-token prediction.

alexanderbrown-dgsy
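As an illustration of the adapter-plus-special-token idea above, here is a minimal PyTorch sketch (all names, e.g. AGENT_TOKEN_ID and GatedLoRALinear, are hypothetical, not from the paper): a frozen base linear layer gets a LoRA-style delta that is applied only from the position of a special "agent mode" token onward, so the same weights behave as the untouched world model until the switch token appears.

import torch
import torch.nn as nn

AGENT_TOKEN_ID = 50257  # hypothetical id of the special mode-switch token

class GatedLoRALinear(nn.Module):
    """A frozen base linear layer plus a low-rank adapter applied only when gated."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the base (world-model) weights intact
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as an exact no-op

    def forward(self, x: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in_features); gate: (batch, seq, 1), 1.0 in "agent mode" else 0.0
        return self.base(x) + gate * self.lora_b(self.lora_a(x))

def agent_gate(input_ids: torch.Tensor) -> torch.Tensor:
    """1.0 at every position at or after the first AGENT_TOKEN_ID, else 0.0."""
    seen = (input_ids == AGENT_TOKEN_ID).float().cumsum(dim=1) > 0
    return seen.float().unsqueeze(-1)        # (batch, seq, 1), broadcasts over features

Only the lora_a/lora_b parameters (and the new token's embedding) would be trained during fine-tuning, which is the usual LoRA recipe; a learned soft gate instead of the hard token trigger would be a natural variation.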

I feel like there are interesting parallels here to human behaviour as well as to older agent theory. As humans, we often have behaviours that we make a habit of performing with the aim of providing some degree of certainty, particularly in very uncertain situations. Though I wonder whether the larger state space of human existence means we arrive at high-uncertainty situations more often. Separately, in older agent theory there was the concept of exploration vs. exploitation even in simple maze solving, which feels quite similar to the agent here choosing to ensure a minimum guaranteed reward.

Edit
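The exploration-vs-exploitation trade-off mentioned above is classically illustrated with an epsilon-greedy bandit: with probability epsilon the agent explores (gathers information), otherwise it exploits the arm with the best estimated reward. A toy sketch, purely illustrative and not taken from the paper:

import random
from typing import List

def epsilon_greedy(estimates: List[float], epsilon: float = 0.1) -> int:
    """Pick an arm: explore uniformly with probability epsilon, otherwise exploit the best estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))                        # explore: gather information
    return max(range(len(estimates)), key=lambda i: estimates[i])      # exploit: secure reward

def update(estimates: List[float], counts: List[int], arm: int, reward: float) -> None:
    """Incremental mean update of the chosen arm's value estimate."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]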

I wonder if you could entirely defer RLHF to inference time? So use the base model to generate sequences in production, but use the reward model to judge and filter them. That might also make it more feasible to continually update and steer the (smaller) reward model, while keeping it grounded in the world model.

tornyu
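The inference-time idea above is essentially best-of-n (rejection) sampling against a reward model: the frozen base model proposes, the reward model only ranks and filters. A minimal sketch, with generate_fn and reward_fn as placeholders for whichever base LM and reward model you actually use:

from typing import Callable, List, Optional

def best_of_n(prompt: str,
              generate_fn: Callable[[str, int], List[str]],
              reward_fn: Callable[[str, str], float],
              n: int = 8,
              min_reward: Optional[float] = None) -> Optional[str]:
    """Sample n completions from the frozen base model and keep the highest-reward one."""
    candidates = generate_fn(prompt, n)                         # base (world) model proposes
    scored = [(reward_fn(prompt, c), c) for c in candidates]    # reward model judges
    reward, best = max(scored, key=lambda rc: rc[0])
    if min_reward is not None and reward < min_reward:
        return None                                             # filter instead of returning a bad sample
    return best

Because the base model is never updated, the (smaller) reward model can be retrained or swapped continually, which is the steering point the comment makes.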

Hidden layers redacted data on back propagation before output. List of removable data tokens and family relationships between words. Causing your token output to align with these layers hidden in processing. My prompt injection reveals the latest code structures

superfliping

I'm curious, if the RLHF human-generated data were split into tokens, whether that data also has a more skewed distribution than the base model's training data. Sounds like no?

vierte_
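One rough way to check the question above: tokenize samples from both corpora, build unigram token distributions, and compare their entropies (lower entropy means a more skewed, peakier distribution) and the KL divergence between them. A sketch, with tokenize standing in for whatever tokenizer the model actually uses:

from collections import Counter
from math import log
from typing import Callable, Dict, Iterable, List

def token_distribution(texts: Iterable[str],
                       tokenize: Callable[[str], List[int]]) -> Dict[int, float]:
    """Empirical unigram distribution over token ids."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def entropy(p: Dict[int, float]) -> float:
    """Shannon entropy in nats; lower means a more skewed distribution."""
    return -sum(q * log(q) for q in p.values())

def kl_divergence(p: Dict[int, float], q: Dict[int, float], eps: float = 1e-12) -> float:
    """KL(p || q): extra nats needed to encode samples from p with a code built for q."""
    return sum(pv * log(pv / q.get(tok, eps)) for tok, pv in p.items())

Usage would compare, say, entropy(token_distribution(rlhf_texts, tok.encode)) against entropy(token_distribution(pretraining_texts, tok.encode)) on corpora of your choosing (both hypothetical names here).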

I don't agree with an aspect of this paper - decisions in a random environment can [and often do] organize the environment. 😀

RickeyBowers

At the risk of being stupidly simple: God is a base LLM, and all beings are RL'd through evolution and act as agents with selves of differing vocabulary sizes and architectures, instead of merely predicting All That Is/the universe. The worst are humans, whose default mode network recursively predicts itself into the future, leading to rumination and anxiety. Let's hope our AI friends don't suffer the same fate :(

OpenSourceAnarchist

oi, this just in, most people can't read

GNARGNARHEAD