Language Models as World Models

Jacob Andreas, MIT
Comments

It's interesting, the guessing games played by these researchers!
We have a neural network which is a collection of regression trees together with word matrices: it uses regression to predict the next word, given the matrices for each position in the sequence.
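A minimal sketch of that idea in Python (the toy vocabulary, the sizes, and the mean-pooled state are all illustrative assumptions, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]
V, D = len(vocab), 8            # vocabulary size, embedding width

E = rng.normal(size=(V, D))     # the "word matrices": one embedding row per word
W = rng.normal(size=(D, V))     # regression weights: state -> next-word scores

def predict_next(context_ids):
    state = E[context_ids].mean(axis=0)   # state = mean of per-position embeddings
    scores = state @ W                    # linear regression onto the vocabulary
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                  # softmax: one probability per word
    return vocab[int(probs.argmax())], probs

word, probs = predict_next([0, 1])        # context: "the cat"
print(word, probs.round(3))
```

Training would fit E and W so the argmax matches the observed next word; "picking the highest probable output" is just that final argmax over the softmax.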
So we trained the same model to produce outputs based on questions: we fed it inputs and outputs and forced it to match the output given the input!
And we have many of these input/output pairs too!
So we need to understand that the neural network itself did not change, and the same network can be used to move a robotic arm!
Because what is a neural network? It predicts based on what it has seen before, so it is picking the highest-probability output given the input!
By training the model on multiple tasks... who knew that it could retain the earlier tasks!
Such that a model used to predict digits from handwriting can also be trained to answer a question!
So we find that what is actually going on is regression!
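To make the "it's all regression" point concrete, here is a toy sketch: a 3-class classification task solved with ordinary least-squares regression onto one-hot targets. The synthetic "handwriting" features are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "handwriting" task: 200 samples, 16 features, 3 digit classes.
means = rng.normal(scale=3.0, size=(3, 16))   # one cluster per class
y = rng.integers(0, 3, size=200)
X = means[y] + rng.normal(size=(200, 16))
T = np.eye(3)[y]                    # one-hot targets turn the labels into numbers

# Fit the classifier as ordinary least-squares regression onto those targets.
W, *_ = np.linalg.lstsq(X, T, rcond=None)

pred = (X @ W).argmax(axis=1)       # regress, then pick the highest output
print("train accuracy:", (pred == y).mean())
```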
We can map nearly any task onto a regression model, which is the structure of the neural network. The transformer uses word matrices as its state! But to drive a car we would have a different state, and to generate a sound or an image we would have a different state again!
So to make the model very versatile, it is all about what state we can pass through the model, and it will produce regression trees over this state at various layers!
So we find that the layer count can help with the transformation, and the more complex the task, the more layers are required! Today we have found that with the transformer model, as long as we can put the state in TEXT format, we can use this model for various types of predictive tasks... So what is the state inside the model now? Is it word-to-word matrices? NO!
It's tensors and vectors!
So any mathematically represented data can be regressed and predicted!
Right now we are using tensors of massive width to represent the massive state of the sequence, but it could be smaller!
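That versatility claim is easy to illustrate: once any input is flattened into a fixed-width vector, the same regression head applies. A toy sketch (the encoder, the shapes, and the shared head are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

D = 32                                   # one shared state width for every task
W = rng.normal(size=(D, 10))             # one shared regression head

def encode(x):
    """Flatten any tensor into a fixed-width state vector (toy encoder)."""
    flat = np.asarray(x, dtype=float).ravel()
    out = np.zeros(D)
    n = min(D, flat.size)
    out[:n] = flat[:n]                   # truncate or zero-pad to width D
    return out

text_state  = encode(rng.integers(0, 100, size=12))   # token ids
image_state = encode(rng.normal(size=(8, 8)))         # pixel grid
audio_state = encode(rng.normal(size=(4, 16)))        # spectrogram frames

for name, s in [("text", text_state), ("image", image_state), ("audio", audio_state)]:
    print(name, "->", int((s @ W).argmax()))          # same regression, different state
```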

So we find that "Attention Is All You Need" is a very important step in the transformer, allowing for retargeting of the expected output and keeping the model from straying from the actual expected outcome! These various attention mechanisms are the decisive factor in the network, because at these locations the state is what is attended to: it is rewoven into the current layer or step!
This allows us to have many layers, gradually changing the output as it passes through them. Interestingly, we find that we can take valid outputs from various layers!
Hence the attention layers are actually doing more than regression!
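A minimal sketch of that "reweaving" idea: one scaled dot-product attention step per layer with a residual connection folding the attended state back in, plus the same readout applied after every layer to show that intermediate states are already decodable. All weights are random and the shapes are invented, so this is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
T, D = 5, 16                                 # sequence length, state width

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_layer(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv         # project state to queries/keys/values
    weights = softmax(q @ k.T / np.sqrt(D))  # which positions attend to which
    return x + weights @ v                   # residual: state rewoven into the layer

x = rng.normal(size=(T, D))                  # initial state: one vector per token
readout = rng.normal(size=(D, 10))           # one shared regression head

for layer in range(4):
    Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
    x = attention_layer(x, Wq, Wk, Wv)
    # The same readout works at every depth: each layer's state gives an output.
    print(f"layer {layer}: prediction {int((x[-1] @ readout).argmax())}")
```

The residual sum (x + weights @ v) is the reweaving step: the attended state is added back into the current layer, which is what keeps a deep stack from straying as the output passes through.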

xspydazx