IDL Spring 2024: Lecture 13

This is the thirteenth lecture of the 11-785 Introduction to Deep Learning course at CMU, in which we covered the following topics:

- Time-series data, in which the sequence of past inputs carries information about the current inference, require models that consider past values when making the current prediction
- Models that look only into a finite past are simply 1D-CNNs (a minimal sketch of such a finite-memory predictor appears after this list)
- To look into the infinite past, we need recurrence
- The simplest recurrence considers past outputs along with current inputs
- - This ensures that an input affects the output for the indefinite future.
- - These are NARX networks
- Models can also hold the memory of the past internally (rather than through the output)
- - Older "partially recurrent" models stored this memory in intermediate "memory" variables
- - - Jordan networks use a memory neuron that retains a running average of outputs
- - - Elman networks clone the hidden layer of the network as a memory unit (a sketch of the NARX, Jordan, and Elman recurrences is given after this list)
- "Fully recurrent" networks are state-space models, with a recurrent state.
- - The "state" may be arbitrarily complex
- - These networks are called "Recurrent Neural Networks" (or "RNNs")
- An RNN can be "unrolled" over time into a chain of identical "columns" of computation
- - This essentially forms a very deep shared-parameter network
- To train an RNN, we must compute the divergence between the sequence of outputs produced by the network and the desired output sequence
- - This is not necessarily the sum of the divergences at individual time instants.
- - We will nevertheless need the derivative of the divergence w.r.t. the output of the network at each time instant
- Backpropagation starts at the final output, and derivatives are propagated backward through time
- - At each time step, the loss derivatives for the output at that time are backpropagated and accumulated with the derivatives coming backward from later in the sequence
- - Derivative rules for shared-parameter networks apply: gradients for the shared weights accumulate across time steps (see the backpropagation-through-time sketch after this list)
- Recurrence can be extended to be bi-directional in cases where strictly sequential (left-to-right) processing is not required
- Bi-directional networks are built from "bi-directional blocks", which have two component subnets: one analyzes the input left to right (the forward net), and the other analyzes it right to left (the reverse net)
- - The outputs of the two components are concatenated to produce the output of the block
- - During training, the appropriate portions of the derivatives at the output of the bidirectional block are "sent" to the corresponding subnets
- - - Backpropagation is performed from the end of the sequence to the beginning for the forward net and from the beginning to the end for the reverse net
- - - The backpropagated derivatives at the inputs to the two subnets are added to form the backpropagated derivative at the input to the block (see the bidirectional sketch after this list)
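
The sketches below are not from the lecture itself; they are minimal NumPy illustrations of the ideas summarized above, with illustrative variable names and sizes. First, a finite-memory predictor: each output is a weighted sum of only the last K inputs, i.e. a causal 1D convolution, so nothing older than K steps can influence the current prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                    # how far into the past the model looks
w = rng.standard_normal(K)               # shared filter weights
x = rng.standard_normal(100)             # a scalar input time series

def finite_memory_predict(x, w):
    """Each output is a weighted sum of only the last K inputs (a causal 1D convolution)."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        past = x[max(0, t - K + 1): t + 1]   # at most the last K samples
        y[t] = past @ w[-len(past):]         # finite window: nothing older than K steps matters
    return y

y = finite_memory_predict(x, w)
# An input older than K steps can never affect y[t]; letting inputs influence
# outputs indefinitely far into the future is what motivates recurrence.
```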
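
Next, the three recurrence variants from the summary. The exact update equations (e.g. the running-average coefficient alpha in the Jordan memory) are assumptions made for illustration and may differ in detail from the lecture's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 2                          # illustrative sizes
Wx = rng.standard_normal((d_h, d_in)) * 0.1         # input  -> hidden
Wh = rng.standard_normal((d_h, d_h)) * 0.1          # hidden -> hidden (Elman)
Wf = rng.standard_normal((d_h, d_out)) * 0.1        # output/memory -> hidden (NARX, Jordan)
Wy = rng.standard_normal((d_out, d_h)) * 0.1        # hidden -> output

def narx_step(x_t, y_prev):
    """NARX: the previous output is fed back alongside the current input."""
    h = np.tanh(Wx @ x_t + Wf @ y_prev)
    return Wy @ h                                   # this output is fed back at the next step

def jordan_step(x_t, m_prev, y_prev, alpha=0.9):
    """Jordan: a memory neuron keeps a running average of past outputs."""
    m_t = alpha * m_prev + (1 - alpha) * y_prev
    h = np.tanh(Wx @ x_t + Wf @ m_t)
    return Wy @ h, m_t

def elman_step(x_t, h_prev):
    """Elman: the previous hidden layer is cloned as the memory ('context') units."""
    h_t = np.tanh(Wx @ x_t + Wh @ h_prev)
    return Wy @ h_t, h_t

# Example: run the NARX recurrence over a short input sequence
x_seq = rng.standard_normal((5, d_in))
y = np.zeros(d_out)
for x_t in x_seq:
    y = narx_step(x_t, y)
```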
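
Third, a fully recurrent (state-space) network unrolled over time, with backpropagation through time. Purely for illustration, the total divergence is taken here as a per-step sum of squared errors; as noted above, the true sequence divergence need not decompose this way.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 3, 5, 2, 10                   # illustrative sizes
Wx = rng.standard_normal((d_h, d_in)) * 0.1
Wh = rng.standard_normal((d_h, d_h)) * 0.1
Wy = rng.standard_normal((d_out, d_h)) * 0.1
b, c = np.zeros(d_h), np.zeros(d_out)

X = rng.standard_normal((T, d_in))                  # input sequence
D = rng.standard_normal((T, d_out))                 # desired output sequence

# Forward pass: unroll the same "column" of computation (shared parameters) over T steps
H = np.zeros((T + 1, d_h))                          # H[t] is the state entering step t; H[0] is the initial state
Y = np.zeros((T, d_out))
for t in range(T):
    H[t + 1] = np.tanh(Wx @ X[t] + Wh @ H[t] + b)   # recurrent state update
    Y[t] = Wy @ H[t + 1] + c                        # output at time t

loss = 0.5 * np.sum((Y - D) ** 2)                   # illustrative per-step-sum divergence

# Backward pass: backpropagation through time
dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
db, dc = np.zeros_like(b), np.zeros_like(c)
dh_future = np.zeros(d_h)                           # derivative flowing back from later time steps
for t in reversed(range(T)):
    dy = Y[t] - D[t]                                # derivative of the loss w.r.t. Y[t]
    dWy += np.outer(dy, H[t + 1]); dc += dy
    dh = Wy.T @ dy + dh_future                      # per-step derivative + derivative from the future
    da = dh * (1.0 - H[t + 1] ** 2)                 # back through the tanh
    dWx += np.outer(da, X[t]); dWh += np.outer(da, H[t]); db += da   # shared parameters accumulate over time
    dh_future = Wh.T @ da                           # send the derivative further back in time
```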
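
Finally, one bidirectional block: a forward subnet reads the input left to right, a reverse subnet reads it right to left, and their per-step outputs are concatenated. During training, the derivative at each block output splits into its two halves (one per subnet), backpropagation runs end-to-beginning for the forward net and beginning-to-end for the reverse net, and the input derivatives from the two subnets are added. The step functions below are simple tanh recurrences chosen only for illustration.

```python
import numpy as np

def bidirectional_block(X, forward_step, reverse_step, h0_f, h0_r):
    """Run one subnet left to right and one right to left, then concatenate their outputs."""
    T = len(X)
    out_f, out_r = [None] * T, [None] * T
    h_f, h_r = h0_f, h0_r
    for t in range(T):                               # forward net: left to right
        h_f = forward_step(X[t], h_f)
        out_f[t] = h_f
    for t in reversed(range(T)):                     # reverse net: right to left
        h_r = reverse_step(X[t], h_r)
        out_r[t] = h_r
    # Output of the block at each step: concatenation of the two components
    return [np.concatenate([out_f[t], out_r[t]]) for t in range(T)]

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 4, 6                               # illustrative sizes
Wf, Uf = rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h))
Wr, Ur = rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h))
fwd = lambda x, h: np.tanh(Wf @ x + Uf @ h)
rev = lambda x, h: np.tanh(Wr @ x + Ur @ h)
X = rng.standard_normal((T, d_in))
Y = bidirectional_block(X, fwd, rev, np.zeros(d_h), np.zeros(d_h))   # each Y[t] has 2 * d_h entries
```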