33. Neural Nets and the Learning Function


MIT 18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018
Instructor: Gilbert Strang

This lecture focuses on the construction of the learning function F, which is optimized by stochastic gradient descent and applied to the training data to minimize the loss. Professor Strang also begins his review of distance matrices.
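As a rough illustration of what "a learning function F optimized by stochastic gradient descent" can look like, here is a minimal sketch. It assumes F is an affine map, followed by ReLU, followed by a second affine map, trained on a squared loss. The layer sizes, the synthetic target, the step size, and the names `F`, `sgd_step`, and `mean_loss` are all hypothetical choices for this example and are not taken from the lecture.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def F(x, params):
    """Assumed form of the learning function: affine map, ReLU, affine map."""
    A1, b1, A2, b2 = params
    return A2 @ relu(A1 @ x + b1) + b2

def sgd_step(x, y, params, lr):
    """One SGD step on the loss 0.5 * ||F(x) - y||^2 for a single sample."""
    A1, b1, A2, b2 = params
    z = A1 @ x + b1                    # pre-activation of the hidden layer
    h = relu(z)                        # hidden-layer output
    r = A2 @ h + b2 - y                # residual F(x) - y
    grad_A2 = np.outer(r, h)           # gradients by the chain rule
    grad_b2 = r
    grad_z = (A2.T @ r) * (z > 0)      # ReLU passes gradient only where z > 0
    grad_A1 = np.outer(grad_z, x)
    grad_b1 = grad_z
    return (A1 - lr * grad_A1, b1 - lr * grad_b1,
            A2 - lr * grad_A2, b2 - lr * grad_b2)

def mean_loss(X, Y, params):
    return float(np.mean([0.5 * np.sum((F(x, params) - y) ** 2)
                          for x, y in zip(X, Y)]))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))      # hypothetical training inputs
Y = np.abs(X[:, :1]) + X[:, 1:]                # hypothetical piecewise-linear target
params = (0.5 * rng.normal(size=(8, 2)), np.zeros(8),
          0.5 * rng.normal(size=(1, 8)), np.zeros(1))

loss_before = mean_loss(X, Y, params)
for _ in range(5000):
    i = rng.integers(len(X))                   # "stochastic": one random sample per step
    params = sgd_step(X[i], Y[i], params, lr=0.05)
loss_after = mean_loss(X, Y, params)
print(loss_before, loss_after)
```

Each step uses a single randomly chosen training sample, which is what separates stochastic gradient descent from full-batch gradient descent.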

License: Creative Commons BY-NC-SA


Comments

He is certainly right that even the best would fare well to add anything. To have the pleasure of his lectures is more than gold.

brendawilliams

Just once to see one of his lectures and not be amazed. He is simply awesome!

mihalisgolias

Again Prof. Gilbert Strang! Thank you very much!

tchappyha

I can’t say I will have to review and review from beginning to end many times. You are most clear in explanations.

brendawilliams

I like your intelligent, advanced lectures. They are very challenging. Strang is the smartest. Thank you.

bettymontoya

Many thanks to MIT for sharing such knowledge.

gtsmeg

Professor Strang, thank you for an awesome lecture on distance matrices, the structure of neural nets, and the learning function. All these mathematical concepts improve my understanding of machine learning.

georgesadler

Hats off to Professor Gil Strang and MIT for these amazing classes.
One question: I may be missing something, but in the last part, where he says that you can obtain the matrix G from D with this formula, I think that is not correct. You don't have the squared norms of the vectors. Professor Strang assumes that you have them on the diagonal of D, but the diagonal of D is all zeros, am I right? Or am I misunderstanding something?
Again thank you very much!

gonzalopolo

For backpropagation (BPP), deep learning, and the general structure of neural networks, the following comments may be useful.

To begin with, note that instead of partial derivatives one can work with derivatives as the linear transformations they really are.

It is also possible to look at the networks in a more structured manner. The basic ideas of BPP can then be applied in much more general cases. Several steps are involved.

1.- More general processing units.
Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights; this is a linear transformation expressed as the sum of its restrictions to a pair of complementary subspaces.

2.- More general layers (any number of units).
Single-unit layers can create a bottleneck that renders the whole network useless. Putting several units together in a single layer is equivalent to taking their product (as functions, in the sense of set theory). The layers are functions of the inputs and of the weights of the totality of the units. The derivative of a layer is then the product of the derivatives of the units; this is a product of linear transformations.

3.- Networks with any number of layers.
A network is the composition (as functions, and in the set theoretical sense) of its layers. By the chain rule the derivative of the network is the composition of the derivatives of the layers; this is a composition of linear transformations.

4.- Quadratic error of a function.
...
——-
Since this comment is becoming too long I will stop here. The point is that a very general viewpoint clarifies many aspects of BPP.

If you are interested in the full story and have some familiarity with Hilbert spaces please google for papers dealing with backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available.

For a glimpse into a completely new deep learning algorithm, which is orders of magnitude more efficient, controllable, and faster than BPP, search on this platform for a video about deep learning without backpropagation; its description has links to demo software.

The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and perceptrons are functionally equivalent.

For the elementary conceptual basis of NNs see the article Neural Network Formalism.

Daniel Crespin

dcrespin
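Item 3 of the comment above (the derivative of a network is the composition of the derivatives of its layers) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses two small `tanh` layers with random weights, differentiates with respect to the input only (not the weights), and compares the product of the layer Jacobians with a finite-difference estimate. The names `layer`, `layer_jacobian`, and `network` are hypothetical.

```python
import numpy as np

def layer(A, b, v):
    """One layer: v -> tanh(A v + b)."""
    return np.tanh(A @ v + b)

def layer_jacobian(A, b, v):
    """Derivative of the layer with respect to v: diag(1 - tanh^2(A v + b)) A."""
    t = np.tanh(A @ v + b)
    return (1.0 - t ** 2)[:, None] * A

rng = np.random.default_rng(1)
A1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: R^3 -> R^4
A2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: R^4 -> R^2
x = rng.normal(size=3)

def network(v):
    """The network is the composition of its two layers."""
    return layer(A2, b2, layer(A1, b1, v))

# Chain rule: Jacobian of the composition = product of the layer Jacobians,
# each evaluated at the input that layer actually receives.
h = layer(A1, b1, x)
J_chain = layer_jacobian(A2, b2, h) @ layer_jacobian(A1, b1, x)

# Independent check by central finite differences, one input coordinate at a time.
eps = 1e-6
J_fd = np.column_stack([
    (network(x + eps * e) - network(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
max_gap = np.max(np.abs(J_chain - J_fd))
print(max_gap)
```

For linear maps between finite-dimensional spaces, "composition of linear transformations" becomes matrix multiplication, which is why the `@` product appears here.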

@47:29 But where do we get the d vector in the formula for the G = X^T * X matrix?

allyourcode
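One common way to answer the question about the vector d, offered here as a sketch and not as a transcript of the lecture: distances do not change under translation, so the first point may be placed at the origin. Then the squared norm of each point is its squared distance to the first point, so d is simply the first column of D, and G = ½(d 1ᵀ + 1 dᵀ − D). The data and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X_true = rng.normal(size=(3, 5))        # hypothetical points, one per column

# Squared-distance matrix D with D[i, j] = ||x_i - x_j||^2 (zero diagonal).
sq = np.sum(X_true ** 2, axis=0)
D = sq[:, None] + sq[None, :] - 2.0 * X_true.T @ X_true

# Put the first point at the origin: then ||x_i||^2 = D[i, 0].
d = D[:, 0]
ones = np.ones_like(d)
G = 0.5 * (np.outer(d, ones) + np.outer(ones, d) - D)

# G should be the Gram matrix of the points translated so that x_1 = 0.
X_shifted = X_true - X_true[:, [0]]
gap = np.max(np.abs(G - X_shifted.T @ X_shifted))
print(gap)
```

The diagonal of D is indeed all zeros, so the squared norms cannot be read off the diagonal; they come from fixing a reference point.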

I don't think I understood what the D matrix is in the distance problem. I tried deriving a similar term, but I'm not sure it's the same or correct:

consider dᵢⱼ = |xᵢ - xⱼ|² = xᵢ² + xⱼ² - 2xᵢxⱼ; we want to find an expression for X^T X, whose entries are xᵢxⱼ.
We can do a rigid translation, so we can limit ourselves to xs which are centered, i.e. the average of the positions is 0, i.e. <xⱼ> = 0.
Now, if we average dᵢⱼ over one of the indices, let's pick j, we get
<dᵢⱼ>ⱼ = xᵢ² + <xⱼ²> - 2xᵢ<xⱼ> = xᵢ² + <xⱼ²> - 0
We can denote the second moment <xⱼ²> = σ², so <dᵢⱼ>ⱼ = xᵢ² + σ².
We can average dᵢⱼ again, this time over the i index, and we get
<<dᵢⱼ>ⱼ>ᵢ = <xᵢ² + σ²>ᵢ = 2σ²
We can use this to rewrite the xᵢ² terms using averages over dᵢⱼ:
xᵢ² + xⱼ² = <dᵢⱼ>ᵢ + <dᵢⱼ>ⱼ - <dᵢⱼ>ᵢⱼ = xᵢ² + σ² + xⱼ² + σ² - 2σ²
And, substituting into dᵢⱼ = xᵢ² + xⱼ² - 2xᵢxⱼ, we get
2xᵢxⱼ = <dᵢⱼ>ᵢ + <dᵢⱼ>ⱼ - <dᵢⱼ>ᵢⱼ - dᵢⱼ
I think these averages over i and j correspond to some of the weird column tricks, but I'm not sure.

eliavrad
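The averaging formula in the comment above, 2xᵢxⱼ = <dᵢⱼ>ᵢ + <dᵢⱼ>ⱼ - <dᵢⱼ>ᵢⱼ - dᵢⱼ, can be checked numerically. The sketch below assumes centered points, as the comment does, and uses hypothetical random data. It also recovers one set of positions from G by an eigendecomposition; that step is a standard technique added for illustration and is not claimed to be the construction used in the lecture.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(2, 6))                   # hypothetical points, one per column

sq = np.sum(X ** 2, axis=0)
D = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # D[i, j] = ||x_i - x_j||^2

row_mean = D.mean(axis=1, keepdims=True)      # <d_ij>_j, one value per i
col_mean = D.mean(axis=0, keepdims=True)      # <d_ij>_i, one value per j
all_mean = D.mean()                           # <d_ij>_ij
G = 0.5 * (row_mean + col_mean - all_mean - D)

# G should equal the Gram matrix of the centered points.
Xc = X - X.mean(axis=1, keepdims=True)
gap = np.max(np.abs(G - Xc.T @ Xc))

# Recover positions (up to rotation/reflection) from G = Y^T Y.
w, V = np.linalg.eigh(G)                      # eigenvalues in ascending order
w = np.clip(w, 0.0, None)                     # remove tiny negative round-off
Y = (V * np.sqrt(w)).T[-2:]                   # keep the two largest eigenvalues

sq_Y = np.sum(Y ** 2, axis=0)
D_rec = sq_Y[:, None] + sq_Y[None, :] - 2.0 * Y.T @ Y
dist_gap = np.max(np.abs(D_rec - D))
print(gap, dist_gap)
```

The recovered `Y` generally differs from `X` by a rotation, reflection, and translation, but it reproduces the same distance matrix.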
