Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 3 - Backprop and Neural Networks

This lecture covers:
1. A simple neural network for named entity recognition
2. Matrix calculus: computing gradients by hand
3. The backpropagation algorithm and computation graphs
4. Automatic differentiation in deep learning frameworks
5. Numeric gradient checking

Professor Christopher Manning
Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science
Director, Stanford Artificial Intelligence Laboratory (SAIL)
Comments

🎯 Key Takeaways for quick navigation:

00:00 The lecture focuses on the math details of neural net learning, introducing the backpropagation algorithm. Assignment 2 is released, emphasizing understanding the math of neural networks and the associated software. The speaker acknowledges that some students may struggle with this material, but encourages grasping the math behind neural networks for a deeper understanding. After this week, the course will transition to letting software handle the complex math, and a PyTorch tutorial is announced. The lecture introduces a named entity recognition task using a simple neural network and outlines the mathematical foundations for training neural nets through manual computation and backpropagation.
27:56 Calculating partial derivatives for a neural network involves breaking complex functions into simpler ones and applying the chain rule.
32:40 To train neural networks efficiently, the backpropagation algorithm propagates derivatives using the matrix chain rule, reusing shared derivatives to minimize computation.
37:22 In neural network computations, the shape convention is used: gradients are written in the same shape as their parameters, which makes stochastic gradient descent updates straightforward.
43:58 There is a discrepancy between the Jacobian form, which is convenient for calculus, and the shape convention used for presenting answers in assignments, which ensures gradients have the right shapes for parameter updates.
48:10 Computation graphs, resembling trees, are constructed to systematically exploit shared derivatives, enabling efficient backpropagation in neural network training.
50:56 Backpropagation passes gradients backward through the computation graph, updating parameters via the chain rule to minimize the loss.
53:21 The general principle in backpropagation is that the downstream gradient equals the upstream gradient times the local gradient, which enables efficient parameter updates (see the first sketch below this comment).
57:32 During backpropagation, a local gradient is computed for each node, and the downstream gradient is the upstream gradient multiplied by the local gradient.
59:18 Understanding gradients helps assess how changes in each variable affect the output of a computation graph.
59:45 Changes in z don't affect the output, so the gradient df/dz is 0; changes in x affect the output twice as much, so df/dx = 2.
01:00:12 When a variable feeds multiple branches of the computation graph, its gradient is the sum of the gradients arriving from those branches.
01:01:36 With multiple outward branches, the gradient calculation sums the branch gradients, following the multivariable chain rule.
01:04:54 Backpropagation avoids redundant computation, calculating all gradients efficiently with one forward pass and one backward pass through the graph.
01:07:45 The backpropagation algorithm applies to arbitrary computation graphs, but neural networks with regular layer structures allow for parallelization.
01:11:52 Automatic differentiation in modern deep learning frameworks like TensorFlow and PyTorch handles most gradient computation automatically (see the PyTorch sketch below this comment).
01:13:42 Symbolic computation of derivatives, attempted in Theano, faced challenges, leading to the current approach in deep learning frameworks.
01:14:36 Automatic differentiation involves forward and backward passes, computing values and gradients efficiently through the computation graph.
01:18:47 Numeric gradient checking is a simple but slow way to verify gradient correctness, mainly used when implementing custom layers.

Made with HARPA AI
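
Below is a minimal Python sketch (illustrative, not from the course materials) of the backprop principle summarized above: downstream gradient = upstream gradient × local gradient, gradients from multiple branches are summed, and a numeric central-difference check verifies the result. It uses the small graph f(x, y, z) = (x + y) * max(y, z) at x = 1, y = 2, z = 0, consistent with the 59:45 takeaway (df/dx = 2, df/dz = 0).

```python
# Tiny computation graph: f(x, y, z) = (x + y) * max(y, z)

def forward(x, y, z):
    a = x + y           # first intermediate node
    b = max(y, z)       # second intermediate node
    f = a * b           # output node
    return f, (a, b)

def backward(y, z, cache):
    a, b = cache
    df = 1.0                        # upstream gradient at the output
    # local gradients of f = a * b
    da = df * b                     # df/da = b
    db = df * a                     # df/db = a
    # local gradient of a = x + y (both inputs have local gradient 1)
    dx = da * 1.0
    dy_from_a = da * 1.0
    # local gradient of b = max(y, z): 1 for the larger input, 0 for the other
    dy_from_b = db * (1.0 if y > z else 0.0)
    dz = db * (1.0 if z > y else 0.0)
    dy = dy_from_a + dy_from_b      # sum the gradients from y's two branches
    return dx, dy, dz

def numeric_grads(x, y, z, h=1e-6):
    # Central-difference gradient check (the slow but simple sanity check
    # mentioned at 1:18:47).
    grads = []
    point = [x, y, z]
    for i in range(3):
        plus, minus = point[:], point[:]
        plus[i] += h
        minus[i] -= h
        grads.append((forward(*plus)[0] - forward(*minus)[0]) / (2 * h))
    return grads

x, y, z = 1.0, 2.0, 0.0
f, cache = forward(x, y, z)
analytic = backward(y, z, cache)
numeric = numeric_grads(x, y, z)
print("f =", f)                     # 6.0
print("analytic grads:", analytic)  # (2.0, 5.0, 0.0)
print("numeric  grads:", numeric)   # approximately the same
```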
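
And a quick autograd sketch for the 01:11:52 takeaway: PyTorch builds the same computation graph during the forward pass and fills in the gradients on backward(). This is generic torch usage, not the tutorial mentioned in the lecture.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(0.0, requires_grad=True)

f = (x + y) * torch.maximum(y, z)   # forward pass records the graph
f.backward()                         # backward pass populates .grad

print(x.grad, y.grad, z.grad)        # tensor(2.), tensor(5.), tensor(0.)
```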

AramR-mw

At 23:29 the reason why the off-diagonal entries of the Jacobian are 0 is a little unclear.
The derivatives at the off-diagonal positions are zero because h1 is the output for input z1 and no other input.
Hence h1 changes by dh1/dz1 for a change in z1 and by 0 for a change in the others, i.e. z2, z3.
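
A small numerical sketch of this point (illustrative; the sigmoid here is just a stand-in for whatever f the slide uses): for an elementwise nonlinearity h = f(z), each h_i depends only on z_i, so a finite-difference Jacobian comes out diagonal, with f'(z_i) on the diagonal and zeros elsewhere.

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))   # elementwise sigmoid

z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
n = z.size

# Build the Jacobian dh/dz column by column with central differences.
J = np.zeros((n, n))
for j in range(n):
    dz = np.zeros(n)
    dz[j] = eps
    J[:, j] = (f(z + dz) - f(z - dz)) / (2 * eps)

print(np.round(J, 6))
# Off-diagonal entries are 0; the diagonal matches f'(z) = f(z) * (1 - f(z)).
print(np.round(np.diag(f(z) * (1 - f(z))), 6))
```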

manujarora

53:00 the gradient passed back to the left (downstream) is the node's local gradient times the gradient arriving from the right (upstream)

annawilson

At 40:26 there is a typo in the presentation (page 40). The result of the sum \sum W_{ik} x_k should be z_j rather than x_j

ramongarcia

Can I get the PyTorch tutorial mentioned at 4:35 from somewhere?

GengyinLiu

Question: how is f(z) a Jacobian?
My understanding: for a single neuron, z is going to be a scalar,
and its output f(z) is also going to be a scalar.

Can a neuron ever output anything other than a scalar? Perhaps the Jacobian holds for the overall network.

manujarora

Guess I'm not really missing out on something by not being at Stanford.

susdoge