Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 3 - Backprop and Neural Networks

This lecture covers:
1. A simple neural network for named entity recognition
2. Matrix calculus: computing gradients by hand
3. The backpropagation algorithm and computation graphs
4. Automatic differentiation in deep learning frameworks
5. Numeric gradient checking

Professor Christopher Manning
Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science
Director, Stanford Artificial Intelligence Laboratory (SAIL)
Comments

🎯 Key Takeaways for quick navigation:

00:00 The lecture focuses on the math details of neural net learning, introducing the backpropagation algorithm. Assignment 2 is released, emphasizing understanding the math of neural networks and the associated software. The speaker acknowledges that some students may struggle with this material, but encourages grasping the math behind neural networks for a deeper understanding. After this week, the course will transition to letting software handle the complex math, and a PyTorch tutorial is announced. The lecture introduces a named entity recognition task using a simple neural network and outlines the mathematical foundations for training neural nets through manual computation and backpropagation.
27:56 Calculating partial derivatives for a neural network involves breaking complex functions into simpler ones and applying the chain rule.
32:40 To train neural networks efficiently, the backpropagation algorithm propagates derivatives using the matrix chain rule, reusing shared derivatives to minimize computation.
37:22 In neural network computations, the shape convention is used: gradients are written in the same shape as their parameters, which makes stochastic gradient descent updates straightforward.
43:58 There is a discrepancy between the Jacobian form, which is convenient for calculus, and the shape convention used for presenting answers in assignments, which ensures gradients have the right shapes for parameter updates.
48:10 Computation graphs, resembling trees, are constructed to systematically exploit shared derivatives, enabling efficient backpropagation in neural network training.
50:56 Backpropagation passes gradients backward through the computation graph, updating parameters via the chain rule to minimize the loss.
53:21 The general principle in backpropagation is that the downstream gradient equals the upstream gradient times the local gradient, which enables efficient parameter updates (see the first sketch below this comment).
57:32 During backpropagation, a local gradient is computed for each node, and the downstream gradient is the upstream gradient multiplied by the local gradient.
59:18 Understanding gradients helps assess how changes in each variable affect the output of a computation graph.
59:45 Changes in z don't affect the output, so the gradient df/dz is 0; changes in x affect the output twice as much, so df/dx = 2.
01:00:12 When a variable feeds multiple branches of the computation graph, its gradient is the sum of the gradients arriving from those branches.
01:01:36 With multiple outward branches, the gradient calculation sums the branch gradients, following the multivariable chain rule.
01:04:54 Backpropagation avoids redundant computation, calculating all gradients efficiently with one forward pass and one backward pass through the graph.
01:07:45 The backpropagation algorithm applies to arbitrary computation graphs, but neural networks with regular layer structures allow for parallelization.
01:11:52 Automatic differentiation in modern deep learning frameworks like TensorFlow and PyTorch handles most gradient computation automatically (see the PyTorch sketch below this comment).
01:13:42 Symbolic computation of derivatives, attempted in Theano, faced challenges, leading to the current approach in deep learning frameworks.
01:14:36 Automatic differentiation involves forward and backward passes, computing values and gradients efficiently through the computation graph.
01:18:47 Numeric gradient checking is a simple but slow way to verify gradient correctness, mainly used when implementing custom layers.

Made with HARPA AI
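
Below is a minimal Python sketch (illustrative, not from the course materials) of the backprop principle summarized above: downstream gradient = upstream gradient × local gradient, gradients from multiple branches are summed, and a numeric central-difference check verifies the result. It uses the small graph f(x, y, z) = (x + y) * max(y, z) at x = 1, y = 2, z = 0, consistent with the 59:45 takeaway (df/dx = 2, df/dz = 0).

```python
# Tiny computation graph: f(x, y, z) = (x + y) * max(y, z)

def forward(x, y, z):
    a = x + y           # first intermediate node
    b = max(y, z)       # second intermediate node
    f = a * b           # output node
    return f, (a, b)

def backward(y, z, cache):
    a, b = cache
    df = 1.0                        # upstream gradient at the output
    # local gradients of f = a * b
    da = df * b                     # df/da = b
    db = df * a                     # df/db = a
    # local gradient of a = x + y (both inputs have local gradient 1)
    dx = da * 1.0
    dy_from_a = da * 1.0
    # local gradient of b = max(y, z): 1 for the larger input, 0 for the other
    dy_from_b = db * (1.0 if y > z else 0.0)
    dz = db * (1.0 if z > y else 0.0)
    dy = dy_from_a + dy_from_b      # sum the gradients from y's two branches
    return dx, dy, dz

def numeric_grads(x, y, z, h=1e-6):
    # Central-difference gradient check (the slow but simple sanity check
    # mentioned at 1:18:47).
    grads = []
    point = [x, y, z]
    for i in range(3):
        plus, minus = point[:], point[:]
        plus[i] += h
        minus[i] -= h
        grads.append((forward(*plus)[0] - forward(*minus)[0]) / (2 * h))
    return grads

x, y, z = 1.0, 2.0, 0.0
f, cache = forward(x, y, z)
analytic = backward(y, z, cache)
numeric = numeric_grads(x, y, z)
print("f =", f)                     # 6.0
print("analytic grads:", analytic)  # (2.0, 5.0, 0.0)
print("numeric  grads:", numeric)   # approximately the same
```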
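
And a quick autograd sketch for the 01:11:52 takeaway: PyTorch builds the same computation graph during the forward pass and fills in the gradients on backward(). This is generic torch usage, not the tutorial mentioned in the lecture.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(0.0, requires_grad=True)

f = (x + y) * torch.maximum(y, z)   # forward pass records the graph
f.backward()                         # backward pass populates .grad

print(x.grad, y.grad, z.grad)        # tensor(2.), tensor(5.), tensor(0.)
```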

AramR-mw

At 23:29 the reason why the off-diagonal entries of the Jacobian are 0 is a little unclear.
The derivatives at the off-diagonal positions are zero because h1 is the output for input z1 and no other input.
Hence h1 changes by dh1/dz1 for a change in z1 and by 0 for a change in the others, i.e. z2, z3.
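
A small numerical sketch of this point (illustrative; the sigmoid here is just a stand-in for whatever f the slide uses): for an elementwise nonlinearity h = f(z), each h_i depends only on z_i, so a finite-difference Jacobian comes out diagonal, with f'(z_i) on the diagonal and zeros elsewhere.

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))   # elementwise sigmoid

z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
n = z.size

# Build the Jacobian dh/dz column by column with central differences.
J = np.zeros((n, n))
for j in range(n):
    dz = np.zeros(n)
    dz[j] = eps
    J[:, j] = (f(z + dz) - f(z - dz)) / (2 * eps)

print(np.round(J, 6))
# Off-diagonal entries are 0; the diagonal matches f'(z) = f(z) * (1 - f(z)).
print(np.round(np.diag(f(z) * (1 - f(z))), 6))
```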

manujarora

53:00 the gradient passed back to the left (downstream) is the node's local gradient times the gradient arriving from the right (upstream)

annawilson

At 40:26 there is a typo in the presentation (page 40). The result of the sum \sum W_{ik} x_k should be z_j rather than x_j

ramongarcia

Can I get the PyTorch tutorial mentioned at 4:35 from somewhere?

GengyinLiu

Question: how is f(z) a Jacobian?
My understanding: for a single neuron, z is going to be a scalar,
and its output f(z) is also going to be a scalar.

Can a neuron ever output anything other than a scalar? Perhaps the Jacobian holds for the overall network.

manujarora

Guess I'm not really missing out on something by not being at Stanford.

susdoge