Adjoint Sensitivities of a Non-Linear System of Equations | Full Derivation


A linear system of equations is a special case of a non-linear system of equations. Let's use the knowledge we obtained in the previous video to derive the adjoint problem in this more general case as well.
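For reference, here is a compact summary of the derivation covered in the video, written in standard adjoint notation (the symbols f, x, theta, J and the row-vector gradient convention are assumed to match the video's):

    f(x, \theta) = 0, \qquad f : \mathbb{R}^N \times \mathbb{R}^P \to \mathbb{R}^N, \qquad J = J(x, \theta) \in \mathbb{R}

    \frac{dJ}{d\theta} = \frac{\partial J}{\partial \theta} + \frac{\partial J}{\partial x} \, \frac{dx}{d\theta}
    \qquad \text{(total derivative, row-vector gradients)}

    \frac{\partial f}{\partial x} \, \frac{dx}{d\theta} = - \frac{\partial f}{\partial \theta}
    \qquad \text{(implicit differentiation of } f(x(\theta), \theta) = 0 \text{)}

    \left(\frac{\partial f}{\partial x}\right)^{T} \lambda = - \left(\frac{\partial J}{\partial x}\right)^{T}
    \qquad \Longrightarrow \qquad
    \frac{dJ}{d\theta} = \frac{\partial J}{\partial \theta} + \lambda^{T} \, \frac{\partial f}{\partial \theta}

Note that the system defining lambda is linear even though f itself is non-linear (see 13:24 in the timestamps).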

-------

Timestamps:
00:00 Introduction
01:04 Big Non-Linear Systems
01:57 Scalar-Valued Loss Function
02:36 Parameters involved
03:07 Dimensions
03:56 Total derivative
04:22 Dimensions & row-vector gradients
05:46 Difficult Quantity
06:09 Implicit Differentiation
08:41 Plug back in
09:23 Two ways of bracketing
11:26 Identifying the adjoint
13:24 Adjoint System (is linear)
15:09 Strategy for obtaining the sensitivities
16:52 Remarks
19:52 Comparing against linear systems
21:17 Total and partial derivatives
26:31 Outro
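To make the strategy for obtaining the sensitivities (15:09) concrete, here is a minimal numerical sketch in JAX. The residual f, the loss J, and the Newton-type forward solve below are made-up placeholders for illustration only; what follows the derivation is the structure: one non-linear forward solve for x, one linear adjoint solve for lambda, then the assembly of dJ/dtheta.

    import jax
    import jax.numpy as jnp

    # Hypothetical example problem: f(x, theta) = diag(theta) @ x + x^3 - 1 = 0
    def f(x, theta):
        return jnp.diag(theta) @ x + x**3 - 1.0      # N = P here, purely for illustration

    def J(x, theta):
        return jnp.sum(x**2)                         # made-up scalar loss

    def forward_solve(theta, x0, num_iter=50):
        # Plain Newton iteration as a stand-in for any non-linear solver
        x = x0
        for _ in range(num_iter):
            dfdx = jax.jacobian(f, argnums=0)(x, theta)
            x = x - jnp.linalg.solve(dfdx, f(x, theta))
        return x

    def adjoint_gradient(theta, x0):
        x = forward_solve(theta, x0)                   # 1) one forward simulation
        dfdx = jax.jacobian(f, argnums=0)(x, theta)
        dJdx = jax.grad(J, argnums=0)(x, theta)
        lam = jnp.linalg.solve(dfdx.T, -dJdx)          # 2) linear adjoint solve
        dJdtheta = jax.grad(J, argnums=1)(x, theta)
        dfdtheta = jax.jacobian(f, argnums=1)(x, theta)
        return dJdtheta + lam @ dfdtheta               # 3) partial dJ/dtheta + lambda^T df/dtheta

    theta = jnp.array([2.0, 3.0, 4.0])
    print(adjoint_gradient(theta, x0=jnp.ones(3)))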
Comments

My speaking is a little fast in this video. Sorry for that. It will be better again in the next videos :)

You can also set the playback speed to 0.9 if that helps in understanding. Let me know if you had difficulties.

MachineLearningSimulation

Thank you very much for this video. I'm a bit confused regarding the difference in complexity between the adjoint method and the naive forward method.

My understanding is that the difference lies in the complexity of solving
1) for lambda in (df/dx)^T * lambda = -(dJ/dx)^T, an [N x N] * [N x 1] = [N x 1] system, versus
2) for (dx/dtheta) in (df/dx) * (dx/dtheta) = -(df/dtheta), an [N x N] * [N x P] = [N x P] system.
Once these are solved, the remaining parts do not differ in complexity.

The point I would like to confirm is this:
As you mentioned, once x is obtained through one forward simulation, we can evaluate df/dx at the point x, solve Eq. 1), and obtain lambda. But isn't it true that theta is also known at this point (otherwise we would not know x)? That means df/dx and df/dtheta in Eq. 2) can be evaluated as well. So is the complexity difference really just that solving Eq. 2) requires P times more operations than solving Eq. 1), and does the complexity have nothing to do with the number of forward simulations?


Thanks in advance for your answer!

heyjianjing
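As a small illustration of the comparison described in the comment above, here is a hedged JAX sketch with random matrices standing in for the Jacobians (which in practice are evaluated at the solved x): the adjoint route, Eq. 1), solves one [N x N] system with a single right-hand side, while the forward route, Eq. 2), solves the same [N x N] system with P right-hand sides; both yield the same dJ/dtheta.

    import jax
    import jax.numpy as jnp

    jax.config.update("jax_enable_x64", True)

    N, P = 100, 20                                   # hypothetical sizes
    k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)

    dfdx = jax.random.normal(k1, (N, N))             # df/dx at the solved x
    dfdtheta = jax.random.normal(k2, (N, P))         # df/dtheta, also known at that point
    dJdx = jax.random.normal(k3, (N,))               # (dJ/dx)^T
    dJdtheta = jax.random.normal(k4, (P,))           # partial dJ/dtheta

    # Adjoint route, Eq. 1): one solve with one right-hand side
    lam = jnp.linalg.solve(dfdx.T, -dJdx)
    grad_adjoint = dJdtheta + lam @ dfdtheta

    # Forward route, Eq. 2): one solve with P right-hand sides
    dxdtheta = jnp.linalg.solve(dfdx, -dfdtheta)
    grad_forward = dJdtheta + dJdx @ dxdtheta

    print(jnp.max(jnp.abs(grad_adjoint - grad_forward)))   # agree up to round-off

In this framing both routes reuse the single non-linear forward solve that produced x, and the cost difference is indeed one versus P right-hand sides in the linear algebra; additional forward simulations only enter when the sensitivities are instead approximated by finite differences over the parameters.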

Hi there. I enjoy your videos. Just a question: why does u0 depend on theta? Is it just for generality? Many thanks.

chrtzstn
Автор

Can you maybe also upload a video where you implement this strategy efficiently with AD (Julia or Python would be great)? And maybe also for sensitivities in ODEs; this would be really helpful, at least for me. I have already watched the Python implementation for the simple linear case, which was excellent, but it was too simple to understand how to generalize it and efficiently implement harder real-world cases. An extension showing how to feed a parameter optimization algorithm with the derivatives would also be really interesting.

mariusa.
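Not a full tutorial, but as a rough sketch of how the strategy above can be combined with AD: below, a Newton solve is wrapped in jax.custom_vjp so that jax.grad of any scalar loss built on the solution returns the adjoint sensitivities. The residual f, the solver, and the loss are made-up placeholders; only the backward rule mirrors the adjoint system derived in the video.

    import jax
    import jax.numpy as jnp

    def f(x, theta):                                  # hypothetical residual, N = P = 3
        return jnp.diag(theta) @ x + x**3 - 1.0

    @jax.custom_vjp
    def solve(theta):
        # Forward pass: Newton iteration as a stand-in for any non-linear solver
        x = jnp.ones_like(theta)
        for _ in range(50):
            dfdx = jax.jacobian(f, argnums=0)(x, theta)
            x = x - jnp.linalg.solve(dfdx, f(x, theta))
        return x

    def solve_fwd(theta):
        x = solve(theta)
        return x, (x, theta)

    def solve_bwd(residuals, x_bar):
        # Adjoint rule: solve (df/dx)^T lambda = -x_bar, then form lambda^T df/dtheta
        # via a vector-Jacobian product, so df/dtheta is never built explicitly.
        x, theta = residuals
        dfdx = jax.jacobian(f, argnums=0)(x, theta)
        lam = jnp.linalg.solve(dfdx.T, -x_bar)
        _, f_vjp = jax.vjp(lambda th: f(x, th), theta)
        (theta_bar,) = f_vjp(lam)
        return (theta_bar,)

    solve.defvjp(solve_fwd, solve_bwd)

    def J(theta):
        return jnp.sum(solve(theta)**2)               # made-up scalar loss J(x(theta))

    print(jax.grad(J)(jnp.array([2.0, 3.0, 4.0])))    # adjoint sensitivities dJ/dtheta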

\partial f / \partial \theta seemed very confusing to me. To obtain this term, wouldn't one need to build a tangent linear model?

ZéSilva-fz