Neural Ordinary Differential Equations

Abstract:
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.

Authors:
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
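
As a rough sketch of the idea (using odeint from the authors' torchdiffeq library; the layer sizes here are arbitrary choices for illustration): the small network below parameterizes the derivative of the hidden state, a black-box solver integrates it, and "depth" becomes an integration interval.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # the authors' differentiable ODE-solver library

class ODEFunc(nn.Module):
    """Parameterizes the derivative dh/dt = f(h, t) with a small neural network."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, h):
        return self.net(h)

func = ODEFunc(dim=2)
h0 = torch.randn(16, 2)           # hidden state at t = 0 (the "input layer")
t = torch.tensor([0.0, 1.0])      # integrate from t = 0 to t = 1
h1 = odeint(func, h0, t)[-1]      # hidden state at t = 1 (the "output layer")
```

Swapping odeint for torchdiffeq's odeint_adjoint gives the constant-memory adjoint backpropagation described in the abstract.
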
Comments

This series explaining deep learning papers is so good.

KMoscRD

Cheers! That was an excellent video. Thanks so much for putting it together!

ericstephenvorm

I spent 3 hours yesterday trying to figure out what the hell was happening in this paper, and I wake up to this... THANK YOU

siclonman

Great video, thanks! I didn't get the part with the encoder; where in the video do you talk about it? I mean Figure 6: is that supposed to work with a NODE or... hmm...
I'd love it if somebody could explain it.

shorray
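
On the Figure 6 question above: that figure is the paper's continuous-time latent-variable model. An RNN encoder consumes the observed sequence (run backwards in time) to produce an approximate posterior over the initial latent state z(t0); a sample of z(t0) is then evolved forward by the ODE and decoded at each observation time. A rough sketch of that data flow, with all sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

latent_dim, obs_dim = 4, 2
encoder = nn.GRU(obs_dim, 2 * latent_dim, batch_first=True)  # emits mean and log-variance
odefunc = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
decoder = nn.Linear(latent_dim, obs_dim)

x = torch.randn(8, 10, obs_dim)            # a batch of observed sequences
_, h = encoder(torch.flip(x, dims=[1]))    # run the RNN backwards over the observations
mean, logvar = h[-1].chunk(2, dim=-1)      # approximate posterior over z(t0)
z0 = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterized sample
t = torch.linspace(0.0, 1.0, 10)
zt = odeint(lambda t, z: odefunc(z), z0, t)  # latent trajectory at the 10 time points
x_hat = decoder(zt)                          # decoded predictions at each time
```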

Please see the caption of Fig. 2: "If the loss depends directly on the state at multiple observation times, the adjoint state must be updated in the direction of the partial derivative of the loss with respect to each observation."
Why do we need to add an offset for each observation?

zitangsun
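
On the offset question above: between observations the adjoint a(t) = dL/dz(t) is integrated backwards in time, but at each observation time t_i the loss also depends on the state z(t_i) directly, so the chain rule adds that direct gradient as a jump:

```latex
a(t_i^{-}) = a(t_i^{+}) + \frac{\partial L}{\partial z(t_i)}
```

The "offset" is just this second term: each observed state contributes its own partial derivative of the loss, exactly as the caption says.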

Thank you!
I am trying to understand the implementation in Python, but I am confused about why we still need 2-3 Conv2D layers with activation functions if we consider the hidden layers as a continuous function that can be solved by ODE solvers.
Could you please help me with this?

albertlee
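
On the Conv2D question above: those convolutions do not form a discrete stack of hidden layers; together they define the single function f(h, t) whose output is the derivative of the hidden state, and the solver evaluates that same small network many times along the trajectory. A sketch of what such an f might look like (the shapes here are made up for illustration):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ConvODEFunc(nn.Module):
    """f(t, h): a small conv net whose output is dh/dt, not a new layer's activations."""
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, t, h):
        return self.conv2(torch.relu(self.conv1(torch.relu(h))))

h0 = torch.randn(4, 16, 8, 8)           # feature maps entering the ODE block
t = torch.tensor([0.0, 1.0])
h1 = odeint(ConvODEFunc(), h0, t)[-1]   # the solver calls f repeatedly internally
```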

Let me try to summarize. Tell me if I understood it right.

There is a neural network which tries to predict the hidden activations at each layer (in a continuous space) of another neural network.

So the integral of the outputs of this entire neural network should be the activations of the final layer (x1) of the network we are trying to predict. Similarly, the input should be the initial activations (x0).

Therefore, the loss is the deviation between the ground truth and the integration of the first neural network from x0 to x1.

The integration is done with some numerical ODE solver, like Euler's method. It must be continuous and differentiable.

t is a hyperparameter: an arbitrarily chosen "depth" of the neural network we are trying to predict.

herp_derpingson
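
To make the "integration with some numerical ODE solver" step in the summary above concrete, here is a minimal fixed-step Euler sketch (the solvers the paper actually uses choose their step sizes adaptively and estimate the error):

```python
import torch

def euler_integrate(f, h, t0, t1, steps=20):
    """Approximate h(t1) from h(t0) by stepping repeatedly along dh/dt = f(t, h)."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        h = h + dt * f(t, h)
        t = t + dt
    return h

# Toy check with dh/dt = -h, whose exact solution at t = 1 is h0 * exp(-1).
h0 = torch.ones(3)
h1 = euler_integrate(lambda t, h: -h, h0, 0.0, 1.0)
```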

Amazing video... new subscriber for sure.

moormanjean

Best explanation I've found so far on this. Good job!

nathancooper

After reading and watching various articles and videos, I must say this is the clearest explanation I've found so far. Thanks!

jordibolibar

So clear now, many thanks! +1 follower

DonHora

Wow, this seems by far the most distinctive type of network in deep learning. Everything else kind of falls into a few categories, but can all be conceptually interconnected in some way. This is not even close.

cw

Thanks for making this video. This was really helpful.

SuperSarvagya

Really great explanation, very clear and concise.

chrissteel

This is my video of the year, thank you for the explanation.

zyadh

Hey guys, can anyone help me write a research proposal on ODE topics?

iqraiqra

I could not understand why we need to compute dL/dz(0); don't we just need dL/dθ for updating our parameters? I would appreciate it if anybody could answer my query.

alekhmahankudo
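
On the dL/dz(0) question above: the paper obtains the parameter gradient as an integral that involves the adjoint a(t) = dL/dz(t) along the whole trajectory, so the adjoint has to be carried back to t0 anyway; dL/dz(0) then falls out as a by-product, and it is also exactly the gradient passed back to any layers that precede the ODE block in a larger model. The parameter gradient from the paper:

```latex
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top}\,\frac{\partial f(z(t), t, \theta)}{\partial \theta}\,dt
```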

OK... and how do we find the adjoint equation? What is it, what does it mean, and why can we do it?

ClosiusBeg
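
On the adjoint question above: the adjoint is defined as a(t) = dL/dz(t), the sensitivity of the loss to the hidden state at time t. The paper shows, essentially by a continuous-time chain rule, that it obeys its own ODE, which can be solved backwards in time alongside the state without storing the forward activations (this is where the constant memory cost comes from):

```latex
\frac{da(t)}{dt} = -a(t)^{\top}\,\frac{\partial f(z(t), t, \theta)}{\partial z}
```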

Hi, thanks for the crisp explanation. Is there any forum or link I can join for ODE-related issues/tasks? I have just started working on ODEs and would appreciate some help or discussions related to the topic. Thanks!

hdgdhdhdh

How does this relate to liquid neural networks? That paper is also worthy of a video from you, I think.

Alex-rtpo