Liquid Neural Networks

Ramin Hasani, MIT - intro by Daniela Rus, MIT

Abstract: In this talk, we will discuss the nuts and bolts of the novel continuous-time neural network models: Liquid Time-Constant (LTC) Networks. Instead of declaring a learning system's dynamics by implicit nonlinearities, LTCs construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates. LTCs represent dynamical systems with varying (i.e., liquid) time-constants, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks compared to advanced recurrent network models.
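
For concreteness, the LTC dynamics described above can be written, following the notation of the LTC paper (here x is the hidden state, I the input, τ a fixed time constant, A a bias vector, and f a bounded nonlinearity such as a sigmoid of a linear map of x and I), roughly as

\[
\frac{dx(t)}{dt} = -\left[\frac{1}{\tau} + f\big(x(t), I(t), t, \theta\big)\right] \odot x(t) + f\big(x(t), I(t), t, \theta\big) \odot A ,
\]

so the effective time constant of each unit varies with the state and the input, which is what makes the network "liquid".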

Speaker Biographies:

Dr. Daniela Rus is the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science and Director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. Rus’s research interests are in robotics, mobile computing, and data science. Rus is a Class of 2002 MacArthur Fellow, a fellow of ACM, AAAI, and IEEE, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. She earned her PhD in Computer Science from Cornell University. Prior to joining MIT, Rus was a professor in the Computer Science Department at Dartmouth College.

Dr. Ramin Hasani is a postdoctoral associate and a machine learning scientist at MIT CSAIL. His primary research focus is on the development of interpretable deep learning and decision-making algorithms for robots. Ramin received his Ph.D. with honors in Computer Science at TU Wien, Austria. His dissertation on liquid neural networks was co-advised by Prof. Radu Grosu (TU Wien) and Prof. Daniela Rus (MIT). Ramin is a frequent TEDx speaker. He completed an M.Sc. in Electronic Engineering at Politecnico di Milano, Italy (2015), and received his B.Sc. in Electrical Engineering – Electronics from Ferdowsi University of Mashhad, Iran (2012).
Comments

It just amazes me how the final few layers are so crucial to the objective of the neural network!

adityamwagh

0:00: 🤖 The talk introduces the concept of liquid neural networks, which aim to bring insights from natural brains back to artificial intelligence.
- 0:00: The introductory speaker, Daniela Rus, is the director of CSAIL and is driven by a curiosity to understand intelligence.
- 2:33: The talk aims to build machine-learned models that are more compact, sustainable, and explainable than deep neural networks.
- 3:26: Ramin Hasani, a postdoc in Daniela Rus' group, presents the concept of liquid neural networks and their potential benefits.
- 5:11: Natural brains interact with their environments to capture causality and generalize out of distribution, an area where artificial intelligence can benefit.
- 5:34: Natural brains are more robust, flexible, and efficient compared to deep neural networks.
- 6:03: A demonstration of a typical statistical end-to-end machine learning system is given.

6:44: 🧠 This research explores the attention and decision-making capabilities of neural networks and compares them to biological systems.
- 6:44: The CNN learned to attend to the sides of the road when making driving decisions.
- 7:28: Adding noise to the image affected the reliability of the attention map.
- 7:59: The researchers propose a framework that combines neuroscience and machine learning to understand and improve neural networks.
- 8:23: The research explores neural circuits and neural mechanisms to understand the building blocks of intelligence.
- 9:32: The models developed in the research are more expressive and capable of handling memory compared to deep learning models.
- 10:09: The systems developed in the research can capture the true causal structure of data and are robust to perturbations.

11:53: 🧠 The speaker discusses the incorporation of principles from neuroscience into machine learning models, specifically focusing on continuous time neural networks.
- 11:53: Neural dynamics are described by differential equations and can incorporate complexity, nonlinearity, memory, and sparsity.
- 14:19: Continuous time neural networks offer advantages such as a larger space of possible functions and the ability to model sequential behavior.
- 16:00: Numerical ODE solvers can be used to implement continuous-time neural networks (a minimal fixed-step sketch follows this list).
- 16:36: The choice of ODE solver and loss function determines the complexity and accuracy of the network.
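
A minimal sketch of the 16:00 point, not taken from the talk's code: a tiny continuous-time RNN, dx/dt = -x/τ + tanh(Wx + Uu + b), unrolled with a fixed-step explicit Euler solver. The weights, step size, and input signal below are arbitrary illustrations.

```python
import numpy as np

def ct_rnn_step(x, u, W, U, b, tau, dt):
    """One explicit-Euler step of dx/dt = -x/tau + tanh(W x + U u + b)."""
    dxdt = -x / tau + np.tanh(W @ x + U @ u + b)
    return x + dt * dxdt

rng = np.random.default_rng(0)
n_state, n_in = 8, 2
W = rng.normal(scale=0.5, size=(n_state, n_state))
U = rng.normal(scale=0.5, size=(n_state, n_in))
b = np.zeros(n_state)

x = np.zeros(n_state)
for t in range(200):                                   # unroll the solver over an input sequence
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t)])   # toy input signal
    x = ct_rnn_step(x, u, W, U, b, tau=1.0, dt=0.05)
print(x)                                               # final hidden state
```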

17:07: ✨ Neural ODEs combine the power of differential equations and neural networks to model biological processes.
- 17:07: Neural ODEs use differential equations to model the dynamics of a system and neural networks to model the interactions between different components.
- 17:35: The adjoint method is used to compute the gradients of the loss with respect to the state and the parameters of the system.
- 18:35: Backpropagating directly through the ODE solver has high memory complexity but is more accurate than the adjoint method (see the sketch after this list).
- 19:17: Neural ODEs can be inspired by the dynamics of biological systems, such as the leaky integrator model and conductance-based synapse model.
- 20:43: Neural ODEs can be reduced to an abstract form with sigmoid activation functions.
- 21:33: The behavior of the neural ODE depends on the inputs of the system and the coupling between the state and the time constant of the differential equation.
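
The 17:35–18:35 trade-off in code, assuming the third-party torchdiffeq package (not something the talk uses): odeint backpropagates through every internal solver operation, so memory grows with the number of steps, while odeint_adjoint solves an adjoint ODE backwards in time for memory-efficient gradients.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint, odeint_adjoint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Right-hand side f_theta(t, x) of the neural ODE dx/dt = f_theta(t, x)."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, x):
        return self.net(x)

func = ODEFunc()
x0 = torch.randn(16, 4)              # batch of initial states
t = torch.linspace(0.0, 1.0, 20)     # time points at which the state is read out

# Backpropagation through the solver: stores all intermediate ops (memory-hungry).
xs = odeint(func, x0, t, method="dopri5")

# Adjoint method: memory cost does not grow with the number of solver steps.
xs_adj = odeint_adjoint(func, x0, t, method="dopri5")

loss = xs_adj[-1].pow(2).mean()
loss.backward()                      # gradients w.r.t. func's parameters
```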

22:26: ⚙️ Liquid time constant networks (LTCs) are a type of neural network that uses differential equations to control interactions between neurons, resulting in stable behavior and increased expressivity.
- 22:26: LTCs have the same structure as traditional neural networks but use differential equations to control interactions between neurons.
- 24:25: LTCs have stable behavior and their time constant can be bounded.
- 25:26: The synaptic parameters in LTCs determine the impact on neuron activity.
- 25:50: LTCs are universal approximators and can approximate any given dynamics (a plain-Euler sketch of a single LTC cell follows this list).
- 26:23: The trajectory-length measure can be used to quantify the expressivity of LTCs.
- 27:58: LTCs consistently produce longer and more complex trajectories compared to other neural network representations.
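
For the 22:26 block, a plain explicit-Euler sketch of one LTC cell under the equation quoted after the abstract above; the paper itself trains with backpropagation through time and a fused ODE solver, and the shapes, gate, and constants here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, u, W, U, b, A, tau, dt):
    """Explicit-Euler step of dx/dt = -(1/tau + f) * x + f * A, where the gate
    f = sigmoid(W x + U u + b) makes the effective time constant input-dependent."""
    f = sigmoid(W @ x + U @ u + b)
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt

rng = np.random.default_rng(1)
n, m = 8, 3                                    # hidden units, input channels
W, U = rng.normal(size=(n, n)), rng.normal(size=(n, m))
b, A = np.zeros(n), rng.normal(size=n)

x = np.zeros(n)
for _ in range(100):
    x = ltc_step(x, rng.normal(size=m), W, U, b, A, tau=1.0, dt=0.05)
print(x)
```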

28:46: 📊 The speaker presents an empirical analysis of different types of networks and their trajectory lengths, and evaluates their expressivity and performance in representation learning tasks.
- 28:46: The trajectory length of LTC networks remains higher regardless of changes in network width or initialization (a toy trajectory-length computation follows this list).
- 29:04: Theoretical evaluation gives a lower bound on the expressivity of these networks in terms of weight scale, bias scale, width, depth, and number of discretization steps.
- 30:38: In representation learning tasks, LTCs outperform other networks, except for tasks with longer term dependencies where LSTMs perform better.
- 31:13: LTCs show better performance and robustness in real-world examples, such as autonomous driving, with significantly reduced parameters.
- 33:09: LTC-based networks impose an inductive bias on convolutional networks, allowing them to learn a causal structure and exhibit better attention and robustness to perturbations.
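
A toy version of the trajectory-length measure mentioned at 26:23 and 28:46 (in the spirit of Raghu et al.'s expressivity analysis, which this line of work builds on): sweep the input around a one-dimensional circle and sum the distances between consecutive hidden states. The random tanh update below is only a stand-in for whichever network is being measured.

```python
import numpy as np

def trajectory_length(step_fn, state_dim, n_points=500):
    """Sum of Euclidean distances between consecutive hidden states while the
    input sweeps a circle; longer trajectories indicate a more expressive map."""
    x = np.zeros(state_dim)
    states = []
    for theta in np.linspace(0.0, 2 * np.pi, n_points):
        u = np.array([np.cos(theta), np.sin(theta)])   # input point on the circle
        x = step_fn(x, u)
        states.append(x.copy())
    states = np.asarray(states)
    return np.sum(np.linalg.norm(np.diff(states, axis=0), axis=1))

# Stand-in recurrent update with arbitrary weights:
rng = np.random.default_rng(2)
W = rng.normal(scale=1.5, size=(16, 16))
U = rng.normal(scale=1.5, size=(16, 2))
print(trajectory_length(lambda x, u: np.tanh(W @ x + U @ u), state_dim=16))
```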

34:22: ⚙️ Different neural network models have varying abilities to learn representations and perform in a causal manner.
- 34:22: The CNN consistently focuses on the outside of the road, which is undesirable.
- 34:31: LSTM provides a good representation but is sensitive to lighting conditions.
- 34:39: CT-RNNs and neural ODEs struggle to learn a good representation in this task.
- 36:07: Physical models described by ODEs can predict future evolution, account for interventions, and provide insights.
- 38:36: Dynamic causal models use ODEs to create a graphical model with feedback.
- 39:55: Liquid neural networks can have a unique solution under certain conditions and can compute coefficients for causal behavior.

40:18: 🧠 Neural networks with ODE solvers can learn complex causal structures and perform tasks in closed loop environments.
- 40:18: Dynamic causal models with parameters B and C control collaboration and external inputs in the system.
- 41:12: Experiments with drone agents showed that the neural networks learned to focus on important targets.
- 41:58: Attention and causal structure were captured in both single and multi-agent environments.
- 43:05: The success rate of the networks in closed loop tasks demonstrated their understanding of the causal structure.
- 43:46: Complexity of the networks is tied to the complexity of the ODE solver, leading to longer training and test times.
- 44:53: The ODE-based networks may face vanishing gradient problems, which can be mitigated with gating mechanisms.

45:41: 💡 Model-free inference and liquid networks have the potential to enhance decision-making and intelligence.
- 45:41: Model-free inference captures temporal aspects of tasks and performs credit assignment better.
- 45:53: Liquid networks with causal structure enable generative modeling and further inference.
- 46:32: Compositionality and differentiability make these networks adaptable and interpretable.
- 46:40: Adding CNN heads or perception modules can handle visual or video data.
- 48:09: Working with objective functions and physics-informed learning processes can enhance learning.
- 49:02: Certain structures in liquid networks can improve decision-making for complex tasks.

Recap by Tammy AI

marcc

Fantastic work. The relative simplicity of the model proves that this methodology is truly a step towards artificial brains. Expressivity, better causality, and the many neuron-inspired improvements are inspiring.

FilippoMazza

Sounds like an important and necessary evolution of ML. Let's see how much this can be generalized and scaled, but it sounds fascinating.

martinsz

I prefer my Neural Networks solid, thank you very much.

isaacgutierrez

Most underrated talk. This is an actual game changer for ML.

hyperinfinity

The "discovery" that fixed time steps for ODE work better in this case is very well known in the optimal control literature (at least by a couple of decades).
Basically if your ODE solver has adaptive time steps, the exact mathematical operations performed for a given integration time interval dT can vary because a different number of internal steps is performed. This can have really bad consequences on the gradients of the final time states.
There's plenty of theoretical and practical discussion in Betts' book Practical Methods for Optimal Control, chapter 3.9 Dynamic Systems Differentiation.

lorenzoa.ricciardi
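
Illustrating the comment above: with a fixed-step integrator the computation graph is identical on every call, so gradients of the final state are reproducible. A minimal fixed-step RK4 in PyTorch might look like the sketch below; none of this comes from the talk or from Betts' book.

```python
import torch

def rk4_fixed(f, x0, t0, t1, n_steps):
    """Classical 4th-order Runge-Kutta with a fixed number of steps, so the
    same sequence of differentiable operations runs on every call."""
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return x

# Harmonic oscillator dx/dt = (x2, -x1); gradients w.r.t. the initial state are
# well defined because the operation count never changes between calls.
x0 = torch.tensor([1.0, 0.0], requires_grad=True)
xT = rk4_fixed(lambda t, x: torch.stack([x[1], -x[0]]), x0, 0.0, 6.28, 200)
xT.sum().backward()
print(x0.grad)
```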

I hope that one day I'll be able to fully understand what he's talking about... but it sounds amazing and I want to play around with it!

scaramir

This is truly a game changer in AI, well done folks 👍

agritech

I would need a simpler version of the 22:49 diagram to understand this.
Ramin says here that standard NN neurons have a recursive connection to themselves. I don't know a ton about ANNs, but I overhear things from my coworkers, and I have never heard of that recursive connection. Is that for RNNs?

Is there a "Reaching 99% on MNIST"-simple explanation, or does this liquidity only work on time-series data?

zephyr

The video showed up in my feed randomly and I clicked on it just because the lecturer was Iranian. But the content was so interesting that I watched it to the end. Sounds like a real evolutionary breakthrough in ML and DL. Especially with the computational power of computing systems growing every day, training and inference for such complex network models become more feasible. Great job.

Ali-wfef

I got so happy finding out that the person who wrote this exciting paper is also a Ramin :)

raminkhoshbin

What is the difference between these LNNs and coupled ODEs? Aren't we conflating these terms? If you drive a car with only 19 neurons, then what you have is an asynchronous network of coupled ODEs rather than a neural network; the term is misleading.

AndyBarbosa

Causality is extremely important in building a Bayesian model of the world. It allows us to identify correlations between events that are useful for creating a-priori statistics for reasoning, because we avoid double counting. A single piece of evidence and its logical consequences are not seen as many independent confirmations of a hypothesis.

peceed

I am moved beyond description.
What an amazing privilege to be alive in this day and age.
The future will be great for mankind.

Seekerofknowledges

Very good. The idea for that is simple; the problem lies in putting all of it to work together, and that is the good stuff.

KeviPegoraro

Thanks Ramin and team. That was the most interesting and well-delivered presentation on neural nets that I have ever seen; certainly a lot new to learn in there. Most impressed by the return to learning from nature and the brain and how that significantly augmented 'standard' RNNs etc. Well, there's a new standard now, and it's liquid.

alwadud

The attention map at 7:00 looks fine to me. If you do not want to wander off the road, you should attend to the boundary of the road. And even after noise is added at 7:30, the attention still picks up the boundary, which is pretty good.

ibraheemmoosa

Does anyone know of any sample code for a model like this?

johnniefujita

Great work, I hope that they publish the paper on this model soon.

araneascience