CAII 11/8 Seminar Featuring MIT Theoretical Physics Researcher Dan Roberts
Our Center for Artificial Intelligence Innovation continues its 2021 Fall Seminar series with a talk by Dan Roberts, Research Affiliate at the Center for Theoretical Physics at MIT, titled "The Principles of Deep Learning Theory."
Abstract:
Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to put forth a set of principles that enable us to theoretically analyze deep neural networks of actual relevance. In doing so, we will explain why such a goal is even attainable in theory and how we are able to get there in practice.
To begin, we will discuss how physical intuition and the approach of theoretical physics can be brought to bear on this problem, borrowing from the "effective theory" framework of physics. For context, we will recount how similar ideas were used to connect the thermodynamic effective description of artificial machines from the industrial age to the first-principles theory of microscopic components provided by statistical mechanics. To make progress on deep learning, we will need to understand the statistics of initialized deep networks and determine the dynamics of such an ensemble when learning from data. To make this tractable, we will have to take the structure of neural networks into account. Developing a perturbative 1/n expansion around the limit of infinite hidden-layer width, we will find a principle of sparsity that will let us describe practical, effectively deep networks of large but finite width. We will thus see that useful neural networks should be sparse -- hence the preference for larger and larger models -- but not too sparse -- so that they are also deep.
This talk is based on the book "The Principles of Deep Learning Theory," co-authored with Sho Yaida and drawing on research in collaboration with Boris Hanin. The book will be published next year by Cambridge University Press.
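
As a toy illustration of what "the statistics of initialized deep networks" and a 1/n correction can look like, here is a minimal sketch. It is not taken from the talk or the book. It assumes deep linear networks with scalar output, all weights drawn independently from a zero-mean Gaussian with variance 1/width, and one fixed input; the function names are hypothetical. Under those assumptions the output distribution over random initializations is exactly Gaussian only at infinite width, and its excess kurtosis for `depth` hidden layers of width n is 3(1 + 2/n)^depth - 3, roughly 6·depth/n when depth is much smaller than n.

```python
import numpy as np


def output_samples(width, depth, num_networks, rng):
    """Scalar outputs of many random deep linear networks on one fixed input.

    Assumption (toy model): `depth` hidden layers of size `width`, no
    nonlinearity, no biases, weights ~ N(0, 1/width), input of all ones.
    """
    x = np.ones(width)
    h = np.broadcast_to(x, (num_networks, width))
    for _ in range(depth):
        # One independent width-by-width weight matrix per sampled network.
        W = rng.normal(0.0, np.sqrt(1.0 / width),
                       size=(num_networks, width, width))
        h = np.einsum("sij,sj->si", W, h)
    # Final readout layer mapping the last hidden layer to a scalar.
    w_out = rng.normal(0.0, np.sqrt(1.0 / width), size=(num_networks, width))
    return np.sum(w_out * h, axis=1)


def excess_kurtosis(z):
    """Sample excess kurtosis; it is zero for an exactly Gaussian variable."""
    z = z - z.mean()
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2 - 3.0


def predicted_excess_kurtosis(width, depth):
    """Closed form for this toy model only: 3 * (1 + 2/width)**depth - 3."""
    return 3.0 * (1.0 + 2.0 / width) ** depth - 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for width in (4, 16):
        z = output_samples(width, depth=2, num_networks=20000, rng=rng)
        print(width, excess_kurtosis(z), predicted_excess_kurtosis(width, 2))
```

In this toy model the deviation from Gaussianity shrinks as the width grows and grows with the number of layers, so it is governed by the depth-to-width ratio. The sample estimates are noisy for narrow networks because the output distribution is heavy-tailed there.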