MIA: Cem Anil and James Lucas on provable adversarial robustness; Primer, Roger Grosse
Models, Inference and Algorithms
February 12, 2020
Primer: Enforcing Lipschitz constraints for neural networks
Roger Grosse
Dept. of Computer Science, University of Toronto; Vector Institute
We can understand a lot about a neural network by understanding the Jacobian of the function it computes, i.e. the derivatives of its outputs with respect to its inputs. I’ll explain what the Jacobian is, how it’s built up from the Jacobians of individual layers, and what it tells us about neural net optimization. I’ll then motivate why we might like to bound the matrix norm of the Jacobian, or equivalently, enforce a small Lipschitz constant for a neural net, i.e. ensure that a small change to the input makes a correspondingly small change to the output. This is useful for several reasons: (1) it makes the network’s predictions provably robust to small adversarial perturbations, (2) it lets us estimate the Wasserstein distance between probability distributions, (3) the generalization error can be bounded in terms of the Lipschitz constant, and (4) Lipschitz constraints prevent some optimization difficulties, most notably exploding gradients. To set the stage for the research talk, I’ll relate the Lipschitz bound of the network to the norms of individual layers’ Jacobians.
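To make the last point concrete, here is a minimal Python/PyTorch sketch (not part of the talk; the network sizes and variable names are purely illustrative). It computes the input-output Jacobian of a small ReLU network at one input and compares its spectral norm to the product of the layers' weight spectral norms; because ReLU is itself 1-Lipschitz, that product upper-bounds the Jacobian norm at every input and hence the network's L2 Lipschitz constant.

import torch

torch.manual_seed(0)

# A small feed-forward network; ReLU is 1-Lipschitz, so the Lipschitz
# constant of the whole network is at most the product of the spectral
# norms of the two weight matrices.
net = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 3),
)

x = torch.randn(4)

# Input-output Jacobian at x: a 3x4 matrix built, via the chain rule,
# from the Jacobians of the individual layers.
J = torch.autograd.functional.jacobian(net, x)

# Spectral norm (largest singular value) of the Jacobian at this input.
jac_norm = torch.linalg.matrix_norm(J, ord=2)

# Product of the layers' spectral norms: an upper bound valid at every input.
bound = torch.tensor(1.0)
for m in net:
    if isinstance(m, torch.nn.Linear):
        bound = bound * torch.linalg.matrix_norm(m.weight, ord=2)

print(f"||J(x)||_2 = {jac_norm.item():.3f} <= prod_i ||W_i||_2 = {bound.item():.3f}")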
Meeting: Efficient Lipschitz-constrained neural networks
Cem Anil
Grosse Group, University of Toronto; Vector Institute
James Lucas
Grosse Group, University of Toronto; Vector Institute
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial robustness, generalization bounds, interpretable gradients, and Wasserstein distance estimation. By the composition property of Lipschitz functions, it suffices to ensure that each individual affine transformation or nonlinear activation is 1-Lipschitz. The challenge is to do this while maintaining expressive power. We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation. We propose two architectural components that satisfy strict Lipschitz constraints with norm preservation. First is the GroupSort activation function, which sorts units within a group. Second is the use of orthogonal linear layers; this is straightforward for fully connected layers, but more involved for convolutional layers. We present a flexible and efficient representation of orthogonal convolutions. Our provably Lipschitz-constrained architectures perform competitively at Wasserstein distance estimation and provable adversarial robustness.
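As a rough illustration of the two components described above, here is a minimal PyTorch sketch (not the speakers' implementation; all class and variable names are illustrative). It defines a GroupSort activation, which sorts units within fixed-size groups, and combines it with fully connected layers kept orthogonal via PyTorch's built-in orthogonal parametrization, which is used here only for brevity. The efficient representation of orthogonal convolutions discussed in the talk is more involved and is not shown.

import torch


class GroupSort(torch.nn.Module):
    # Sort units within contiguous groups of size `group_size`. Sorting only
    # permutes coordinates, so the activation is 1-Lipschitz and its Jacobian
    # is a permutation matrix, which preserves gradient norm in backprop.
    def __init__(self, group_size=2):
        super().__init__()
        self.group_size = group_size

    def forward(self, x):
        batch, features = x.shape
        assert features % self.group_size == 0
        x = x.view(batch, features // self.group_size, self.group_size)
        x, _ = torch.sort(x, dim=-1)
        return x.view(batch, features)


def orthogonal_linear(in_features, out_features):
    # A fully connected layer whose weight is kept (semi-)orthogonal, and
    # hence 1-Lipschitz, via PyTorch's orthogonal parametrization.
    layer = torch.nn.Linear(in_features, out_features)
    return torch.nn.utils.parametrizations.orthogonal(layer, name="weight")


# A small 1-Lipschitz network: orthogonal linear layers with GroupSort between them.
net = torch.nn.Sequential(
    orthogonal_linear(16, 32),
    GroupSort(group_size=2),
    orthogonal_linear(32, 10),
)

x = torch.randn(8, 16)
print(net(x).shape)  # torch.Size([8, 10])

With group size 2, GroupSort reduces to a max/min pairing of units; both sorting and multiplication by an orthogonal matrix leave vector norms unchanged, which is how these layers can preserve gradient norm during backpropagation.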
Copyright Broad Institute, 2020. All rights reserved.