Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

Abstract:
In recent years, the interest in unsupervised learning of disentangled representations has significantly increased. The key assumption is that real-world data is generated by a few explanatory factors of variation and that these factors can be recovered by unsupervised learning algorithms. A large number of unsupervised learning approaches based on auto-encoding, together with quantitative evaluation metrics of disentanglement, have been proposed; yet, the efficacy of the proposed approaches and the utility of the proposed notions of disentanglement have not been challenged in prior work. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions.
We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12000 models covering the six most prominent methods, and evaluate them across six disentanglement metrics in a reproducible large-scale experimental study on seven different data sets. On the positive side, we observe that different methods successfully enforce properties "encouraged" by the corresponding losses. On the negative side, we observe in our study that well-disentangled models seemingly cannot be identified without access to ground-truth labels even if we are allowed to transfer hyperparameters across data sets. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks.
These results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
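
The "approaches based on auto-encoding" in the abstract are VAE variants (the paper studies beta-VAE, AnnealedVAE, FactorVAE, beta-TCVAE, and DIP-VAE-I/II), all of which add a regularizer on top of the standard VAE loss. As a hedged illustration, not code from the paper, here is a minimal sketch of the simplest such objective, the beta-VAE loss; the decoder likelihood, tensor names, and the default beta are assumptions for illustration.

import torch
import torch.nn.functional as F

def beta_vae_loss(x, recon_logits, mu, logvar, beta=4.0):
    """Sketch of the beta-VAE objective: reconstruction + beta * KL(q(z|x) || N(0, I)).

    Assumes a Bernoulli decoder (pixel values in [0, 1]) and a diagonal Gaussian
    encoder that outputs per-dimension means `mu` and log-variances `logvar`.
    Names and the default beta are illustrative, not taken from the paper.
    """
    batch = x.size(0)
    # Reconstruction term: negative log-likelihood of x under the decoder.
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum") / batch
    # Closed-form KL between q(z|x) = N(mu, diag(exp(logvar))) and the prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    # beta > 1 puts extra pressure on the KL term; this is the property the loss
    # "encourages", which the paper finds is enforced but does not by itself
    # yield reliably identifiable disentanglement.
    return recon + beta * kl

The other methods keep the same structure but swap or augment the beta-weighted KL with different regularizers (for example, a total-correlation penalty in FactorVAE and beta-TCVAE).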

Authors:
Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

Comments:

Disclaimer: This is more of an introduction to VAEs and disentanglement than a walkthrough of the experimental part of the paper.

YannicKilcher

Excellent video as always, thank you.

I know I'm swimming against the tide here, but if features were truly disentangled, we would have far less need for ML in the first place - 'algorithms' can already detect/generate size, color, rotation, etc. pretty well. It is when features are entangled that they become so useful. So for me the holy grail isn't having tweakable *independent* properties - e.g. if I turn the dial to 'make the jaw bigger', I want that to affect the mouth and indeed the whole face, but in the right way. As I understood the paper (caveats here!), it shows that you could have jaw size perfectly on a dial, but when you introduce mouth shape, that will intertwine with the jaw and change it to a new model. Great! Done well (more caveats!), that's what I would want. Maybe use disentangled VAEs, but write unpenalized hints right into the latent space (e.g. measured jaw size) - maybe the model uses this 'free information' to mostly encode that feature there, but the rest of it will still react to and control it. (I'm experimenting with this kind of thing now, and assume I'll run into many of the problems they state in the paper that I don't yet understand - yes it's inefficient, but still a path to understanding :)

Anyway, I've learned a *ton* watching your videos. Obviously I still have a ways to go, but thank you again.

robindebreuil
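
One possible reading of the 'unpenalized hints' idea in the comment above (an interpretation, not something proposed in the paper): feed a measured attribute, e.g. a hypothetical jaw_size value, into the decoder alongside the sampled latent, so it bypasses the KL penalty while the learned latents can still interact with it, roughly in the spirit of a conditional VAE. A minimal sketch with illustrative names and sizes:

import torch
import torch.nn as nn

class HintedDecoder(nn.Module):
    """Sketch: decode from the learned latent z plus a measured 'hint' attribute.

    The hint (e.g. a pre-measured jaw size) is concatenated to z and is never
    penalized by the KL term, since it never passes through the encoder.
    Dimensions are illustrative.
    """

    def __init__(self, z_dim=10, hint_dim=1, out_dim=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + hint_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, z, hint):
        # z: (batch, z_dim) sampled latent; hint: (batch, hint_dim) measured attribute.
        return self.net(torch.cat([z, hint], dim=-1))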

I think I'm missing something here. What gets learned? The encoder to a vector of means and variances is one thing, right? The decoder from the samples to the reconstruction is another, right? Are the distributions also learned, while simultaneously being encouraged to be Gaussian? Something is being encouraged to be Gaussian by the KL term, right? Confused ... 🤯

rockapedra
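
On the question above: both networks are learned. The encoder maps x to the means and log-variances of a per-example Gaussian q(z|x), a sample z is drawn from that Gaussian via the reparameterization trick, and the learned decoder maps z back to a reconstruction; the KL term pushes each q(z|x) toward the fixed standard-normal prior, which is not itself learned. A minimal sketch (layer sizes are illustrative, not from the paper):

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch: the encoder and decoder weights are what get learned."""

    def __init__(self, x_dim=784, z_dim=10):
        super().__init__()
        # Encoder outputs 2 * z_dim numbers: the mean and log-variance of q(z|x).
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * z_dim))
        # Decoder maps a latent sample back to a reconstruction of x.
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_recon = self.decoder(z)
        # KL(q(z|x) || N(0, I)) in closed form; this is the term that "encourages"
        # the per-example posteriors to look Gaussian. The prior is fixed, not learned.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return x_recon, kl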

Thanks for the video! Was just reading the paper.

edwardhu