Deep Learning Determinism

This presentation was given on March 20, 2019, at GTC in San Jose, California.

Some items covered:
* What non-determinism is in the context of deep learning
* Why it's important to achieve deterministic operation
* The most common sources of non-determinism on GPUs
* A methodology for debugging non-determinism
* A tool for debugging non-determinism in TensorFlow
* Solutions to make frameworks operate deterministically on GPUs
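
As a minimal sketch of the last item, here is one way to request deterministic operation in TensorFlow. It assumes TF 2.9 or later, where tf.config.experimental.enable_op_determinism() is available; earlier versions used the TF_DETERMINISTIC_OPS=1 environment variable instead. The seed value is arbitrary.

import random
import numpy as np
import tensorflow as tf

def enable_determinism(seed: int = 42) -> None:
    # Seed every pseudo-random number generator the framework may consult.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    # Ask TensorFlow to select deterministic GPU kernels (TF 2.9+).
    # Ops with no deterministic implementation raise an error at run time.
    tf.config.experimental.enable_op_determinism()

enable_determinism()

With this in place, two training runs on the same hardware and software stack should produce bit-identical weights; a quick sanity check is to hash the model's weights after each run and compare.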

Links from the talk:

The TensorFlow determinism debug tool will be open-sourced at the following URL. Updates to the content of the talk will also be released there. Please watch and/or follow the repository.

Accompanying poster (presented at ScaledML 2019 at the Computer History Museum, Mountain View, CA):

This was my first public tech talk. I wrote about what I learned in preparing for it here:

This video (S9911) and many others from the conference can be viewed (for free) at
Comments

Thank you very much, Duncan!
So good to learn from someone who actually knows what he's talking about.
Deterministic results are really important to us, and your solution worked like a charm.

artemS

Five years later, no major models, not even Llama 2, support even inference determinism. What happened?

PaulSlusarz

Thank you so much. Excellent work and presentation.

PhilipTeare

Do you see any cases/models, other than multi-GPU, where non-determinism gives a substantial performance bump for inference?

yomanwhatstheplan

Excellent presentation. I found it hard to believe that there is no conclusive answer to the question "Is determinism possible?"

Now, regarding the statement at 26:30, "being able to do hyper parameter tuning": this is a common misinterpretation. Training determinism in general cannot help with hyperparameter stability. It can only help when a specific hyperparameter is known to be decoupled from the source of non-determinism. In general, it may only hide the real problem.


Say you want to decide between two convolution hyperparameters, and say that one is better than the other: for example, an error rate of 0.7% ± 0.2% for one parameter value versus 0.8% ± 0.2% for the other. Averaged over an ensemble of runs, the first is better than the second by 0.1%, but on any single pair of runs the second may produce a lower error than the first by up to 0.3%. Setting the random seed to the same constant value does not make the difference between two runs equal the mean error difference of 0.1%.
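
A toy simulation of this point, with illustrative (assumed) error-rate numbers: two settings whose true mean error rates differ by 0.1%, each with per-run noise. One seeded pair of runs rarely shows the true gap, while an ensemble average recovers it.

import numpy as np

rng = np.random.default_rng(0)

TRUE_MEAN_A = 0.7   # true mean % error for hyperparameter A (assumed)
TRUE_MEAN_B = 0.8   # true mean % error for hyperparameter B (assumed)
RUN_NOISE   = 0.1   # standard deviation of per-run noise, in % error

# One seeded pair of runs: the observed gap is fixed and repeatable,
# but it is a single noisy sample, not the true 0.1% difference.
single_a = TRUE_MEAN_A + rng.normal(0, RUN_NOISE)
single_b = TRUE_MEAN_B + rng.normal(0, RUN_NOISE)
print(f"single-pair gap: {single_b - single_a:+.3f}%  (true gap: +0.100%)")

# Averaging over an ensemble of runs estimates the true gap far more reliably.
ensemble_a = TRUE_MEAN_A + rng.normal(0, RUN_NOISE, size=100)
ensemble_b = TRUE_MEAN_B + rng.normal(0, RUN_NOISE, size=100)
print(f"ensemble gap:    {ensemble_b.mean() - ensemble_a.mean():+.3f}%")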

etzioni