Deep Learning Determinism

This presentation was given on March 20, 2019, at GTC in San Jose, California.

Some items covered:
* What non-determinism is in the context of deep learning
* Why it's important to achieve deterministic operation
* The most common sources of non-determinism on GPUs
* A methodology for debugging non-determinism
* A tool for debugging non-determinism in TensorFlow
* Solutions to make frameworks operate deterministically on GPUs
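
As a minimal sketch of the last item, here is one way to request deterministic operation in TensorFlow. It assumes TF 2.9 or later, where tf.config.experimental.enable_op_determinism() is available; earlier versions used the TF_DETERMINISTIC_OPS=1 environment variable instead. The seed value is arbitrary.

import random
import numpy as np
import tensorflow as tf

def enable_determinism(seed: int = 42) -> None:
    # Seed every pseudo-random number generator the framework may consult.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    # Ask TensorFlow to select deterministic GPU kernels (TF 2.9+).
    # Ops with no deterministic implementation raise an error at run time.
    tf.config.experimental.enable_op_determinism()

enable_determinism()

With this in place, two training runs on the same hardware and software stack should produce bit-identical weights; a quick sanity check is to hash the model's weights after each run and compare.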

Links from the talk:

The TensorFlow determinism debug tool will be open-sourced at the following URL. Updates to the content of the talk will also be released there. Please watch and/or follow the repository.

Accompanying poster (presented at ScaledML 2019 at the Computer History Museum, Mountain View, CA):

This was my first public tech talk. I wrote about what I learned in preparing for it here:

This video (S9911) and many others from the conference can be viewed (for free) at
Comments

Thank you very much, Duncan!
So good to learn from someone who actually knows what he's talking about.
Deterministic results are really important to us, and your solution worked like a charm.

artemS

Five years later, no major models, not even Llama 2, support even inference determinism. What happened?

PaulSlusarz

Thank you so much. Excellent work and presentation.

PhilipTeare

Do you see any cases/models, other than multi-GPU, where non-determinism gives a substantial performance bump for inference?

yomanwhatstheplan

Excellent presentation. I found it hard to believe that there is no conclusive answer to the question "Is determinism possible?"

Now, regarding the statement at 26:30, "being able to do hyper parameter tuning": this is a common misinterpretation. Training determinism in general cannot help with hyperparameter stability. It can only help when a specific hyperparameter is known to be decoupled from the source of non-determinism. In general, it may only hide the real problem.


Say you want to decide between two convolution hyperparameters, and say that one is better than the other: for example, an error rate of 0.7% ± 0.2% for one parameter value versus 0.8% ± 0.2% for the other. Averaged over an ensemble of runs, the first is better than the second by 0.1%, but on any single pair of runs the second may produce a lower error than the first by up to 0.3%. Setting the random seed to the same constant value does not make the difference between two runs equal the mean error difference of 0.1%.
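
A toy simulation of this point, with illustrative (assumed) error-rate numbers: two settings whose true mean error rates differ by 0.1%, each with per-run noise. One seeded pair of runs rarely shows the true gap, while an ensemble average recovers it.

import numpy as np

rng = np.random.default_rng(0)

TRUE_MEAN_A = 0.7   # true mean % error for hyperparameter A (assumed)
TRUE_MEAN_B = 0.8   # true mean % error for hyperparameter B (assumed)
RUN_NOISE   = 0.1   # standard deviation of per-run noise, in % error

# One seeded pair of runs: the observed gap is fixed and repeatable,
# but it is a single noisy sample, not the true 0.1% difference.
single_a = TRUE_MEAN_A + rng.normal(0, RUN_NOISE)
single_b = TRUE_MEAN_B + rng.normal(0, RUN_NOISE)
print(f"single-pair gap: {single_b - single_a:+.3f}%  (true gap: +0.100%)")

# Averaging over an ensemble of runs estimates the true gap far more reliably.
ensemble_a = TRUE_MEAN_A + rng.normal(0, RUN_NOISE, size=100)
ensemble_b = TRUE_MEAN_B + rng.normal(0, RUN_NOISE, size=100)
print(f"ensemble gap:    {ensemble_b.mean() - ensemble_a.mean():+.3f}%")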

etzioni