Distributed TensorFlow (TensorFlow Dev Summit 2018)

Igor Saprykin offers a way to train models on one machine and multiple GPUs and introduces an API that is foundational for supporting other configurations in the future.
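As a rough sketch of the API discussed in the talk: at the time, mirrored multi-GPU training was exposed as tf.contrib.distribute.MirroredStrategy and wired into an Estimator through RunConfig (in later TensorFlow releases it moved to tf.distribute.MirroredStrategy). The model and input functions below are illustrative placeholders, not code from the talk.

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # Hypothetical toy model: a single dense layer, for illustration only.
    logits = tf.layers.dense(features["x"], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    # Placeholder input pipeline; a real job would read ImageNet shards etc.
    dataset = tf.data.Dataset.from_tensors(
        ({"x": tf.random_normal([4, 8])}, tf.constant([1, 2, 3, 4], tf.int64)))
    return dataset.repeat(100)

# MirroredStrategy replicates the model onto each local GPU and
# aggregates the per-GPU gradients before applying updates.
distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, steps=100)
```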

Comments

Thanks for including links to articles in your talk. I was especially happy to learn about your future plans with distributed TensorFlow and Horovod. Thank you for mentioning that.

DavidEllisonPhD

I wanted to see an example of the mirrored strategy for, say, ImageNet. Are there any links?

ashokvasthav

Does someone have information/links on how the inter-node communication works? MPI? NCCL? Both? Were these results obtained using TCP over LAN or InfiniBand/RoCE?

jacojoubert

This is awesome!
Good to hear that it is not tied to Estimators. It is very hacky to build reinforcement learning systems with the Estimator API, since the "input" is often generated and depends on previous model output, labels may be extracted from internal state (e.g., prioritized memory for DQN), and the number of steps depends on environment feedback.
Looking forward to this becoming available for the Graph/Session API!

jackshi

Great presentation! Interesting to see the mention of Uber's Horovod as a way to embrace the open-source commitment from the Google/TF team.

californiaesnuestra

The best software engineer in the world

aleksandrtsyrulnev

Honestly, I have little idea how the distributed gradients can be combined.

clydexu
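
On the question in the comment above: in the synchronous data-parallel scheme that MirroredStrategy implements, each replica computes gradients on its own slice of the batch, an all-reduce averages those gradients across replicas, and every replica applies the identical averaged update, keeping the weight copies in sync. A toy NumPy sketch of that aggregation step (all names and numbers here are illustrative, not TensorFlow internals):

```python
import numpy as np

def per_replica_gradient(w, x, y):
    # Gradient of the squared error 0.5 * (w @ x - y)**2 with respect to w.
    return (w @ x - y) * x

w = np.array([0.5, -0.3])           # identical weight copy on each replica
shards = [                           # each replica sees a different data shard
    (np.array([1.0, 2.0]), 1.0),
    (np.array([3.0, -1.0]), 0.0),
]

# Each replica computes its local gradient on its shard.
grads = [per_replica_gradient(w, x, y) for x, y in shards]

# The "all-reduce" step: average the gradients across replicas.
avg_grad = np.mean(grads, axis=0)

# Every replica applies the same averaged update, so weights stay in sync.
w = w - 0.1 * avg_grad
print(w)
```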