Distributed TensorFlow (TensorFlow Dev Summit 2018)

Igor Saprykin offers a way to train models on one machine and multiple GPUs and introduces an API that is foundational for supporting other configurations in the future.
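As a rough sketch of the API discussed in the talk: at the time, mirrored multi-GPU training was exposed as tf.contrib.distribute.MirroredStrategy and wired into an Estimator through RunConfig (in later TensorFlow releases it moved to tf.distribute.MirroredStrategy). The model and input functions below are illustrative placeholders, not code from the talk.

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # Hypothetical toy model: a single dense layer, for illustration only.
    logits = tf.layers.dense(features["x"], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    # Placeholder input pipeline; a real job would read ImageNet shards etc.
    dataset = tf.data.Dataset.from_tensors(
        ({"x": tf.random_normal([4, 8])}, tf.constant([1, 2, 3, 4], tf.int64)))
    return dataset.repeat(100)

# MirroredStrategy replicates the model onto each local GPU and
# aggregates the per-GPU gradients before applying updates.
distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, steps=100)
```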

Comments

Thanks for including links to articles in your talk. I was especially happy to learn about your future plans with distributed TensorFlow and Horovod. Thank you for mentioning that.

DavidEllisonPhD

I wanted to see an example of the mirrored strategy for, say, ImageNet. Are there any links?

ashokvasthav

Does someone have information/links on how the inter-node communication works? MPI? NCCL? Both? Were these results obtained using TCP over LAN or InfiniBand/RoCE?

jacojoubert

This is awesome!
Good to hear that it is not tied to Estimators. It is very hacky to build reinforcement learning systems with the Estimator API, since the "input" is often generated and depends on previous model output, labels may be extracted from internal state (e.g., prioritized memory for DQN), and the number of steps depends on environment feedback.
Looking forward to this becoming available for the Graph/Session API!

jackshi

Great presentation! Interesting to see the mention of Uber's Horovod as a way to embrace the open-source commitment from the Google/TF team.

californiaesnuestra

The best software engineer in the world

aleksandrtsyrulnev

Honestly, I have little idea how the distributed gradients can be combined.

clydexu
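
On the question in the comment above: in the synchronous data-parallel scheme that MirroredStrategy implements, each replica computes gradients on its own slice of the batch, an all-reduce averages those gradients across replicas, and every replica applies the identical averaged update, keeping the weight copies in sync. A toy NumPy sketch of that aggregation step (all names and numbers here are illustrative, not TensorFlow internals):

```python
import numpy as np

def per_replica_gradient(w, x, y):
    # Gradient of the squared error 0.5 * (w @ x - y)**2 with respect to w.
    return (w @ x - y) * x

w = np.array([0.5, -0.3])           # identical weight copy on each replica
shards = [                           # each replica sees a different data shard
    (np.array([1.0, 2.0]), 1.0),
    (np.array([3.0, -1.0]), 0.0),
]

# Each replica computes its local gradient on its shard.
grads = [per_replica_gradient(w, x, y) for x, y in shards]

# The "all-reduce" step: average the gradients across replicas.
avg_grad = np.mean(grads, axis=0)

# Every replica applies the same averaged update, so weights stay in sync.
w = w - 0.1 * avg_grad
print(w)
```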