High Performance Hardware for Distributed Deep Learning

In this video from Switzerland HPC Conference, Gaurav Kaul from Intel presents: High Performance Hardware for Distributed Deep Learning – System Benchmarking, Performance Optimization and Architecture for Scalable Systems.
With the recent success of deep learning and related techniques, we are beginning to see new specialized hardware, or extensions to existing architectures, dedicated to making training and inference computations faster, more energy efficient, or both. These technologies use either traditional CMOS on conventional von Neumann architectures such as CPUs, accelerators such as DSPs, GPUs, FPGAs, and ASICs, or novel technologies still in the research phase, such as neuromorphic computing. The overarching goal is to address a specific tradeoff in mapping machine learning algorithms in general, and deep learning in particular, to a specific underlying hardware technology. Conversely, there has been considerable empirical effort to devise deep network architectures that can be implemented efficiently on these novel hardware platforms. This also has implications for choosing appropriate hardware for inference, where energy and latency are the primary design goals. These efforts are gaining traction in the computer architecture community, which is examining effective building blocks for mapping deep neural networks to appropriate processing elements (existing or new) and code optimization techniques for existing architectures. This talk aims to tie together the seemingly disparate themes of co-design, neural network architecture, algorithms, and system architecture, and to bring together researchers at the interface of machine learning, hardware implementation, and systems to discuss the state of the art and the state of the possible.
The talk will focus on the following themes and present related work:
● How deep learning computations and algorithms are mapped to, and co-designed with, new processing and interconnect technologies such as the Intel Xeon Phi and the Intel Omni-Path fabric
● The tradeoffs in accuracy, computational complexity, hardware cost, energy efficiency, and application throughput currently being investigated in these approaches
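Systems of the kind the first theme describes, such as Xeon Phi nodes connected by an Omni-Path fabric, typically train in a data-parallel fashion: each node computes gradients on its shard of the data, then an allreduce collective averages them so every node applies the same update. As an illustrative sketch (not taken from the talk; plain Python stands in for the fabric's collective operation, which in practice would be MPI or a similar communication library):

```python
def allreduce_average(worker_grads):
    """Average per-worker gradient vectors, mimicking what an allreduce
    collective computes across all ranks in one data-parallel step."""
    n_workers = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(dim)]

# Four simulated workers, each holding the local gradient from its data shard.
local_grads = [
    [0.25, 1.0],
    [0.75, 3.0],
    [0.50, 2.0],
    [1.00, 4.0],
]
avg = allreduce_average(local_grads)  # every worker ends up with [0.625, 2.5]
```

On a real fabric the averaging is done in-network by a ring or tree allreduce, so no single node gathers all gradients; the interconnect's bandwidth and latency therefore bound how well training scales, which is one of the tradeoffs the talk examines.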
Gaurav Kaul is a systems architect at Intel Corporation. He works extensively with life science customers in Europe and the Middle East on the design and deployment of computing infrastructure for the analysis of genomic data.