High-Performance Input Pipelines for Scalable Deep Learning

A production AI system is more than just training a deep learning model. It also includes 1) ingesting and running inference on new data, 2) transforming, processing, and cleaning new data to incorporate it into the training set, 3) continuously retraining to update the model and continue learning, and 4) an experimental pipeline for testing improvements to the AI models. This presentation focuses on the high-performance, highly scalable storage needed to take advantage of ever-larger datasets in model training.

We describe the common stages in an input pipeline for deep learning training and their resource requirements. We then present a benchmark-based approach for identifying bottlenecks in the pipeline, using the ImageNet dataset to show linear scaling of training performance from 1 GPU to 32 GPUs. The AI-ready infrastructure presented here achieves the goal of scalable training performance with simplicity, eliminating the need for complex configuration and tuning of infrastructure components.
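The core idea behind a high-performance input pipeline is overlapping the data-loading stages (read, decode, augment) with GPU compute, so that storage and CPU work hide behind training steps instead of serializing with them. The sketch below is an illustrative toy, not code from the talk: `load_batch` and `train_step` are hypothetical stand-ins for real I/O and GPU work, and a background thread prefetches batches through a bounded queue.

```python
import queue
import threading
import time

def load_batch(i):
    """Stand-in for the input pipeline: read + decode + augment one batch."""
    time.sleep(0.01)   # simulated storage/CPU latency
    return [i] * 4     # a fake "batch" of samples

def train_step(batch):
    """Stand-in for one GPU training step."""
    time.sleep(0.01)   # simulated compute time
    return sum(batch)

def run(num_batches, prefetch=2):
    """Overlap loading and training via a bounded prefetch queue."""
    q = queue.Queue(maxsize=prefetch)

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(None)  # sentinel: no more data

    threading.Thread(target=producer, daemon=True).start()
    results = []
    start = time.perf_counter()
    while (batch := q.get()) is not None:
        results.append(train_step(batch))
    return results, time.perf_counter() - start

results, elapsed = run(20)
# With prefetching, total time approaches max(load, train) * num_batches
# rather than their sum; a growing gap between the two signals an
# input-pipeline bottleneck.
print(f"{len(results)} steps in {elapsed:.2f}s")
```

Profiling each stage in isolation the same way (timing reads, decodes, and augmentations separately) is the benchmark-based approach to locating which stage, or the storage behind it, limits scaling.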

Speaker: Joshua Robinson, Founding Engineer, Pure Storage