Ray: A Framework for Scaling and Distributing Python & ML Applications

Recording of a live meetup on Feb 16, 2022, from our friends at the Data + AI Denver/Boulder meetup group.

Meetup details:

Our first talk of the year features Jules Damji, Lead Developer Advocate at Anyscale, as he discusses Ray: A Framework for Scaling and Distributing Python & ML Applications.

ABOUT THE TALK:

Modern machine learning (ML) workloads, such as deep learning and large-scale model training, are compute-intensive and require distributed execution. Ray is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications and ML workloads from a laptop to a cluster, with an emphasis on the unique performance challenges of ML/AI systems. It is now used in many production deployments.

This talk will give an overview of Ray, covering its architecture, core concepts, and primitives such as remote Tasks and Actors; briefly discuss Ray’s native libraries (Ray Tune, Ray Train, Ray Serve, Ray Datasets, RLlib); and survey Ray’s growing ecosystem.

Through a demo using XGBoost for classification, we will demonstrate how you can scale training, hyperparameter tuning, and inference from a single node to a cluster, with a tangible performance difference when using Ray.

The takeaways from this talk are:

Learn Ray architecture, core concepts, and Ray primitives and patterns
Why distributed computing will be the norm, not the exception
How to scale your ML workloads with Ray libraries:
Training on a single node vs. Ray cluster, using XGBoost with/without Ray
Hyperparameter search and tuning, using XGBoost with Ray and Ray Tune
Inferencing at scale, using XGBoost with/without Ray

ABOUT OUR SPEAKER:

Our speaker, Jules Damji, is the Lead Developer Advocate at Anyscale Inc.

He is an MLflow contributor and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science (from Oregon State University and Cal State, Chico, respectively), and an MA in political advocacy and communication (from Johns Hopkins University).

Comments
Author

Can you pls share jupyter notebook used in this ppt? Would be very helpful

madhuful
Author

Informative, thanks much.. How to deploy this ray data code into ray cluster in kubernetes ?

sivasankarir
Author

How is Ray different from Nvidia triton server?

ameynaik
Author

Great video! Someone is having lunch in the same room?

feifeizhang
Author

Now Amazon is moving to ray.. when I just started with spark😢

vam
Author

We have used both Ray and spark for one of our projects. Ray is awesome, however, I find spark is more robust compared to Ray.

sandyjust