Build Large-Scale Data Analytics and AI Pipeline Using RayDP

A large-scale end-to-end data analytics and AI pipeline usually involves a data processing framework such as Apache Spark for massive data preprocessing, plus ML/DL frameworks for distributed training on the preprocessed data. A conventional approach is to use two separate clusters and glue multiple jobs together. Other solutions include running deep learning frameworks inside an Apache Spark cluster, or using a workflow orchestrator such as Kubeflow to stitch distributed programs together. All of these options have their own limitations. We introduce Ray as a single substrate for distributed data processing and machine learning. We also introduce RayDP, which lets you start an Apache Spark job on Ray from your Python program and use Ray's in-memory object store to efficiently exchange data between Apache Spark and other libraries. We will demonstrate how this makes building an end-to-end data analytics and AI pipeline simpler and more efficient.

Comments

I cannot get this to run on Databricks. It keeps saying: "Java gateway process exited before sending its port number".
Is there any configuration I need to set?

DZitLee