Simplifying AI integration on Apache Spark

Spark is an ETL and data processing engine especially suited for big data. Most organizations have different teams working with different languages, frameworks, and libraries, which need to be integrated into ETL pipelines or general data processing. For example, a Spark ETL job may be written in Scala by the data engineering team, but there is a need to integrate a machine learning solution written in Python or R by the data science team. These kinds of solutions are not straightforward to integrate with the Spark engine, and they require a great amount of collaboration between teams, increasing overall project time and cost. Furthermore, these solutions keep changing and upgrading over time, adopting newer versions of the underlying technologies and improved designs and implementations; this is especially true in the machine learning domain, where ML models and algorithms keep improving with new data and new approaches. As a result, there is significant downtime involved in integrating each upgraded version.

In this talk we will discuss how Informatica integrates AI solutions into data processing pipelines executing on top of Spark, along with the following major features:
1. The data science team can easily share AI/ML solutions created using any library, language, or framework.
2. A shared AI/ML solution can be easily consumed in the Spark pipeline.
3. Using Informatica products, customers can build the Spark pipeline with the selected solution(s) via drag and drop.
4. Different teams can continuously integrate and deploy (CI/CD) their solutions with minimal downtime.

In conclusion, we will see how different teams (such as data scientists and data engineers) can integrate their work, thereby reducing the time and cost spent on collaboration.

We will also see how CI/CD is achieved on Spark with minimal downtime while integrating various projects, especially AI/ML projects, using Informatica products.

Thus, by using these features, such as drag-and-drop creation of Spark pipelines, minimal cross-team collaboration overhead, and CI/CD, organizations can drastically reduce overall project completion time and cost.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments
Author

In a nutshell, it's CI/CD in action with the DEI product (with the AI Transformation, which handles the lifecycle of AI code).

kanishkachauhan
Author

May I know how we can integrate an RL-based scheduler with Spark? Also, is there any way to submit a single node to the master node in Spark? Thank you.

UniverseGames