Next-Gen Data Modeling, Integrity, and Governance with YODA

Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.

The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data.

This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.
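A minimal Python sketch of the separation described above. It assumes a hypothetical `Model` class in which the modeling layer only declares each dataset's columns and upstream dependencies, while separate functions act as the execution layer. Because run order and lineage are computed from the declarations alone, they stay the same no matter which engine runs the model. None of these names (`Model`, `MODELS`, `run_order`, `downstream`, `render_spark_ddl`) or datasets come from YODA; they are illustrative only, and the `USING delta` clause assumes Delta Lake tables as on Databricks.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


# Hypothetical modeling layer: declares structure and relationships only,
# with no knowledge of which engine will execute it.
@dataclass(frozen=True)
class Model:
    name: str
    columns: dict[str, str]           # column name -> logical type
    depends_on: tuple[str, ...] = ()  # names of upstream models


# Hypothetical registry of models (illustrative names, not Yotpo's datasets).
MODELS = {
    m.name: m
    for m in [
        Model("raw_orders", {"order_id": "bigint", "amount": "decimal(10,2)", "shop_id": "bigint"}),
        Model("raw_shops", {"shop_id": "bigint", "country": "string"}),
        Model(
            "orders_by_country",
            {"country": "string", "revenue": "decimal(18,2)"},
            depends_on=("raw_orders", "raw_shops"),
        ),
    ]
}


def run_order(models):
    """Execution order derived purely from declared dependencies."""
    graph = {name: set(m.depends_on) for name, m in models.items()}
    return list(TopologicalSorter(graph).static_order())


def downstream(models, name):
    """Lineage: every model that directly or transitively depends on `name`."""
    result, frontier = set(), [name]
    while frontier:
        current = frontier.pop()
        for m in models.values():
            if current in m.depends_on and m.name not in result:
                result.add(m.name)
                frontier.append(m.name)
    return result


# Hypothetical execution layer: one of possibly many engine-specific renderers
# that consume the same model definitions.
def render_spark_ddl(model):
    cols = ", ".join(f"{col} {typ}" for col, typ in model.columns.items())
    return f"CREATE TABLE IF NOT EXISTS {model.name} ({cols}) USING delta"


print(run_order(MODELS))
print(downstream(MODELS, "raw_orders"))
print(render_spark_ddl(MODELS["raw_shops"]))
```

Adding another engine would mean adding another renderer next to `render_spark_ddl`, while the model definitions, lineage, and run order stay untouched.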

To address these issues, they developed YODA, an internal tool that pairs a strong developer experience with dbt, Databricks, Airflow, Looker, and more, all backed by a robust CI/CD and orchestration layer.
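One way a CI step can protect data integrity is to compare a model's declared schema on the main branch with the proposed one and fail the build on breaking changes. The sketch below assumes schemas are plain `{column: type}` mappings. It treats removed columns and changed types as breaking and allows added columns. These rules and the `breaking_changes` and `ci_gate` names are illustrative assumptions, not YODA's actual checks.

```python
import sys


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return descriptions of backward-incompatible changes from old to new."""
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"column '{column}' was removed")
        elif new[column] != old_type:
            problems.append(f"column '{column}' changed type {old_type} -> {new[column]}")
    # Columns present only in `new` are treated as non-breaking additions.
    return problems


def ci_gate(old: dict[str, str], new: dict[str, str]) -> int:
    """Exit code a CI job could use: 0 if safe, 1 if any breaking change is found."""
    problems = breaking_changes(old, new)
    for problem in problems:
        print(f"BREAKING: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Usage: a hypothetical proposed change that alters a column's type.
main_branch = {"order_id": "bigint", "amount": "decimal(10,2)"}
proposed = {"order_id": "string", "amount": "decimal(10,2)", "currency": "string"}
print(ci_gate(main_branch, proposed))
```

In a real pipeline the returned value would be passed to `sys.exit` so that the CI job fails, and the old schema would be read from the main branch rather than hard-coded.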

Yotpo is a B2B SaaS e-commerce marketing platform that provides businesses with the tools they need for accurate customer analytics, remarketing, support messaging, and more.

ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles.

TIMESTAMPS
0:00 - Intro
2:29 - What is Yotpo?
5:25 - Building an ETL framework based on Spark
10:18 - What is Apache Spark?
15:40 - Decoupling the data model
18:51 - Using data mesh principles
22:24 - How to address different data personas
26:35 - What is the "shift left" movement?
28:47 - How can organizations change the way they treat their data?
31:01 - Use-cases for tooling and documenting data sets
32:07 - Schema vs. schema-less
40:07 - What is YODA?
48:35 - Takeaways from the conversation with Doron and Liran
52:45 - It's a wrap!

#datapipeline #apachekafka #kafka #streamprocessing #microservices #confluent

Comments

Hi, any ETA on if/when YODA might be open sourced?
