CI/CD for Data Lakes, Part 2

Today, data lakes offer many advantages to cloud users. Anyone who needs storage can build one on an object store such as Amazon S3, Azure Blob Storage, or a similar service. Data lakes are scalable, cost-effective, relatively easy to use, and offer high throughput and a rich application ecosystem.

Yet data-intensive systems that combine open-source software with cloud-native services also bring challenges. As data practitioners, we find it hard to experiment with, compare, and reproduce data-intensive operations, because copying large-scale data just to experiment gets expensive. On top of the cost, it is difficult to enforce data best practices such as schema validation, since the schema can change on the fly when ingesting data from outside sources. Finally, it is hard to guarantee high data quality.

To start working on solutions to these problems, we must acknowledge that our systems are made up of both data and code. We already have tools such as Git and CI/CD to manage code, so why not apply the same logic to data? With open-source tools like lakeFS, it is possible to manage data at scale using Git-like capabilities.
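To make the Git-like workflow concrete, here is a minimal sketch using the lakeFS CLI (`lakectl`). The repository, branch, and commit-message names below are hypothetical, and the commands assume a running lakeFS installation with `lakectl` already configured against it:

```shell
# Create an isolated branch to experiment on. No data is copied:
# the branch initially references the same underlying objects.
# (Repository and branch names here are hypothetical.)
lakectl branch create lakefs://example-repo/experiment \
    --source lakefs://example-repo/main

# Ingest or transform data on the experiment branch, then snapshot it.
lakectl commit lakefs://example-repo/experiment \
    -m "Test new ingestion job against production-scale data"

# If validation passes, promote the changes to main; otherwise simply
# delete the branch and the production data is never touched.
lakectl merge lakefs://example-repo/experiment lakefs://example-repo/main
```

Because branching is a metadata operation, experimenting against a full-scale view of the lake costs little, and merging or discarding a branch gives data the same reproducibility and rollback story that Git gives code.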