Let’s Dumb-Proof Data Pipelines

Developing and deploying data pipelines to production is easy; maintaining them is hard, because most often the engineer or team that built a pipeline is not the one responsible for operating and maintaining it in production. If your data pipelines are not parameterized and configurable, you have to recompile your source code and go through your full release process even for simple configuration changes. But making your data pipelines configurable is not enough: bad user input can cause many classes of problems, such as data loss, data corruption, and data correctness issues.
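As a sketch of that failure mode, consider a pipeline whose write mode comes from user-supplied configuration. The names and values below are illustrative, not from the talk; the point is that validating input up front turns a destructive typo into a fast, explicit failure at startup:

```scala
// Hypothetical guard: reject unknown write modes before the pipeline
// touches any data. An unvalidated typo like "apend" could otherwise
// fall through to a default that silently drops or overwrites records.
object WriteModeValidator {
  val allowed = Set("append", "overwrite", "errorIfExists")

  def validate(mode: String): Either[String, String] =
    if (allowed.contains(mode)) Right(mode)
    else Left(s"Invalid write mode '$mode'; expected one of: ${allowed.mkString(", ")}")
}
```

With this guard, `WriteModeValidator.validate("apend")` returns a `Left`, so the job fails before it runs rather than corrupting output.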

In this talk, you’ll walk away with techniques to make your data pipelines dumb-proof.
1. Why do you need to make your data pipelines configurable?
2. How to seamlessly promote your data pipelines from one environment to another without making any source code changes?
3. How to reconfigure your data pipelines in production without recompiling the ETL source code?
4. What are the pros and cons of using Databricks notebook widgets for configuring your data pipelines?
5. How to externalize configuration from your ETL source code, and how to read and parse configuration files?
6. Finally, you'll learn how to take it to the next level by leveraging Scala language features and the PureConfig and Typesafe Config libraries to achieve boilerplate-free configuration code and configuration validation.
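To see what "boilerplate-free" is saving you from, here is a minimal hand-rolled sketch of externalized, validated configuration in plain Scala (field and key names are illustrative; no Spark or Databricks dependencies). Libraries like PureConfig and Typesafe Config eliminate exactly this kind of parsing and validation code, add richer formats such as HOCON, and derive the mapping to case classes automatically:

```scala
// Hypothetical typed config for a pipeline; field names are illustrative.
case class PipelineConfig(inputPath: String, outputPath: String, batchSize: Int)

object ConfigLoader {
  // Parse "key=value" lines, ignoring blanks and '#' comments.
  def parse(lines: Seq[String]): Map[String, String] =
    lines.map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .flatMap { l =>
        l.split("=", 2) match {
          case Array(k, v) => Some(k.trim -> v.trim)
          case _           => None
        }
      }.toMap

  // Build and validate the typed config, reporting the first missing or bad key.
  def load(lines: Seq[String]): Either[String, PipelineConfig] =
    for {
      kv    <- Right(parse(lines))
      in    <- kv.get("inputPath").toRight("missing key: inputPath")
      out   <- kv.get("outputPath").toRight("missing key: outputPath")
      rawB  <- kv.get("batchSize").toRight("missing key: batchSize")
      batch <- rawB.toIntOption.toRight(s"batchSize is not an integer: $rawB")
      _     <- if (batch > 0) Right(()) else Left("batchSize must be positive")
    } yield PipelineConfig(in, out, batch)
}
```

Because the loader returns `Either`, a bad config file surfaces as a single descriptive error at startup instead of a failure halfway through an ETL run.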

Comments

Where can we create/keep this config file in a Databricks project?
When we push the project to DevOps, will the config file move too?
Or do I have to manually keep the config file in the DevOps repo every time and change the config path in my code?

TheVijaynegi