Let’s Dumb-Proof Data Pipelines

Developing and deploying data pipelines to production is easy; maintaining them is hard, because most often the engineer or team that built a pipeline is not the one responsible for operating and maintaining it in production. If your data pipelines are not parameterized and configurable, you have to recompile your source code and go through your full release process even for simple configuration changes. But making your data pipelines configurable is not enough: bad user input can cause many classes of problems, such as data loss, data corruption, and data correctness issues.
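As a sketch of that failure mode, consider a pipeline whose write mode comes from user-supplied configuration. The names and values below are illustrative, not from the talk; the point is that validating input up front turns a destructive typo into a fast, explicit failure at startup:

```scala
// Hypothetical guard: reject unknown write modes before the pipeline
// touches any data. An unvalidated typo like "apend" could otherwise
// fall through to a default that silently drops or overwrites records.
object WriteModeValidator {
  val allowed = Set("append", "overwrite", "errorIfExists")

  def validate(mode: String): Either[String, String] =
    if (allowed.contains(mode)) Right(mode)
    else Left(s"Invalid write mode '$mode'; expected one of: ${allowed.mkString(", ")}")
}
```

With this guard, `WriteModeValidator.validate("apend")` returns a `Left`, so the job fails before it runs rather than corrupting output.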

In this talk, you’ll walk away with techniques to make your data pipelines dumb-proof.
1. Why do you need to make your data pipelines configurable?
2. How to seamlessly promote your data pipelines from one environment to another without making any source code changes?
3. How to reconfigure your data pipelines in production without recompiling the ETL source code?
4. What are the pros and cons of using Databricks notebook widgets for configuring your data pipelines?
5. How to externalize configuration from your ETL source code, and how to read and parse configuration files?
6. Finally, you'll learn how to take it to the next level by leveraging Scala language features and the PureConfig and Typesafe Config libraries to achieve boilerplate-free configuration code and configuration validation.
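To see what "boilerplate-free" is saving you from, here is a minimal hand-rolled sketch of externalized, validated configuration in plain Scala (field and key names are illustrative; no Spark or Databricks dependencies). Libraries like PureConfig and Typesafe Config eliminate exactly this kind of parsing and validation code, add richer formats such as HOCON, and derive the mapping to case classes automatically:

```scala
// Hypothetical typed config for a pipeline; field names are illustrative.
case class PipelineConfig(inputPath: String, outputPath: String, batchSize: Int)

object ConfigLoader {
  // Parse "key=value" lines, ignoring blanks and '#' comments.
  def parse(lines: Seq[String]): Map[String, String] =
    lines.map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .flatMap { l =>
        l.split("=", 2) match {
          case Array(k, v) => Some(k.trim -> v.trim)
          case _           => None
        }
      }.toMap

  // Build and validate the typed config, reporting the first missing or bad key.
  def load(lines: Seq[String]): Either[String, PipelineConfig] =
    for {
      kv    <- Right(parse(lines))
      in    <- kv.get("inputPath").toRight("missing key: inputPath")
      out   <- kv.get("outputPath").toRight("missing key: outputPath")
      rawB  <- kv.get("batchSize").toRight("missing key: batchSize")
      batch <- rawB.toIntOption.toRight(s"batchSize is not an integer: $rawB")
      _     <- if (batch > 0) Right(()) else Left("batchSize must be positive")
    } yield PipelineConfig(in, out, batch)
}
```

Because the loader returns `Either`, a bad config file surfaces as a single descriptive error at startup instead of a failure halfway through an ETL run.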

Comments

Where can we create/keep this config file in a Databricks project?
When we push the project to DevOps, will the config file move too?
Or do I have to manually keep the config file in the DevOps repo every time and change the config path in my code?

TheVijaynegi