Data Pipelines: How to make them better

There are so many tools that claim ETL and data pipelines are easy. But what do you actually need to think about when designing a data pipeline? From auditing and logging to data storage, data lakes, and data warehouses, all of these belong in the design of your pipeline.

Comments

Steps to make a pipeline better:
1. Good auditing and logging: error handling (first sketch after this list)
2. Repeatable: rerunning a load should produce identical results (second sketch below)
3. Self-healing: find a way to detect the delta; keep log files and compare, add a data lake before the data warehouse, add hashes or watermarks before comparing (third sketch below)
4. Decouple EL and T: land in raw format, transform into the DWH, keep reporting tables clean (fourth sketch below)
5. Always available: a truncate-and-load refresh is faster than updates, or build a semantic layer (fifth sketch below)
6. CI/CD: coded, connected to git, versioned, with rollbacks
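
The sketches below are editorial illustrations, not from the video. First, a minimal sketch of step 1 in Python, assuming a SQLite audit table; the pipeline_audit layout and the extract_orders step are hypothetical names. Each step run writes one audit row with its status and row count, and failures are both logged and audited before being re-raised.

```python
import logging
import sqlite3
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def audit_run(conn, step, status, rows, error=None):
    # One audit row per step run, so failures and row counts stay queryable.
    conn.execute("CREATE TABLE IF NOT EXISTS pipeline_audit "
                 "(run_at TEXT, step TEXT, status TEXT, rows INTEGER, error TEXT)")
    conn.execute("INSERT INTO pipeline_audit VALUES (?, ?, ?, ?, ?)",
                 (datetime.now(timezone.utc).isoformat(), step, status, rows, error))
    conn.commit()

def extract_orders(path):
    # Hypothetical extract step: one order id per line.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

with open("orders.txt", "w") as f:  # sample input so the sketch runs end to end
    f.write("1001\n1002\n")

conn = sqlite3.connect(":memory:")
try:
    rows = extract_orders("orders.txt")
    log.info("extracted %d rows", len(rows))
    audit_run(conn, "extract_orders", "success", len(rows))
except Exception as exc:
    log.exception("extract_orders failed")
    audit_run(conn, "extract_orders", "failed", 0, str(exc))
    raise
```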
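
A minimal sketch of step 2, assuming a partition-per-run-date load into SQLite; the table and column names are hypothetical. Delete-then-insert for one date inside a single transaction means a rerun for that date leaves the table in exactly the same state.

```python
import sqlite3

def load_daily_sales(conn, run_date, rows):
    # Delete-then-insert for one partition: rerunning the same run_date
    # is safe and produces identical results.
    conn.execute("CREATE TABLE IF NOT EXISTS daily_sales "
                 "(run_date TEXT, product TEXT, amount REAL)")
    with conn:  # one transaction: readers never see a half-loaded day
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)",
                         [(run_date, p, a) for p, a in rows])

conn = sqlite3.connect(":memory:")
load_daily_sales(conn, "2024-01-15", [("widget", 9.5), ("gadget", 4.0)])
load_daily_sales(conn, "2024-01-15", [("widget", 9.5), ("gadget", 4.0)])  # rerun
print(conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone())  # (2,)
```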
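
A minimal sketch of step 3's watermark idea, assuming the source table has an updated_at column; the watermarks table and extract_delta helper are hypothetical. Each run pulls only rows newer than the stored high watermark and then advances it, so a failed run can simply be rerun to pick up the missing delta.

```python
import sqlite3

def get_watermark(conn, source):
    conn.execute("CREATE TABLE IF NOT EXISTS watermarks "
                 "(source TEXT PRIMARY KEY, last_updated_at TEXT)")
    row = conn.execute("SELECT last_updated_at FROM watermarks WHERE source = ?",
                       (source,)).fetchone()
    return row[0] if row else "1970-01-01T00:00:00"

def extract_delta(conn, source):
    # Pull only rows changed since the last successful run, then advance
    # the stored high watermark to the newest updated_at seen.
    wm = get_watermark(conn, source)
    rows = conn.execute("SELECT id, updated_at FROM orders "
                        "WHERE updated_at > ? ORDER BY updated_at",
                        (wm,)).fetchall()
    if rows:
        conn.execute("INSERT INTO watermarks (source, last_updated_at) VALUES (?, ?) "
                     "ON CONFLICT(source) DO UPDATE SET "
                     "last_updated_at = excluded.last_updated_at",
                     (source, rows[-1][1]))
        conn.commit()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-01T10:00:00"), (2, "2024-01-02T10:00:00")])
print(len(extract_delta(conn, "orders")))  # 2: full pull on the first run
print(len(extract_delta(conn, "orders")))  # 0: nothing new since the watermark
```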
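
A minimal sketch of step 4, assuming a file-based lake under lake/raw/; the paths and field names are hypothetical. EL lands the source payload untouched, and T is a separate step that reads the landing zone and builds a clean reporting extract, so either half can be rerun independently.

```python
import csv
import json
from pathlib import Path

LAKE = Path("lake/raw/orders")

def extract_load(records, run_date):
    # EL: land the payload in raw JSON, partitioned by date, untransformed.
    target = LAKE / run_date
    target.mkdir(parents=True, exist_ok=True)
    (target / "orders.json").write_text(json.dumps(records))

def transform(run_date, out_path):
    # T: read the raw landing file and build a clean reporting table.
    records = json.loads((LAKE / run_date / "orders.json").read_text())
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "amount"])
        writer.writeheader()
        for r in records:
            writer.writerow({"id": r["id"], "amount": round(float(r["amount"]), 2)})

extract_load([{"id": 1, "amount": "9.50"}], "2024-01-15")
transform("2024-01-15", "reporting_orders.csv")
```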
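
A minimal sketch of step 5's truncate-and-load idea, assuming SQLite; table names are hypothetical. The refresh builds a staging table and swaps it in with renames inside one transaction, so readers see either the old or the new reporting table, never an empty one mid-refresh.

```python
import sqlite3

def refresh_report(conn, fresh_rows):
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS report (product TEXT, total REAL)")
    cur.execute("DROP TABLE IF EXISTS report_staging")
    cur.execute("CREATE TABLE report_staging (product TEXT, total REAL)")
    cur.executemany("INSERT INTO report_staging VALUES (?, ?)", fresh_rows)
    # Swap inside one transaction: the reporting table stays available
    # to readers for the whole refresh.
    with conn:
        cur.execute("ALTER TABLE report RENAME TO report_old")
        cur.execute("ALTER TABLE report_staging RENAME TO report")
        cur.execute("DROP TABLE report_old")

conn = sqlite3.connect(":memory:")
refresh_report(conn, [("widget", 120.0), ("gadget", 75.5)])
print(conn.execute("SELECT * FROM report").fetchall())
```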

georgechristy

Thanks for the video. Do you have an example of a pipeline built from scratch following the best practices mentioned in the video? Text/book or course-based, it doesn't matter.

MrHaste

Great video, thanks for your effort. Could you make more videos about building pipelines with open-source tools? That would greatly benefit people who are just getting started in this field, before they jump directly into the world of cloud.

hoblwop