Reflecting On The Past 6 Years Of Data Engineering

Summary

This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes.

Announcements



• Hello and welcome to the Data Engineering Podcast, the show about modern data management


• Your host is Tobias Macey and today I'm reflecting on the major trends in data engineering over the past 6 years



Interview



• Introduction


• 6 years of running the Data Engineering Podcast


• Around the first time that data engineering was discussed as a role
    • Followed on from hype about "data science"


• Hadoop era


• Streaming


• Lambda and Kappa architectures
    • Not really referenced anymore


• "Big Data" era of capture everything has shifted to focusing on data that presents value
    • Regulatory environment increases risk, better tools introduce more capability to understand what data is useful


• Data catalogs
    • Amundsen and Alation


• Orchestration engine
    • Oozie, etc. -> Airflow and Luigi -> Dagster, Prefect, Flyte (from Lyft), etc.
    • Orchestration is now a part of most vertical tools


• Cloud data warehouses


• Data lakes


• DataOps and MLOps


• Data quality to data observability


• Metadata for everything
    • Data catalog -> data discovery -> active metadata


• Business intelligence
    • Read-only reports to metric/semantic layers
    • Embedded analytics and data APIs


• Rise of ELT
    • dbt
    • Corresponding introduction of reverse ETL


• What are the most interesting, unexpected, or challenging lessons that you have learned while running the podcast?


• What do you have planned for the future of the podcast?



Parting Question



• From your perspective, what is the biggest gap in the tooling or technology for data management today?



Closing Announcements

Sponsored By:


Looking for the simplest way to get the freshest data possible to your teams? Because let's face it: if real-time were easy, everyone would be using it. Look no further than Materialize, the streaming database you already know how to use.

Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Delivered as a single platform with the separation of storage and compute, strict-serializability, active replication, horizontal scalability and workload isolation — Materialize is now the fastest way to...