Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL Pipeline
At Mars Petcare, in a division known as Kinship Data & Analytics, we are building out the Petcare Data Platform, a cloud-based Data Lake solution. Building on Microsoft Azure, we faced important decisions around tools and design. We chose Delta Lake as the storage layer for our platform, to bring insight to the science community across Mars Petcare. Migrating away from Azure Data Factory completely, we used Spark and Databricks to build ‘Kyte’, a bespoke pipeline tool that has massively accelerated our ability to ingest, cleanse and process new data sources from across our large and complicated organisation. Building on this, we have started to use Delta Lake for our ETL configurations and have built a bespoke UI for monitoring and scheduling our Spark pipelines.

Find out more about why we chose a Spark-heavy ETL design and a Delta Lake-driven platform, the advantages (and difficulties) of migrating away from Azure Data Factory, and why we are committing to Spark and Delta Lake as the core of our platform to support our mission: Making a Better World for Pets!

Key Takeaways:
-Leveraging Delta Lake as Engineers for exposing data to Data Scientists
-Advantages of a Databricks & Spark ETL Solution over Azure Data Factory
-Using Delta Lake for ETL Config