PASS Summit 2019: Practical Azure Databricks

Practical Azure Databricks: Engineering & Warehousing at Scale.
PASS Summit 2019. November 5th, Seattle.

You've probably heard the buzz - there's a thing called Azure Databricks that is going to solve all your problems. Surprisingly, it just might. First and foremost, knowing how to use Apache Spark will earn you more money. It is that simple. Apache Spark has been the biggest thing in Big Data processing for many years now, but it has always felt inaccessible to the humble Microsoft data developer. Azure Databricks takes the Spark engine and makes it really, really easy to use. Data engineers who know Apache Spark are in massive demand, and we're going to introduce you to the skills required to succeed.

In the morning we will introduce Azure Databricks, then discuss how to develop in-memory, elastic-scale data engineering pipelines. We will talk about shaping and cleaning data, the languages, notebooks, ways of working, design patterns and how to get the best performance. You will build an engineering pipeline with Python, then with Scala via Azure Data Factory, and then we'll put it into context in a full solution. We will also talk about Data Lakes - how to structure and manage them over time in order to maintain an effective data platform.

We will then shift gears, taking the data we prepared earlier and enriching it with additional data sources before modelling it in a relational warehouse. We will take a look at various data engineering patterns that cater for scenarios such as real-time streaming, decentralised reporting, rapidly evolving data science labs and huge data warehouses in specialised storage such as Azure SQL Data Warehouse. By the end of the day, you will understand how Azure Databricks sits at the core of data engineering workloads and is a key component in Modern Azure Warehousing.

#PASSSummit, #Databricks