Master Databricks and Apache Spark Step by Step: Lesson 13 - Using SQL Joins

In this video, you learn how to perform joins using Spark Structured Query Language (SQL). Spark SQL is the most performant way to do data engineering on Databricks and Spark. I'll explain the concepts and demonstrate them with code in a Databricks notebook.
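For reference, a Spark SQL join reads like standard ANSI SQL. A minimal sketch, assuming two hypothetical tables (sales and products) rather than the notebook's actual tables:

-- Join a fact table to a dimension table on a shared key,
-- as you would run it in a Databricks notebook SQL cell.
SELECT s.sale_id,
       s.amount,
       p.product_name
FROM sales AS s
INNER JOIN products AS p
  ON s.product_id = p.product_id;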

Get Master Azure Databricks Step by Step at

Example Notebook for lesson 13 at:

You need to unzip the file and import the notebook into Databricks to run the code.

Video on Creating and Loading the tables used in this video

Video on Dimensional Modeling - with an explanation of Snowflake Schema
Comments

Hello there, I found your presentation interesting. If it helps, I can provide another use case where a cross join is useful.
For instance, you have a table of cars with their attributes and you want to compare them and give a comparison score.

Scoring could be as follows:
- Checking if both cars have the same transmission type (manual or automatic)
- Checking if both cars have the same fuel type
- Comparing the HP (closest gives the highest score)
- And so on...
- For each attribute we can give a score, which adds up to a total between each pair of cars

To do it, you can create the Cartesian product of the cars.
Then we filter the list where car_id_x <> car_id_y (so we don't compare the cars with themselves).
We then obtain the score for each permutation as mentioned.
Then we can order by score DESC.

At the end we obtain a list, ordered by score DESC, of all the other cars for each car!

oliviersac
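A minimal Spark SQL sketch of the cross-join scoring idea described above; the cars table, its columns, and the scoring weights are hypothetical:

-- Hypothetical table cars(car_id, transmission, fuel_type, hp).
-- CROSS JOIN produces every pairing; the WHERE clause drops
-- the rows that pair a car with itself.
SELECT a.car_id AS car_id_x,
       b.car_id AS car_id_y,
       CASE WHEN a.transmission = b.transmission THEN 1 ELSE 0 END
         + CASE WHEN a.fuel_type = b.fuel_type THEN 1 ELSE 0 END
         + 1 / (1 + ABS(a.hp - b.hp))  -- closer horsepower scores higher
         AS score
FROM cars AS a
CROSS JOIN cars AS b
WHERE a.car_id <> b.car_id
ORDER BY score DESC;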

This video helped me get noticed in week 2 of a new job, working in a program new to me, Databricks. You explained these joins very well. Great video!!

moe

I'm learning SQL thanks to you. This video clarified my ideas on how to use SQL joins. Amazing content. It would have been nice to have the same lesson using Python.

gianniprocida

Thank you very much for the series! It was very helpful!!

josegheevarghese

Thanks a lot for this great series on Spark

vinr

Thank you for your videos. They have been so helpful.

rhard

Hi Bryan, you mentioned that the ideal practice is that the system of record should come from the warehouse, and that pulling directly from application production databases is not a best practice.
My question: even to populate the warehouse, you need to pull data from the application databases, right? That can never be avoided. Am I missing anything here?

potnuruavinash

Can you please explain your point at 7:04, "I prefer not to use outer joins..."? I think you said at the beginning of this video that you prefer to use outer joins to identify missing data.

Raaj_ML
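For context, the "identify missing data" pattern the question refers to is usually a left outer join filtered to the unmatched rows. A minimal sketch with hypothetical orders and customers tables:

-- Orders whose customer_id has no match in customers:
-- the outer join fills the right side with NULLs, and the
-- WHERE clause keeps only those unmatched rows.
SELECT o.order_id,
       o.customer_id
FROM orders AS o
LEFT OUTER JOIN customers AS c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;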

Can I ask a question related to Spark streaming? Say we have incoming CSV files and we need to process them, but we need to do all the transformation on the data within a single file and output it as a file. That means each incoming file should have a corresponding outgoing file, with the necessary transformations done to the records in that file only. However, we also need to work on a cluster so that the load can be distributed and files can be processed in parallel. Is this something possible? Thanks

vinr

Hi Bryan,
Could you please create a video on how to append data to an existing table when we receive a new file or additional data?

Thanks,
Sri

kumarpyarasani
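Until such a video exists, the basic append pattern in Spark SQL is INSERT INTO, which adds rows without overwriting existing data. A minimal sketch, assuming the new file has already been loaded into a hypothetical staging table:

-- Append the staged rows to the existing table.
INSERT INTO sales_history
SELECT * FROM new_sales_staging;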