Master Databricks and Apache Spark Step by Step: Lesson 16 - Using SQL Window Functions

In this video, you learn how to use Spark Structured Query Language (SQL) window functions. Spark SQL is the most performant way to do data engineering on Databricks, and window functions expand SQL to include things like cumulative totals, ranked values, and aggregations shown alongside detail rows. They can save you a lot of work. I'll explain the concepts and demonstrate them with code in a Databricks notebook.
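A minimal sketch of those three ideas (a running total, a rank, and a per-group aggregate next to detail rows). The table and column names are invented, and Python's built-in sqlite3 stands in for a Spark session so the snippet is self-contained; the SELECT itself is standard SQL and should run unchanged via spark.sql(...) in a Databricks notebook.

```python
import sqlite3

# Hypothetical sales data; sqlite3 is used only so the example runs anywhere.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("ABC", 100.0), ("ABC", 50.0), ("XYZ", 200.0)])

rows = con.execute("""
    SELECT customer,
           amount,
           -- cumulative total within each customer
           SUM(amount) OVER (PARTITION BY customer ORDER BY amount) AS running_total,
           -- rank of every sale across all customers
           RANK() OVER (ORDER BY amount DESC)                       AS sales_rank,
           -- per-customer aggregate repeated on each detail row
           SUM(amount) OVER (PARTITION BY customer)                 AS customer_total
    FROM sales
    ORDER BY customer, amount
""").fetchall()

for r in rows:
    print(r)
# ('ABC', 50.0, 50.0, 3, 150.0)
# ('ABC', 100.0, 150.0, 2, 150.0)
# ('XYZ', 200.0, 200.0, 1, 200.0)
```

Note how `customer_total` appears on every detail row without collapsing them, which a plain GROUP BY cannot do.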

Get my book Master Azure Databricks Step by Step at

Example Notebook for lesson 16 at:

You need to unzip the file and import the notebook into Databricks to run the code.

Video on Creating and Loading the tables used in this video
Comments

Thanks Bryan, you created a spark in my journey towards Databricks. Love from India.

amarnadhg

Best explanation of window functions. Thanks, Bryan.

vibhaskashyap

Bryan, amazing lesson! Window functions are a part of SQL that I haven't used much, so thanks for taking the time to go through this in the Databricks lessons! A big thank you for all your efforts in sharing your knowledge and experience and creating this series!

getsid

Hello Bryan: Thank you for the great SQL window function lesson in Databricks; I am new to the analytics platform. I am hoping you'll provide a new video focused on customer surveys (e.g., the healthcare customer satisfaction survey required after an encounter/visit). That would be a huge help.

Karen-

@Bryan Cafferky, love this topic! I learned a few key SQL concepts that I didn't know how to tackle before. For the given sales data, what would the query statement look like IF I wanted to show Customer, Year, and the Total, Highest, and Lowest sales amounts for EACH year? In other words, I do NOT want to show the individual transaction rows, only the aggregates for each year for each customer. Thanks in advance.
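Not Bryan's answer from the video, but a sketch of one way to get only the per-customer, per-year aggregates: because no detail rows are wanted, a plain GROUP BY suffices and no window function is needed. The sales table and its columns are invented, and sqlite3 stands in for Spark SQL; the SELECT is standard and should run in a Databricks notebook as-is.

```python
import sqlite3

# Hypothetical sales data with a year column for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, year INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("ABC", 2020, 100.0), ("ABC", 2020, 300.0),
    ("ABC", 2021, 50.0),  ("XYZ", 2020, 200.0),
])

# GROUP BY collapses the transaction rows, leaving one row per customer/year.
rows = con.execute("""
    SELECT customer,
           year,
           SUM(amount) AS total,
           MAX(amount) AS highest,
           MIN(amount) AS lowest
    FROM sales
    GROUP BY customer, year
    ORDER BY customer, year
""").fetchall()

for r in rows:
    print(r)
# ('ABC', 2020, 400.0, 300.0, 100.0)
# ('ABC', 2021, 50.0, 50.0, 50.0)
# ('XYZ', 2020, 200.0, 200.0, 200.0)
```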

anthonygonsalvis

Hi Bryan,

I need to write a query; I think it needs to be a correlated subquery.

My scenario is:

1. I have two tables, Table A and Table B.
2. For every row of Table A, I need to run a query on Table B, based on a criterion that comes from a Table A column.
3. The query needs to filter the rows in Table B using the criterion from Table A's column for every row, and after filtering, it has to take the row with the maximum Salary in that result set and return the adjacent Rating value from Table B.

For example:

Table A

Name | Gender
XYZ  | M
ABC  | F
DEF  | M

Table B

Name  | Tenure | Salary | Rating
XYZ-1 | 5      | 5000   | 1
XYZ-2 | 8      | 5500   | 5
XYZ-3 | 4      | 1100   | 2
ABC-1 | 1      | 1200   | 3
ABC-2 | 7      | 1000   | 4
ABC-3 | 8      | 6000   | 1
DEF-1 | 5      | 8000   | 2
DEF-2 | 3      | 1500   | 1
DEF-3 | 1      | 1000   | 5

Query Result:

Name | Gender | Rating
XYZ  | M      | 5
ABC  | F      | 1
DEF  | M      | 2

1st Row Execution:

XYZ is taken as the parameter to filter Table B, so the filtering proceeds in two steps:

1. Keep all the names from Table B that begin with XYZ:

Name  | Tenure | Salary | Rating
XYZ-1 | 5      | 5000   | 1
XYZ-2 | 8      | 5500   | 5
XYZ-3 | 4      | 1100   | 2

2. Keep only the row with the maximum Salary from step 1:

Name  | Tenure | Salary | Rating
XYZ-2 | 8      | 5500   | 5

Output:

Name | Gender | Rating
XYZ  | M      | 5

Kindly explain the query format or syntax to achieve this.
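One possible shape for this query (a sketch, not a verified answer from the video): instead of a correlated subquery, a ROW_NUMBER window partitioned by the Table A name keeps only the max-salary row per group. The table names table_a and table_b are assumptions, and sqlite3 stands in for Spark SQL; the SELECT itself is standard and should also run in Spark SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_a (name TEXT, gender TEXT)")
con.execute("CREATE TABLE table_b (name TEXT, tenure INTEGER, "
            "salary INTEGER, rating INTEGER)")
con.executemany("INSERT INTO table_a VALUES (?, ?)",
                [("XYZ", "M"), ("ABC", "F"), ("DEF", "M")])
con.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
    ("XYZ-1", 5, 5000, 1), ("XYZ-2", 8, 5500, 5), ("XYZ-3", 4, 1100, 2),
    ("ABC-1", 1, 1200, 3), ("ABC-2", 7, 1000, 4), ("ABC-3", 8, 6000, 1),
    ("DEF-1", 5, 8000, 2), ("DEF-2", 3, 1500, 1), ("DEF-3", 1, 1000, 5),
])

# Join Table B rows to their Table A prefix, number them by salary
# (highest first) within each prefix, and keep only row number 1.
rows = con.execute("""
    SELECT name, gender, rating
    FROM (
        SELECT a.name, a.gender, b.rating,
               ROW_NUMBER() OVER (PARTITION BY a.name
                                  ORDER BY b.salary DESC) AS rn
        FROM table_a a
        JOIN table_b b ON b.name LIKE a.name || '-%'
    ) t
    WHERE rn = 1
    ORDER BY name
""").fetchall()

for r in rows:
    print(r)
# ('ABC', 'F', 1)
# ('DEF', 'M', 2)
# ('XYZ', 'M', 5)
```

The window approach scans Table B once, whereas a correlated subquery would re-query Table B for every Table A row.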

tauseefguard

I have a question about creating the view v_productcatalog: in that view creation, I see that you are joining two dimension tables. Isn't that a snowflake-schema query? Can you please clarify?

JD-xdxp

We are using Azure Synapse Spark pools. Is that more like Databricks or pure Spark?

MathewBurford

A little hard to understand... I need to watch and practice again.

Ron-gndv