Windowing Functions in Spark SQL Part 2 | First Value & Last Value Functions | Window Functions

Hello and welcome back to Hadoop tutorials powered by Acadgild. This is the second video in the series on windowing functions in Spark SQL. In this tutorial, we look at the internals of the first_value and last_value functions. If you missed part 1 of the series, which covers the lead and lag functions, here is the link,

These are built-in functions that operate on a set of rows and return a single value for each row of the underlying query. They can simplify complex queries by breaking them down into logical components.
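To make "a single value for each row" concrete, here is a minimal sketch that contrasts a GROUP BY aggregate with the same aggregate used as a window function. It runs the SQL through Python's built-in sqlite3 module as a stand-in for Spark SQL (an assumption made only so the example is self-contained; it needs SQLite 3.25 or newer), and the table name, tickers, and values are made up for illustration.

```python
import sqlite3

# Made-up rows: (ticker, date, high). Not taken from the tutorial's dataset.
rows = [
    ("AAA", "2020-01-01", 12.0),
    ("AAA", "2020-01-02", 13.0),
    ("AAA", "2020-01-03", 12.8),
    ("BBB", "2020-01-01", 21.0),
    ("BBB", "2020-01-02", 22.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (ticker TEXT, date TEXT, high REAL)")
conn.executemany("INSERT INTO stocks VALUES (?, ?, ?)", rows)

# GROUP BY collapses each ticker to a single output row.
grouped = conn.execute(
    "SELECT ticker, max(high) FROM stocks GROUP BY ticker ORDER BY ticker"
).fetchall()

# The window version keeps every input row and attaches the
# per-ticker maximum to each of them.
windowed = conn.execute(
    """
    SELECT ticker, date, high,
           max(high) OVER (PARTITION BY ticker) AS max_high
    FROM stocks
    ORDER BY ticker, date
    """
).fetchall()

print(len(grouped), "rows after GROUP BY")
print(len(windowed), "rows with the window function")
for row in windowed:
    print(row)
```

The grouped query returns one row per ticker, while the windowed query returns as many rows as the input, each carrying its partition's value alongside the original columns.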

To give you a brief idea of these windowing functions in Spark, we will use stock market data. You can download the sample stock data from the following link,

Let's now look at the input dataset and the expected output of the first_value function.

The 1st column in this dataset is the date, the 2nd is the ticker (the stock name), the 3rd is the opening value of the stock, the 4th is the highest value of the stock on that day, the 5th is the lowest value of the stock on that day, the 6th is the closing value of the stock, and the last column is the volume of the stock on that day.

The sample output is shown on the right-hand side of the screen. The query retrieves ticker, date, high, and a derived column, First_heigh, which is the value of high on the first row where the ticker appears in the dataset.
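A query of the kind described above can be sketched as follows. This is not the query from the video: the table name `stocks`, the aliases, and the sample values are assumptions for illustration, "first appearance" is taken to mean the earliest date within each ticker, and Python's built-in sqlite3 module (SQLite 3.25 or newer) stands in for Spark SQL so the example runs on its own. The sketch also computes last_value twice, once with the default frame and once with an explicit full-partition frame, to show why the two differ.

```python
import sqlite3

# Made-up rows in the column order described above:
# (date, ticker, open, high, low, close, volume).
rows = [
    ("2020-01-01", "AAA", 10.0, 12.0, 9.5, 11.0, 1000),
    ("2020-01-02", "AAA", 11.0, 13.0, 10.5, 12.5, 1200),
    ("2020-01-03", "AAA", 12.5, 12.8, 11.0, 11.5, 900),
    ("2020-01-01", "BBB", 20.0, 21.0, 19.0, 20.5, 500),
    ("2020-01-02", "BBB", 20.5, 22.0, 20.0, 21.5, 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks (date TEXT, ticker TEXT, open REAL, "
    "high REAL, low REAL, close REAL, volume INTEGER)"
)
conn.executemany("INSERT INTO stocks VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT ticker, date, high,
       -- value of high on the earliest date of each ticker
       first_value(high) OVER (PARTITION BY ticker ORDER BY date)
           AS first_high,
       -- default frame ends at the current row, so this tracks it
       last_value(high) OVER (PARTITION BY ticker ORDER BY date)
           AS last_high_default,
       -- explicit frame spanning the whole partition
       last_value(high) OVER (
           PARTITION BY ticker ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_high_full
FROM stocks
ORDER BY ticker, date
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

When a window has an ORDER BY and no explicit frame, the frame runs from the start of the partition up to the current row. The first row of the partition is always inside that frame, so first_value is the same on every row of a ticker, while last_value sees only the rows up to the current one and therefore changes from row to row unless the frame is extended to UNBOUNDED FOLLOWING.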

Kindly go through the execution part in the video. Please subscribe and stay tuned for more such videos.
#sparksql, #windowing, #Hadoop, #bigdata

For more updates on courses and tips follow us on:
Comments
Author

Why is last_value() giving different values? Why can't it give one last value by default, similar to first_value?

pramodajammi
Author

Why does first_value show the same value for every row in a partition without needing (unbounded preceding and unbounded following)?

jaisingh-lbfp
Author

What is the purpose of DISTINCT in the queries? I did not see any difference in the output with or without DISTINCT.

abhiganta