PySpark window ranking (rank, dense_rank, row_number, etc.) and aggregation (sum, min, max) functions

In this video, I have illustrated the Spark window analytical functions such as rank, dense_rank, and row_number, and aggregation functions such as sum, min, and max, with examples. The employee dataset is available in the GitHub repository linked below.

The employee schema I used to read the dataset is shown below.

from pyspark.sql.types import (StructType, StructField, IntegerType,
                               StringType, FloatType)

# Explicit schema for the employee CSV file.
employee_schema = StructType([
    StructField('EMPLOYEE_ID', IntegerType(), True),
    StructField('FIRST_NAME', StringType(), True),
    StructField('LAST_NAME', StringType(), True),
    StructField('EMAIL', StringType(), True),
    StructField('PHONE_NUMBER', StringType(), True),
    StructField('HIRE_DATE', StringType(), True),
    StructField('JOB_ID', StringType(), True),
    StructField('SALARY', FloatType(), True),
    StructField('COMMISSION_PCT', IntegerType(), True),
    StructField('MANAGER_ID', IntegerType(), True),
    StructField('DEPARTMENT_ID', IntegerType(), True)
])
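
As a rough sketch of what the video walks through (the file name "employees.csv" is a placeholder, not the actual path from the GitHub repository), the dataset can be read with this schema and the ranking and aggregation functions applied over a window partitioned by department:

from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()

# Placeholder path; point it at the employee CSV from the repository.
emp_df = spark.read.csv("employees.csv", header=True, schema=employee_schema)

# Rank employees by salary (highest first) within each department.
rank_window = Window.partitionBy("DEPARTMENT_ID").orderBy(F.col("SALARY").desc())

ranked_df = emp_df.select(
    "EMPLOYEE_ID", "DEPARTMENT_ID", "SALARY",
    F.rank().over(rank_window).alias("rank"),              # leaves gaps after ties
    F.dense_rank().over(rank_window).alias("dense_rank"),  # no gaps after ties
    F.row_number().over(rank_window).alias("row_number")   # unique sequence per partition
)

# Aggregation functions over a window without ORDER BY operate on the
# whole partition, so every row in a department sees the same values.
agg_window = Window.partitionBy("DEPARTMENT_ID")

agg_df = emp_df.select(
    "EMPLOYEE_ID", "DEPARTMENT_ID", "SALARY",
    F.sum("SALARY").over(agg_window).alias("dept_total_salary"),
    F.min("SALARY").over(agg_window).alias("dept_min_salary"),
    F.max("SALARY").over(agg_window).alias("dept_max_salary")
)

ranked_df.show()
agg_df.show()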
Comments

Thanks for pointing out the difference between using and not using ORDER BY in combination with a window function. Definitely unintuitive but useful as well!

the_iurlix
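
For readers wondering what the comment above refers to, here is a minimal sketch of that behaviour (reusing the emp_df DataFrame and column names from the schema above). Adding orderBy to the window spec changes the default frame from the whole partition to a running frame up to the current row, so sum becomes a cumulative sum instead of a per-department total:

from pyspark.sql.window import Window
from pyspark.sql import functions as F

# Without ORDER BY: frame is the entire partition -> one total per department.
dept_window = Window.partitionBy("DEPARTMENT_ID")

# With ORDER BY: default frame runs from the start of the partition to the
# current row -> a running (cumulative) sum in salary order.
running_window = Window.partitionBy("DEPARTMENT_ID").orderBy("SALARY")

comparison_df = emp_df.select(
    "EMPLOYEE_ID", "DEPARTMENT_ID", "SALARY",
    F.sum("SALARY").over(dept_window).alias("dept_total"),
    F.sum("SALARY").over(running_window).alias("running_total")
)
comparison_df.show()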