18 Spark SQL - Windowing Functions

Comments

By default, the window frame for ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This is the reason first_value() is able to see rows from the beginning of the partition, whereas last_value() is only able to see rows up to the CURRENT ROW.
Using "range between unbounded preceding and unbounded following" within last_value() will give the correct result.
For example:
select o.*,
    row_number() over (order by o.amount) rnm,
    rank() over (order by o.amount) rnk,
    first_value(o.amount) over (partition by o.oid order by o.amount desc range between unbounded preceding and unbounded following) fv,
    last_value(o.amount) over (partition by o.oid order by o.amount desc range between unbounded preceding and unbounded following) lv
from ord_data o
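
[Editor's note: a minimal PySpark sketch of the point above, for anyone who wants to try it locally. The ord_data table and the oid/amount columns follow the comment's example; the data is made up for illustration and is not from the video.]

from pyspark.sql import SparkSession

# Tiny dataset mirroring the ord_data example in the comment above.
spark = SparkSession.builder.appName("window-frame-demo").getOrCreate()
spark.createDataFrame(
    [(1, 100.0), (1, 250.0), (1, 75.0), (2, 300.0), (2, 120.0)],
    ["oid", "amount"],
).createOrReplaceTempView("ord_data")

spark.sql("""
    SELECT o.*,
           -- default frame with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
           -- so last_value() simply returns the current row's amount
           last_value(o.amount) OVER (PARTITION BY o.oid ORDER BY o.amount DESC) AS lv_default,
           -- explicit frame covering the whole partition returns the real last value
           last_value(o.amount) OVER (PARTITION BY o.oid ORDER BY o.amount DESC
               RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv_full
    FROM ord_data o
""").show()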

avneetsingh

over ( partition by o.order_id order by oi.order_item_subtotal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as last_value_order_item_subtotal,

If you use the above RANGE clause, it comes out correctly. Reference: Hive JIRA.

kirantadisetti

Also, thank you very much for creating this playlist. I tried reading some Hadoop books, but they didn't help me much. This playlist is very useful and expedited my learning.

manikandanherein

last_value: you are using order by oi.order_item_subtotal desc; it should be order by oi.order_item_subtotal to sort in ascending order.

arunjyotisarmah

It works when we don't use ORDER BY with last_value. It doesn't work when we use ORDER BY, whether asc or desc.
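
[Editor's note: a short sketch of why that happens, reusing the ord_data view from the sketch earlier in this thread; names and data are illustrative. Without ORDER BY, the default frame spans the whole partition, so last_value() can reach the partition's final row (though which row is "last" is then not deterministic); with ORDER BY, the default frame ends at the current row.]

spark.sql("""
    SELECT o.*,
           -- no ORDER BY: frame defaults to the whole partition
           last_value(o.amount) OVER (PARTITION BY o.oid) AS lv_no_order_by,
           -- with ORDER BY: frame defaults to UNBOUNDED PRECEDING .. CURRENT ROW
           last_value(o.amount) OVER (PARTITION BY o.oid ORDER BY o.amount) AS lv_with_order_by
    FROM ord_data o
""").show()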

manikandanherein