rows between in spark | range between in spark | window function in pyspark | Lec-17
In this video I talk about window functions in PySpark, including rows between, range between, unbounded preceding, unbounded following, and current row. If you want to optimize your processing in Spark, you should have a solid understanding of these concepts.
Q1. Data:-
product_data = [
(2,"samsung","01-01-1995",11000),
(1,"iphone","01-02-2023",1300000),
(2,"samsung","01-02-2023",1120000),
(3,"oneplus","01-02-2023",1120000),
(1,"iphone","01-03-2023",1600000),
(2,"samsung","01-03-2023",1080000),
(3,"oneplus","01-03-2023",1160000),
(1,"iphone","01-01-2006",15000),
(1,"iphone","01-04-2023",1700000),
(2,"samsung","01-04-2023",1800000),
(3,"oneplus","01-04-2023",1170000),
(1,"iphone","01-05-2023",1200000),
(2,"samsung","01-05-2023",980000),
(3,"oneplus","01-05-2023",1175000),
(1,"iphone","01-06-2023",1100000),
(3,"oneplus","01-01-2010",23000),
(2,"samsung","01-06-2023",1100000),
(3,"oneplus","01-06-2023",1200000)
]
product_schema=["product_id","product_name","sales_date","sales"]
Q2. Data:-
emp_data = [(1,"manish","11-07-2023","10:20"),
(1,"manish","11-07-2023","11:20"),
(2,"rajesh","11-07-2023","11:20"),
(1,"manish","11-07-2023","11:50"),
(2,"rajesh","11-07-2023","13:20"),
(1,"manish","11-07-2023","19:20"),
(2,"rajesh","11-07-2023","17:20"),
(1,"manish","12-07-2023","10:32"),
(1,"manish","12-07-2023","12:20"),
(3,"vikash","12-07-2023","09:12"),
(1,"manish","12-07-2023","16:23"),
(3,"vikash","12-07-2023","18:08")]
emp_schema = ["id", "name", "date", "time"]
Q3. Data:-
product_data = [
(1,"iphone","01-01-2023",1500000),
(2,"samsung","01-01-2023",1100000),
(3,"oneplus","01-01-2023",1100000),
(1,"iphone","01-02-2023",1300000),
(2,"samsung","01-02-2023",1120000),
(3,"oneplus","01-02-2023",1120000),
(1,"iphone","01-03-2023",1600000),
(2,"samsung","01-03-2023",1080000),
(3,"oneplus","01-03-2023",1160000),
(1,"iphone","01-04-2023",1700000),
(2,"samsung","01-04-2023",1800000),
(3,"oneplus","01-04-2023",1170000),
(1,"iphone","01-05-2023",1200000),
(2,"samsung","01-05-2023",980000),
(3,"oneplus","01-05-2023",1175000),
(1,"iphone","01-06-2023",1100000),
(2,"samsung","01-06-2023",1100000),
(3,"oneplus","01-06-2023",1200000)
]
product_schema=["product_id","product_name","sales_date","sales"]