top interview questions and answers in pyspark | drop duplicates from dataframe | #interview

preview_player
Показать описание
top interview question and answer in pyspark | drop duplicates from dataframe | #interview

Create dataframe :
===============

# schema of the above table
schema = StructType([
StructField("emp_id", IntegerType(), True),
StructField("emp_name", StringType(), True),
StructField("emp_gender", StringType(), True),
StructField("emp_age", IntegerType(), True),
StructField("emp_salary", IntegerType(), True),
StructField("emp_manager", StringType(), True)
])

# creating the dataframe !
data = [
(1, "Arjun Patel", "Male", 30, 60000, "Aarav Sharma"),
(2, "Aarav Sharma", "Male", 28, 55000, "Zara Singh"),
(3, "Zara Singh", "Female", 35, 70000, "Arjun Patel"),
(4, "Priya Reddy", "Female", 32, 65000, "Aarav Sharma"),
(1, "Arjun Patel", "Male", 30, 60000, "Aarav Sharma"),
(6, "Naina Verma", "Female", 31, 72000, "Arjun Patel"),
(1, "Arjun Patel", "Male", 30, 60000, "Aarav Sharma"),
(4, "Priya Reddy", "Female", 32, 65000, "Aarav Sharma"),
(5, "Aditya Kapoor", "Male", 28, 58000, "Zara Singh"),
(10, "Anaya Joshi", "Female", 27, 59000, "Aarav Sharma"),
(11, "Rohan Malhotra", "Male", 36, 73000, "Zara Singh"),
(3, "Zara Singh", "Female", 35, 70000, "Arjun Patel")
]

top interview question and answer in pyspark :

#pwc #fang #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion
Рекомендации по теме
Комментарии
Автор

distinct and row_number() window function

VikasChavan-vc
join shbcf.ru