Tiger Analytics PySpark Interview Question | Very Important Question of PySpark |

data=[
('Rudra','math',79),
('Rudra','eng',60),
('Shivu','math', 68),
('Shivu','eng', 59),
('Anu','math', 65),
('Anu','eng',80)
]
schema="Name string,Sub string,Marks int"

Solution:

I have prepared many courses on Azure Data Engineering

1. Build Azure End-to-End Project

2. Build Delta Lake project

3. Master in Azure Data Factory with ETL Project and Power BI

4. Master in Python

Check out my courses on Azure Data Engineering

#dataengineer #interviewquestions #pysparkinterview
Comments
Author

Use the pivot function on the Sub column to get a new column for each distinct value in that column. You can use the aggregate function sum on Marks. The order of the eng/math columns may not be the same.

abhigyapranshu
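
A minimal sketch of the pivot approach described in the comment above, assuming the data and schema from the video description. The function names `pivot_plain` and `pivot_spark` are made up for illustration. The plain-Python version only models what the pivot computes; the PySpark version assumes an existing SparkSession is passed in as `spark`. Passing an explicit list of subjects is one way to keep the column order fixed.

```python
from collections import defaultdict

data = [
    ("Rudra", "math", 79),
    ("Rudra", "eng", 60),
    ("Shivu", "math", 68),
    ("Shivu", "eng", 59),
    ("Anu", "math", 65),
    ("Anu", "eng", 80),
]
SUBJECTS = ["math", "eng"]  # explicit list fixes the output column order


def pivot_plain(rows, subjects):
    """Plain-Python model of groupBy("Name").pivot("Sub", subjects).sum("Marks")."""
    totals = defaultdict(dict)
    for name, sub, marks in rows:
        per_sub = totals[name]  # every name gets a row, even with no listed subject
        if sub in subjects:
            per_sub[sub] = per_sub.get(sub, 0) + marks
    # A subject with no rows for a name stays None, similar to a null in Spark.
    return {name: tuple(per_sub.get(s) for s in subjects)
            for name, per_sub in totals.items()}


def pivot_spark(spark, rows, subjects):
    """Same pivot in PySpark; `spark` is assumed to be an existing SparkSession."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    df = spark.createDataFrame(rows, schema="Name string, Sub string, Marks int")
    return df.groupBy("Name").pivot("Sub", subjects).agg(F.sum("Marks"))


result = pivot_plain(data, SUBJECTS)
print(result)
```

With a session available, `pivot_spark(spark, data, SUBJECTS).show()` should print one row per Name with a math and an eng column.
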
Author

All the videos in this PySpark interview playlist are highly useful, Sagar. Big thanks for your efforts, man!

sheikirfan
Author

pivoted_df = "first"}).show()

abhishekpathak
Author

Hi Sagar, to master PySpark, which of your courses should I buy?

pratik
Author

I tried the below:
df =
df.show()
but it throws an error at jgd = self._jgd.pivot(pivot_col): Column is not iterable

surenderraja
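
The call in the comment above is cut off, so the cause cannot be confirmed. One common trigger for "Column is not iterable" at that line, as far as I know, is passing a Column object such as `col("Sub")` or `df.Sub` to `pivot`, which in many PySpark versions expects the column name as a string. The sketch below wraps the call in a hypothetical helper, `pivot_first`, that checks the argument before PySpark is touched; it assumes `df` has the Name, Sub and Marks columns from the description.

```python
def pivot_first(df, group_col, pivot_col, value_col):
    """Hypothetical helper: pivot `pivot_col` into columns, keeping the first value."""
    # Fail early with a clearer message than "Column is not iterable".
    if not isinstance(pivot_col, str):
        raise TypeError(
            "pivot column must be a name such as 'Sub', "
            f"got {type(pivot_col).__name__}"
        )
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    return df.groupBy(group_col).pivot(pivot_col).agg(F.first(value_col))
```

A call such as `pivot_first(df, "Name", "Sub", "Marks").show()` passes the name as a string, while `pivot_first(df, "Name", df.Sub, "Marks")` is rejected by the check.
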
Author

Was this asked at Tiger Analytics (Canada)?

Pratik_Tortikar
Author

My solution:

from pyspark.sql.functions import when, max

df.withColumn("math", when(df.Sub=="math", df.Marks).otherwise(0))\
.withColumn("eng", when(df.Sub=="eng", df.Marks).otherwise(0))\
.groupBy("Name").agg(max("math").alias("math"), max("eng").alias("eng")).show()

throughmyglasses
Author

Hi Sir

My Way:

df1 =
df2 = df1.select("Name", "math", "eng").orderBy(col('math').desc(), col('eng').desc())
df2.show()

rawat
Author

Sagar, I had a query... To use collect_list, we have to sort the dataset by subject first, right?

My Solution:


from pyspark.sql.functions import col, sum

df_1 = spark.createDataFrame(data=data, schema=["Name", "Sub", "Marks"])

df_2 = df_1.groupBy(col("Name")).pivot("Sub", ["math", "eng"]).agg(sum("Marks"))
or,

# Register the DataFrame so the SQL below can refer to it as Pivot_Data
df_1.createOrReplaceTempView("Pivot_Data")

display(spark.sql("Select Name, SUM(CASE WHEN sub like 'math' THEN Marks ELSE 0 END) as Math, SUM(CASE WHEN sub like 'eng' THEN Marks ELSE 0 END) as Eng from Pivot_Data GROUP BY Name"))

_Sujoy_Das
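
On the collect_list question in the comment above: the order of the collected values can depend on row order after a shuffle, so sorting the dataset beforehand is not something to rely on. A sketch of one alternative is to collect (Sub, Marks) pairs and sort the pairs themselves. The function names are made up for illustration, and the sketch assumes every Name has exactly one math row and one eng row.

```python
from collections import defaultdict

data = [
    ("Rudra", "math", 79),
    ("Rudra", "eng", 60),
    ("Shivu", "math", 68),
    ("Shivu", "eng", 59),
    ("Anu", "math", 65),
    ("Anu", "eng", 80),
]


def marks_by_subject_plain(rows):
    """Plain-Python model: collect (Sub, Marks) pairs per Name, then sort by Sub."""
    pairs = defaultdict(list)
    for name, sub, marks in rows:
        pairs[name].append((sub, marks))
    return {name: sorted(p) for name, p in pairs.items()}


def marks_by_subject_spark(df):
    """PySpark version; `df` is assumed to have Name, Sub and Marks columns."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    collected = df.groupBy("Name").agg(
        F.sort_array(F.collect_list(F.struct("Sub", "Marks"))).alias("pairs")
    )
    # Sorted by Sub, "eng" comes before "math", so index 0 is eng and 1 is math.
    return collected.select(
        "Name",
        F.col("pairs")[1]["Marks"].alias("math"),
        F.col("pairs")[0]["Marks"].alias("eng"),
    )


sorted_pairs = marks_by_subject_plain(data)
print(sorted_pairs["Rudra"])
```

Because the sort happens inside each collected array, the result does not depend on the order in which rows arrive.
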
Author

Please provide Azure Databricks content in English.

dorwxtk
Author

: "last"})
df1.show()
This code gives you a separate column for each subject, irrespective of how many subjects there are in the Sub column.

venkatsubbaiah
Author

df.groupBy("Name").agg(max(when(df.Sub=='math', df.Marks).otherwise(0)).alias("Math"), max(when(df.Sub=='eng', df.Marks).otherwise(0)).alias("eng"))

okouroy
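
Several comments here use when/otherwise inside an aggregate instead of pivot. A small sketch of how that behaves, with made-up function names: each row contributes its Marks to the matching subject and 0 to the others, and the maximum is taken per Name. The extra student "Ravi" is hypothetical and is added only to show that a missing subject comes out as 0 rather than null.

```python
data = [
    ("Rudra", "math", 79),
    ("Rudra", "eng", 60),
    ("Ravi", "math", 70),  # hypothetical student with no eng row
]
SUBJECTS = ["math", "eng"]


def conditional_max_plain(rows, subjects):
    """Plain-Python model of max(when(Sub == s, Marks).otherwise(0)) per Name."""
    contributions = {}
    for name, sub, marks in rows:
        per_name = contributions.setdefault(name, {s: [] for s in subjects})
        for s in subjects:
            # A non-matching row contributes 0, mirroring otherwise(0).
            per_name[s].append(marks if sub == s else 0)
    return {name: {s: max(vals) for s, vals in per_name.items()}
            for name, per_name in contributions.items()}


def conditional_max_spark(df, subjects):
    """PySpark version; `df` is assumed to have Name, Sub and Marks columns."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    aggs = [
        F.max(F.when(F.col("Sub") == s, F.col("Marks")).otherwise(0)).alias(s)
        for s in subjects
    ]
    return df.groupBy("Name").agg(*aggs)


summary = conditional_max_plain(data, SUBJECTS)
print(summary)
```

Whether 0 or null is the better placeholder for a missing subject depends on what the expected output should look like.
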
Author

df.groupby(col("Name")).agg(
sum(when(col("Sub")=="math", col("Marks")).otherwise(0)).alias("maths"),
sum(when(col("Sub")=="eng", col("Marks")).otherwise(0)).alias("eng")
).show()

amanmaheshwari
Author

df.groupBy('name').pivot('Sub', ['math', 'eng']).sum('Marks').display()

ayushmangal
Author

df_sub1 =
df_sub1.withColumn('math', df_sub1.Sub_Marks[0]).withColumn('eng', df_sub1.Sub_Marks[1]).select('Name', 'math', 'eng').show()

kunalshinkar
Author

df.groupBy(f.col("Name")).pivot("Sub", [i[0] for i in

balaa