JLL PySpark Interview Question - Get Top 3 Pickup Locations

One of the PySpark interview questions recently asked in a JLL interview.
We need to get the top 3 pickup locations.

Let's see how we can achieve this using groupBy, count, and limit.

The DataFrame details are given below:

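# Imports needed for the schema definition
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Define the schema
schema = StructType([
    StructField("reqid", IntegerType(), True),
    StructField("pickup_location", StringType(), True)
])

# Create a DataFrame with the defined schema
# (assumes `spark` is the active SparkSession, as in a Databricks notebook)
data = [(48, "Airport"), (49, "Office"), (50, "Hospital"), (51, "Airport"), (52, "Hospital"), (53, "Shoppingmall"), (54, "Office"), (55, "Hospital"), (56, "Hospital")]
df = spark.createDataFrame(data, schema)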

For more Azure Databricks interview questions, check out our playlist.

Comments
Author

Solution using SQL:

with cte_pickup as
(
select pickup_location, count(*) as location_count from pickup_tbl group by pickup_location
)
select pickup_location as location from cte_pickup order by location_count desc limit 3

prajju
Author


We have a column id with the values:

a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21

The output, produced with a SQL query, should be:

a1
a2, a3
a4, a5, a6
a7, a8, a9, a10
a11, a12, a13, a14, a15
a16, a17, a18, a19, a20, a21

Can you please explain this scenario?

ShravanKumarJangam
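
One possible approach to the question above, assuming each value sits in its own row: number the rows in order, then place row n in group g where g(g-1)/2 < n <= g(g+1)/2, so that group g holds exactly g values. The sketch below uses Python's built-in sqlite3 module only to make the SQL runnable locally. The table name `ids` and the CTE names `numbered` and `grp` are hypothetical, and ordering rows by the numeric suffix of the id is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids (id TEXT)")
conn.executemany("INSERT INTO ids VALUES (?)", [(f"a{i}",) for i in range(1, 22)])

query = """
WITH RECURSIVE
numbered AS (
    -- position of each id, ordered by its numeric suffix (assumption)
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY CAST(SUBSTR(id, 2) AS INTEGER)) AS rn
    FROM ids
),
grp(g, lo, hi) AS (
    -- group g covers row positions lo..hi and has exactly g rows
    SELECT 1, 1, 1
    UNION ALL
    SELECT g + 1, hi + 1, hi + g + 1
    FROM grp
    WHERE hi < (SELECT COUNT(*) FROM ids)
)
SELECT grp.g, numbered.id
FROM numbered
JOIN grp ON numbered.rn BETWEEN grp.lo AND grp.hi
ORDER BY grp.g, numbered.rn
"""

# Collect the ids of each group, in order, and join them into one line
groups = {}
for g, id_ in conn.execute(query):
    groups.setdefault(g, []).append(id_)
lines = [", ".join(ids) for _, ids in sorted(groups.items())]

for line in lines:
    print(line)
```

The final concatenation is done in Python here. In an engine with an ordered string-aggregation function, that step could be moved into the query itself.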