Day 8 | Capgemini interview question | PySpark scenario-based interview questions and answers

PySpark scenario-based interview questions and answers
Capgemini interview questions and answers

Create DataFrame :
================

lift_data = [
(1,300),
(2,350)
]

lift_schema = "id int , capacity_kg int"

lift_passengers_data = [
('Rahul',85,1),
('Adarsh',73,1),
('Riti',95,1),
('Viraj',80,1),
('Vimal',83,2),
('Neha',77,2),
('Priti',73,2),
('Himanshi',85,2)
]

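lift_passengers_schema = "passenger_name string , weight_kg int, lift_id int"

# Assumes an active SparkSession named `spark` (for example, in a notebook)
lift_df = spark.createDataFrame(lift_data, lift_schema)
lift_passengers_df = spark.createDataFrame(lift_passengers_data, lift_passengers_schema)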

#pyspark #capgemini #capgeminioffcampus
Comments
Author

str = 'HelloabcdefHelloxyz'

pattern = 'Hello'
output: pattern found at 0, 11

How do I solve this in Python? I don't understand the logic.
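
One possible way to approach the question above, as a minimal sketch: call `str.find` repeatedly, each time starting the search one position after the previous hit, until it returns -1. The helper name `find_all` and the variable name `text` are assumptions made for this illustration (the question uses `str`, which would shadow the Python built-in).

```python
def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Move one character forward so overlapping matches are found too.
        start = text.find(pattern, start + 1)
    return positions


text = 'HelloabcdefHelloxyz'  # called `str` in the question; renamed here
pattern = 'Hello'
found = find_all(text, pattern)
print('pattern found at', ', '.join(map(str, found)))
```

For the string in the question this prints `pattern found at 0, 11`, because the second `Hello` starts after the 5 characters of the first one plus the 6 characters of `abcdef`.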

ParmeshwarSalunke-lozy
Author

Hi, it's a good question. I have solved it, please have a look and let me know.


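from pyspark.sql import Window, functions as F
# The window below is an assumption (the original expression is cut off): running total of weight per lift, lightest passenger first
tr_df = lift_passengers_df.withColumn('Running_Sum', F.sum('weight_kg').over(Window.partitionBy('lift_id').orderBy('weight_kg')))
joined_df = tr_df.join(lift_df, (tr_df.lift_id == lift_df.id) & (tr_df.Running_Sum < lift_df.capacity_kg))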
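
The running-sum idea in this reply can be checked without a Spark session. The plain-Python sketch below assumes the task is to board the lightest passengers first and keep those whose running total stays below the lift capacity. The video description does not state the task, so this reading, the function name `passengers_that_fit`, and the strict `<` comparison (copied from the reply) are all assumptions.

```python
from itertools import accumulate

lift_data = [(1, 300), (2, 350)]
lift_passengers_data = [
    ('Rahul', 85, 1), ('Adarsh', 73, 1), ('Riti', 95, 1), ('Viraj', 80, 1),
    ('Vimal', 83, 2), ('Neha', 77, 2), ('Priti', 73, 2), ('Himanshi', 85, 2),
]


def passengers_that_fit(lifts, passengers):
    """Map each lift id to the passengers whose running weight stays below capacity."""
    result = {}
    for lift_id, capacity_kg in lifts:
        # Lightest first, mirroring an orderBy('weight_kg') window (assumed ordering).
        riders = sorted((weight, name) for name, weight, lid in passengers if lid == lift_id)
        running_sum = accumulate(weight for weight, _ in riders)
        result[lift_id] = [
            name for (_, name), total in zip(riders, running_sum) if total < capacity_kg
        ]
    return result


for lift_id, names in passengers_that_fit(lift_data, lift_passengers_data).items():
    print(lift_id, ', '.join(names))
```

With the sample data, lift 1 keeps Adarsh, Viraj and Rahul (running total 238 kg, and adding Riti would reach 333 kg), while all four passengers of lift 2 fit (318 kg against 350 kg).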

sarathkumar-tris
Author

Bro, can you solve this?

File 1 - DF1

|id |segments |

File 2 - DF2

|id |segments |

finalDF

|id |segments |

I need to remove the duplicates and join the two DataFrames on id to get the final DataFrame.

mojijojo-rd