tiger analytics interview questions and answers in pyspark | #interview

Показать описание

tiger analytics interview questions and answers in pyspark | #interview | #dataengineers | #datascience | #dataanalytics

"tiger analytics interview questions and answers in pyspark "

Create DataFrame Code :
======================

flights_data = [(1,'Flight1' , 'Delhi' , 'Hyderabad'),
(1,'Flight2' , 'Hyderabad' , 'Kochi'),
(1,'Flight3' , 'Kochi' , 'Mangalore'),
(2,'Flight1' , 'Mumbai' , 'Ayodhya'),
(2,'Flight2' , 'Ayodhya' , 'Gorakhpur')
]

_schema = "cust_id int, flight_id string , origin string , destination string"

top interview question and answer in pyspark :

#fang #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion

DEwithDhairy

Рекомендации по теме

Комментарии

SQL Version : Ankit Bansal ( Have done little Twist )

DEwithDhairy

ans=df_flight.groupBy("cust_id").agg(first("origin"), last("destination"))
display(ans)
directly we can use this

siddharthchoudhary

Well this question is good one. i solved it before using recursive cte in SQL

jhonsen

Thank you so much for sharing the knowledge, like and subscribed

mahendranath

ms sql server query
with cte as(
select *,
ROW_NUMBER() over(partition by id order by fly) as r
from flight
)
select distinct id,
first_value(origin) over(partition by id order by r ) as start,
LAST_VALUE(dest) over(partition by id order by r
range between unbounded preceding and unbounded following) as endp
from cte

yptjces

Great i am just curious to know how much time usually they give to solve it. More you pracitce faster you will solve but depends how good you are. Secondly what is the use of aggregate and max in the final result if i am grouping by cust_id and comparing when start == rn = 1 as origin and end == rn == 3 as destination in both origin and dest col.

jhonsen

Below is my query, which works for any number of flights:

with cte1 as (
select *, row_number() over(partition by cid order by fid asc) as rn from flights),

cte2 as(
select cid, min(rn) as mi, max(rn) as mx from cte1 GROUP by cid)

select ori.cid, ori.origin, des.destination from
(
select c1.* from cte1 c1
inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mi) ori
join (
select c1.* from cte1 c1
inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mx) des
on ori.cid=des.cid;

saheedahmed

I'm little new to these type of questions, I have used another method to solve the same, can you tell me why this approach is not suited ?

Select a.cust_id, a.origin, b.destination from (Select o.cust_id, o.origin from travelling_details o
where origin not In ( Select t.destination from travelling_details t
where t.cust_id = o.cust_id)) a join
(Select o.cust_id, o.destination from travelling_details o
where o.destination not In ( Select t.origin from travelling_details t
where t.cust_id = o.cust_id)) b
on a.cust_id = b.cust_id;

hariprasad

one doubt: how we ensure flight id describe the order of connectivity

ashish.barmer

Your solution will not work if flight10, flight11...flight15 will be there

AbhishekKumar_

WITH RankedFlights AS (
SELECT
customer_id,
origin,
destination,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight) AS row_num,
MAX(ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight)) AS max_row_num
FROM
flights_data
)
SELECT
customer_id,
MAX(CASE WHEN row_num = 1 THEN origin END) AS origin,
MAX(CASE WHEN row_num = max_row_num THEN destination END) AS destination
FROM
RankedFlights
GROUP BY
customer_id;

Maheshwaripremierleague

tiger analytics interview questions and answers in pyspark | #interview

tiger analytics interview questions and answers in pyspark | #interview

Tiger Analytics Senior Data Engineer Interview Questions | SDE Interview Experience | 25 LPA

tiger analytics interview questions

Tiger Analytics PySpark Interview Question | Very Important Question of PySpark |

Python Interview Question asked in Tiger Analytics

Tiger Analytics SQL Interview Question for Data Engineering Position

Tiger Analytics Set of 2 SQL Interview Problems | SQL For Beginners

How to Prepare for Tiger Analytics Interview | Online Test | Recruitment Process

48.Tiger Analytics Data Analyst Interview Questions| SQL Questions for Data Analyst by @PandeyGuruji

Tiger analytics Interview experience | Hiring process | How to prepare for Interviews

Sr. Analyst at Tiger Analytics with a 300% Hike | Sr. Analyst Interview | Applied Ai Course Reviews

Tiger Analytics Python Interview Question

Data Analyst Interview Experience with Tiger Analytics • 99% Success

Tiger Analytics Interview Question | Find out all the DataFrames in Spark Session |

Roadmap To TIGER ANALYTICS As DATA SCIENTIST ? | Expected Interview QA For Data Scientist 🔥🔥🔥...

Data Scientist Interview Problem | Python Problem Asked in TIGER ANALYTICS 🔥🔥🔥 !!!

Rahul Data Scientist at Tiger Analytics | Data Scientist Interview | Applied Ai Course Reviews

Questions asked in Tiger Analytics TO DE - Part 1|| Pyspark || Data Engineer #pyspark #dataengineer

Top 5 Data Analyst Interview Mistakes

Most Asked Coding Interview Question (Don't Skip !!😮) #shorts

SQL Interview Question Asked in Tiger Analytics | Data Analyst SQL Interview Problem

3 most common data modeling interview questions

Tiger Analytics PySPark Interview Question - Split and Explode Functions in PySpark

Balaji Senior Analyst in Data Scientist at Tiger Analytics | Applied AI Course Reviews