Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame


Input
data = [
(1, "Sagar", 23, "Male", 68.0),
(2, "Kim", 35, "Female", 90.2),
(3, "Alex", 40, "Male", 79.1),
]
schema = "Id int,Name string,Age int,Gender string,Marks float"
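The solutions that follow all rely on df.dtypes, which for a PySpark DataFrame is a list of (column name, type string) pairs. As a rough sketch, the same pairs can be previewed from the schema string without a Spark session. The name `spark` in the final comment is an assumption (an already created SparkSession), and the variable `dtypes` is introduced here only for illustration.

```python
data = [
    (1, "Sagar", 23, "Male", 68.0),
    (2, "Kim", 35, "Female", 90.2),
    (3, "Alex", 40, "Male", 79.1),
]
schema = "Id int,Name string,Age int,Gender string,Marks float"

# Split the DDL-style schema string into (column name, type) pairs.
# This mirrors the shape of df.dtypes without needing a Spark session.
dtypes = [tuple(field.split()) for field in schema.split(",")]
print(dtypes)
# [('Id', 'int'), ('Name', 'string'), ('Age', 'int'), ('Gender', 'string'), ('Marks', 'float')]

# Assumption: with an existing SparkSession named `spark`, the DataFrame
# itself would be created like this:
# df = spark.createDataFrame(data, schema)
```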

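Solution:
set_of_dtypes = {j[1] for j in df.dtypes}  # distinct column types
for i in set_of_dtypes:
    cols = []
    for j in df.dtypes:
        if i == j[1]:
            cols.append(j[0])  # columns whose type is i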
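A minimal, self-contained sketch of the idea in the snippet above: collect the distinct types, then gather the column names for each type. It works on a plain list shaped like df.dtypes so it runs without Spark. The function name `columns_for_each_dtype` is hypothetical, and the select call in the final comment assumes a DataFrame named `df` built from the input above.

```python
def columns_for_each_dtype(dtypes):
    """Group column names by type.

    dtypes: list of (column_name, type_string) pairs, shaped like df.dtypes.
    """
    set_of_dtypes = {dtype for _, dtype in dtypes}
    result = {}
    for i in sorted(set_of_dtypes):  # sorted only to make the order predictable
        result[i] = [name for name, dtype in dtypes if dtype == i]
    return result


# Pairs matching the input schema of this question.
dtypes = [
    ("Id", "int"),
    ("Name", "string"),
    ("Age", "int"),
    ("Gender", "string"),
    ("Marks", "float"),
]
groups = columns_for_each_dtype(dtypes)
print(groups)
# {'float': ['Marks'], 'int': ['Id', 'Age'], 'string': ['Name', 'Gender']}

# Assumption: with a real DataFrame `df`, each group could then be selected:
# for dtype, cols in groups.items():
#     df.select(*cols).show()
```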




Comments

Very good that you are posting real interview questions; many others simply explain concept definitions.

kunuturuaravindreddy

# creating a dict of columns so as to avoid checking multiple datatypes
d = {}
for col in df.dtypes:
    if col[1] not in d:
        d[col[1]] = [col[0]]
    else:
        d[col[1]].append(col[0])  # keep every column of this type, not only the first

for key, val in d.items():
    df.select(val).show()
    # write df to the location

sourav_sarkar_
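One pitfall with this dictionary pattern: if the list is only created the first time a type is seen and later columns of that type are never appended, each type ends up with a single column. A small sketch of the difference, run on a plain list shaped like df.dtypes; the names `first_only` and `grouped` are made up for this illustration.

```python
dtypes = [
    ("Id", "int"),
    ("Name", "string"),
    ("Age", "int"),
    ("Gender", "string"),
    ("Marks", "float"),
]

# Creating the list only on first sight keeps just the first column per type.
first_only = {}
for name, dtype in dtypes:
    if dtype not in first_only:
        first_only[dtype] = [name]

# setdefault creates the list when the key is missing and always appends.
grouped = {}
for name, dtype in dtypes:
    grouped.setdefault(dtype, []).append(name)

print(first_only)  # {'int': ['Id'], 'string': ['Name'], 'float': ['Marks']}
print(grouped)     # {'int': ['Id', 'Age'], 'string': ['Name', 'Gender'], 'float': ['Marks']}
```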

int_cols = [col for col, dtype in df.dtypes if dtype == 'int']
string_cols = [col for col, dtype in df.dtypes if dtype == 'string']
float_cols = [col for col, dtype in df.dtypes if dtype == 'float']

# Creating DataFrames for each data type
int_df = df.select(int_cols)
string_df = df.select(string_cols)
float_df = df.select(float_cols)

Offical_PicturePerfect

# creating a dict of columns to avoid checking multiple datatypes
d = {}
for col in df.dtypes:
    if col[1] not in d:
        d[col[1]] = [col[0]]
    else:
        d[col[1]].append(col[0])  # keep every column of this type, not only the first

print(d)

for key, val in d.items():
    df.select(val).show()
    # write df to the location

sourav_sarkar_

Thank you for posting this video. Could you please also post PySpark interview questions for freshers? Thank you!

aamirmansuri

Good problem to solve. Thanks for posting, Sagar!

myl

My way, sir:

intType = []
stringType = []
floatType = []
for i in df.dtypes:
    if i[1] == 'int':
        intType.append(i[0])
    elif i[1] == 'string':
        stringType.append(i[0])
    elif i[1] == 'float':
        floatType.append(i[0])

dfInt = df.select(*intType)
dfString = df.select(*stringType)
dfFloat = df.select(*floatType)

rawat

Shouldn’t you use append instead of overwrite?

Nextgentrick
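On the append-versus-overwrite question above: in Spark's DataFrameWriter, mode("overwrite") replaces whatever already exists at the target path, while mode("append") adds the new rows to it, so the right choice depends on whether the job should be safely re-runnable or should accumulate data. A hedged sketch of a writer that takes the mode as a parameter follows. The function `save_by_dtype`, the `base_path` argument, and the one-folder-per-type layout are assumptions for illustration, not the code from the video.

```python
def save_by_dtype(df, base_path, mode="overwrite"):
    """Write one Parquet dataset per column type under base_path.

    df        : a PySpark DataFrame (anything exposing dtypes/select/write)
    base_path : hypothetical output folder, e.g. "/tmp/output"
    mode      : Spark save mode such as "overwrite" or "append"
    """
    groups = {}
    for name, dtype in df.dtypes:
        groups.setdefault(dtype, []).append(name)

    for dtype, cols in groups.items():
        # "overwrite" replaces existing data at the path;
        # "append" adds the new rows to what is already there.
        df.select(*cols).write.mode(mode).parquet(f"{base_path}/{dtype}")
    return groups
```

Re-running with "overwrite" leaves a single copy of the data per type, whereas re-running with "append" would duplicate the rows on each run.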

Hi Sagar,
for this Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame,
how much experience did the candidate have?

pratyushkumar

My solution is as follows:

# start from the full DataFrame and drop the columns that do not belong
string_df = df
integer_df = df
float_df = df

for i in df.dtypes:
    if i[1] == 'int':
        string_df = string_df.drop(i[0])
        float_df = float_df.drop(i[0])
    elif i[1] == 'float':
        string_df = string_df.drop(i[0])
        integer_df = integer_df.drop(i[0])
    elif i[1] == 'string':
        integer_df = integer_df.drop(i[0])
        float_df = float_df.drop(i[0])
    else:
        # any other type belongs to none of the three DataFrames
        string_df = string_df.drop(i[0])
        integer_df = integer_df.drop(i[0])
        float_df = float_df.drop(i[0])

print(string_df)
print(integer_df)
print(float_df)

_Sujoy_Das

Okay, is this internal functionality of the conversion to Parquet format?

bhumikalalchandani

My solution:
cols_by_type = {}  # renamed from dict to avoid shadowing the built-in
for i in df.dtypes:
    if i[1] in cols_by_type:
        cols_by_type[i[1]].append(i[0])
    else:
        cols_by_type[i[1]] = [i[0]]

for i in cols_by_type:
    df_s = df.select(cols_by_type[i])
    df_s.show()

# did show instead of writing

souvikchattopadhyay
