Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame |

Input
data = [
(1, "Sagar", 23, "Male", 68.0),
(2, "Kim", 35, "Female", 90.2),
(3, "Alex", 40, "Male", 79.1),
]
schema = "Id int,Name string,Age int,Gender string,Marks float"

Solution:
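for i in set_of_dtypes:  # set_of_dtypes: the distinct type names in df.dtypes
    cols = []
    for j in df.dtypes:  # j is a (column name, type name) pair
        if i == j[1]:
            cols.append(j[0])
    # cols now holds every column of type i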
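
The idea behind the solution above, grouping column names by their type, can be tried without a Spark session. This is only a minimal sketch: `group_columns_by_dtype` is a hypothetical helper name, and the `dtypes` list is written by hand to mirror what `df.dtypes` returns for the input schema (a list of `(column, type)` pairs).

```python
from collections import defaultdict


def group_columns_by_dtype(dtypes):
    """Group column names by type name, keeping the original column order."""
    groups = defaultdict(list)
    for name, dtype in dtypes:
        groups[dtype].append(name)
    return dict(groups)


# Hand-written stand-in for df.dtypes with the schema above (assumption:
# Spark reports these simple types as 'int', 'string' and 'float').
dtypes = [
    ("Id", "int"),
    ("Name", "string"),
    ("Age", "int"),
    ("Gender", "string"),
    ("Marks", "float"),
]

groups = group_columns_by_dtype(dtypes)
for dtype, cols in groups.items():
    print(dtype, cols)
```

With a real DataFrame, `group_columns_by_dtype(df.dtypes)` should give the same kind of mapping, and each list can then be passed to `df.select(...)`.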

I have prepared many courses on Azure Data Engineering

1. Build Azure End-to-End Project

2. Build Delta Lake project

3. Master in Azure Data Factory with ETL Project and Power BI

4. Master in Python

Check out my courses on Azure Data Engineering


#dataengineer #interviewquestions #spark
Comments
Author

Very good that you are posting real interview questions; many others simply explain concept definitions.

kunuturuaravindreddy
Author

# creating a dict of columns so as to avoid checking multiple datatypes
d = {}
for col in df.dtypes:
    if col[1] not in d:
        d[col[1]] = [col[0]]
    else:
        # later columns of an already-seen type are added to the same list
        d[col[1]].append(col[0])

for key, val in d.items():
    df.select(val).show()
    # write df to the location

sourav_sarkar_
Author

int_cols = [col for col, dtype in df.dtypes if dtype == 'int']
string_cols = [col for col, dtype in df.dtypes if dtype == 'string']
float_cols = [col for col, dtype in df.dtypes if dtype == 'float']

# Creating DataFrames for each data type
int_df = df.select(int_cols)
string_df = df.select(string_cols)
float_df = df.select(float_cols)

Offical_PicturePerfect
Author

# creating a dict of columns to avoid checking multiple datatypes
d = {}
for col in df.dtypes:
    if col[1] not in d:
        d[col[1]] = [col[0]]
    else:
        # later columns of an already-seen type are added to the same list
        d[col[1]].append(col[0])

print(d)

for key, val in d.items():
    df.select(val).show()
    # write df to the location

sourav_sarkar_
Author

Thank you for posting this video. Could you please also post PySpark interview questions for freshers? Thank you!

aamirmansuri
Author

Good problem to solve. Thanks for posting, Sagar!

myl
Author

My Way Sir

intType = []
stringType = []
floatType = []
for i in df.dtypes:
    if i[1] == 'int':
        intType.append(i[0])
    elif i[1] == 'string':
        stringType.append(i[0])
    elif i[1] == 'float':
        floatType.append(i[0])

dfInt = df.select(*intType)
dfString = df.select(*stringType)
dfFloat = df.select(*floatType)

rawat
Author

Shouldn’t you use append instead of overwrite?

Nextgentrick
Author

Hi Sagar,
for this Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame,
how much experience did the candidate have?

pratyushkumar
Author

My solution is as follows:

# names end in _df so the built-in float is not shadowed
string_df = df
integer_df = df
float_df = df

# start from the full DataFrame and drop every column of a different type
for name, dtype in df.dtypes:
    if dtype != 'string':
        string_df = string_df.drop(name)
    if dtype != 'int':
        integer_df = integer_df.drop(name)
    if dtype != 'float':
        float_df = float_df.drop(name)

print(string_df)
print(integer_df)
print(float_df)

_Sujoy_Das
Author

Okay, is this the internal functionality of conversion to Parquet format?

bhumikalalchandani
Author

my solution:
# named cols_by_type so the built-in dict is not shadowed
cols_by_type = {}
for i in df.dtypes:
    if i[1] in cols_by_type:
        cols_by_type[i[1]].append(i[0])
    else:
        cols_by_type[i[1]] = [i[0]]

for i in cols_by_type:
    df_s = df.select(cols_by_type[i])
    df_s.show()

## did show instead of writing

souvikchattopadhyay
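
Several comments above stop at `show()` and leave the write step as a comment, and one asks about `append` versus `overwrite`. A minimal sketch of that step might look like the following. The name `write_by_dtype`, the folder layout (one subfolder per type under `base_path`) and the choice of Parquet as output format are assumptions for illustration, not the video's exact code; `DataFrame.select` and `DataFrameWriter.mode(...).parquet(...)` are standard PySpark calls.

```python
def write_by_dtype(df, base_path, mode="overwrite"):
    """Write one Parquet dataset per column type and return {type: path}.

    df is expected to be a pyspark.sql.DataFrame. mode is passed on to
    DataFrameWriter.mode: "overwrite" replaces existing output at the path,
    "append" adds new files to it.
    """
    # Group column names by type name, e.g. {'int': ['Id', 'Age'], ...}
    groups = {}
    for name, dtype in df.dtypes:
        groups.setdefault(dtype, []).append(name)

    paths = {}
    for dtype, cols in groups.items():
        path = f"{base_path}/{dtype}"  # hypothetical layout: one folder per type
        df.select(*cols).write.mode(mode).parquet(path)
        paths[dtype] = path
    return paths


# Usage with a live SparkSession (not run here; the path is a placeholder):
# write_by_dtype(df, "/tmp/output")
```

With `overwrite`, rerunning the job leaves one copy of the data in each folder; with `append`, each rerun adds the same rows again, so the right mode depends on whether the job is meant to be repeatable.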