GrayMatter Pyspark Interview Question - Get null count of all columns

PySpark interview question recently asked in a GrayMatter interview.
We need to get the null count of every column in a DataFrame.

Let's see how we can achieve this using PySpark.

Here is the sample DataFrame data:
data = [(1, None, 'ab'),
        (2, 10, None),
        (None, None, 'cd')]
columns = ['col1', 'col2', 'col3']

For more Azure Databricks interview questions, check out our playlist.

Comments

select sum(case when col1 is null then 1 else 0 end) as col1,
sum(case when col2 is null then 1 else 0 end) as col2,
sum(case when col3 is null then 1 else 0 end) as col3
from input_df;

dasubabuch

My SQL solution -

select
sum(case when col1 is null then 1 else 0 end) as col1,
sum(case when col2 is null then 1 else 0 end) as col2,
sum(case when col3 is null then 1 else 0 end) as col3
from temp;

adityavamsi

My SQL solution -

SELECT COUNT(*)-COUNT(COL1) AS COL1,
COUNT(*)-COUNT(COL2) AS COL2,
COUNT(*)-COUNT(COL3) AS COL3
FROM SAMP_DF;

Sachin_Sambare
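To check the two SQL approaches from the comments without a Spark cluster, here is a small sketch that runs both on the same sample rows using Python's built-in `sqlite3` module. This swaps SQLite in for Spark SQL purely for illustration, and the table name `samp_df` is borrowed from the last comment. `COUNT(col)` skips NULLs, so `COUNT(*) - COUNT(col)` equals the per-column null count, matching the `SUM(CASE ...)` version.

```python
import sqlite3

rows = [(1, None, 'ab'), (2, 10, None), (None, None, 'cd')]

# In-memory SQLite database standing in for the Spark temp view
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samp_df (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany("INSERT INTO samp_df VALUES (?, ?, ?)", rows)  # None -> NULL

# Approach 1: add 1 for every NULL with SUM(CASE ...)
case_sum = conn.execute("""
    SELECT SUM(CASE WHEN col1 IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN col2 IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN col3 IS NULL THEN 1 ELSE 0 END)
    FROM samp_df
""").fetchone()

# Approach 2: COUNT(col) ignores NULLs, so the difference is the null count
count_diff = conn.execute("""
    SELECT COUNT(*) - COUNT(col1),
           COUNT(*) - COUNT(col2),
           COUNT(*) - COUNT(col3)
    FROM samp_df
""").fetchone()

conn.close()
print(case_sum, count_diff)  # (1, 2, 1) (1, 2, 1)
```

One difference between the two: on an empty table, `SUM(...)` returns NULL while `COUNT(*) - COUNT(col)` returns 0.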