RDD vs DataFrame vs Datasets | Spark Tutorial Interview Questions #spark #sparktuning

As part of our Spark interview question series, we want to help you prepare for your Spark interviews. We will discuss various Spark topics such as lineage, reduceByKey vs groupByKey, YARN client mode vs YARN cluster mode, etc. In this video we cover the
difference between RDD, DataFrame and Dataset.

Please subscribe to our channel.
Here is a link to other Spark interview questions

Here is a link to other Hadoop interview questions
Comments

DataFrame also serializes the data into off-heap storage in binary format and then performs transformations directly on off-heap memory, since Spark understands the schema. It also provides the Tungsten physical execution back-end, which explicitly manages memory and dynamically generates byte-code for expression evaluation, so memory management is better here.

ravinderkarra

Nice and clear explanation, straight to the point. Thanks.

souravsinha

Nice explanation. Can you please explain, in another video, how to do checkpointing and resume a failed Spark job (failed due to an action/transformation error or exceeded executor memory)?

rameshgangabathula

cooollll great answer sir... thanks !!!

someshmungikar

Very nice explanation. Your videos really help me while preparing for interviews. Highly recommend. Thank you!

apekshatrivedi

When to use a DataFrame, when to use a Dataset, when to use an RDD, and when Spark SQL / SparkSession?

rahulshandilya

Again a very nice video, thanks. It would be great if you could provide pseudo code or simple code syntax for each abstraction, so that the understanding is very clear.

bhargavhr

Nicely explained. Thank you for your effort in gathering the information and publishing it. These are much-needed videos.

ganeshdhareshwar

@Data Savvy - A small correction: at 8:10 you mentioned that we cannot do map, join and other operations on a DataFrame.

arundhingra

Will it serialize the data or deserialize it? As far as I know, deserialization is the conversion of a byte stream into a Java object. Please correct me if I am wrong.

RahulRawat-wuvv

Then why aren't people using Datasets everywhere?

Pratik

Please provide AWS questions and answers

raviyadav-dttb

ERROR! Actually DataFrame, Dataset, RDD - that is the correct order of performance, from most efficient to least efficient. A DF performs better than a DS because it does not use serialization and deserialization when working with the data.

alexperit

Really helpful content. Much appreciated.

TusharKakaiya

If I understood correctly, PySpark does not support Datasets because Python is not a type-safe language, right?

yeoreumkwon

I am new to the Spark and big data world. I chose to use/learn PySpark because I am familiar with Python. I got to know that Python is not type-safe and does not support Datasets. Can someone say whether PySpark is used in building real-world applications, or do I need to learn Scala/Java?
Thanks.
- Great video

chiranjeevikatta

Very informative. Just one thing: the voice is too low in the video.

shubhamkumar-uzux

Thank you.
Last time in my interview,
the interviewer asked me the same question...

naresh

When to use a DataFrame and when to use a Dataset?

ajaypratap

Hi - Can you please share details on why the Dataset API is not available in Python?

ambikaiyer