DataFrame vs Dataset | Choose Between DataFrame and Dataset | Apache Spark Tutorial | Spark Interview

As part of our Spark interview question series, we want to help you prepare for your Spark interviews. We will discuss various Spark topics such as lineage, reduceByKey vs groupByKey, YARN client mode vs YARN cluster mode, etc. In this video we cover the difference between RDD, DataFrame and Dataset.

Please subscribe to our channel.
Here is the link to other Spark interview questions

Here is the link to other Hadoop interview questions

#apachespark #sparkTutorial #rdd #dataframe
#dataset
Comments
Author

Thank you, boss, awesome video. I got a clear picture of DataFrame and Dataset, and your other videos are superb too. Nice, you are helping a lot. God bless you 😊

akshathab.s
Author

I am finding your videos very helpful and informative. I hope to see many more videos on this channel about Spark and other big data tools.

viraajsivaraju
Author

Please also make videos on real-time projects, with a complete overview of each project, the various tools used in it, and why those particular tools are chosen for different kinds of scenarios, as you have vast experience in this field.

viraajsivaraju
Author

Please do a video on Scala functional programming, please, please, please 🙏🙏. Your explanations help us understand very well, so please do it. I can somewhat understand the main concepts of Scala, like case classes and pattern matching, but I don't know where and when to use them, so please do a video on those concepts 🙏🙏. If you do, it will be very helpful.

akshathab.s
Author

Hi, kindly clarify the following.

Can we partition data on a key while creating the DataFrame? I am not referring to writing a file from a DataFrame. Say I have a CSV file and a 10-node cluster. The first step in my Spark code is creating a DataFrame from this CSV. Can I create the DataFrame with the data already partitioned on a key? The idea is that when I use a join/group by down the line, my DataFrame is already partitioned on the join/group-by key, so a reshuffle can be avoided.

nandu
Author

DataFrames are immutable, so how can I efficiently update or change column data based on another DataFrame using a join? In other words, what is the best way to express the equivalent of a SQL UPDATE with DataFrames in an ETL scenario?

aaronantony
Author

Thanks, Harjeet, for the great video. If I want to use window ranking and analytical functions, is it possible to use Datasets?

kiranmudradi
Author

The Catalyst optimiser and Tungsten are used with DataFrames as well, so which one is better?

shashanksoni
Author

From your explanation, Datasets could be called compile-time safe.
Why are they called type-safe? I mean, is it anything related to types?

sachink
Author

So is there any alternative to Datasets in Python?

the_high_flyer