RDDs, DataFrames and Datasets in Apache Spark - NE Scala 2016

Traditionally, Apache Spark jobs have been written using Resilient Distributed Datasets (RDDs), a Scala Collections-like API. RDDs are type-safe, but they can be problematic: it's easy to write a suboptimal job, and RDDs are significantly slower in Python than in Scala. DataFrames address some of these problems, and they're much faster, even in Scala; but DataFrames aren't type-safe, and they're arguably less flexible.

Enter Datasets, a type-safe, object-oriented programming interface that works with the DataFrames API, provides some of the benefits of RDDs, and can be optimized via the Catalyst optimizer.
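To make the contrast concrete, here is a minimal sketch (not taken from the talk) that runs the same filter-and-project job through all three APIs. It assumes Spark 2.x or later, where `SparkSession` is the entry point; the talk itself predates that release. The `Person` case class, the sample rows, and the `ThreeApis` object are hypothetical names invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration.
// It is declared at the top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ThreeApis {
  // Made-up sample data.
  val people: Seq[Person] =
    Seq(Person("Ann", 34), Person("Bob", 19), Person("Cy", 52))

  /** Returns the names of people older than 21, computed three ways. */
  def run(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    import spark.implicits._ // encoders, toDF/toDS, and the $"col" syntax

    // RDD: ordinary Scala lambdas over typed objects, checked by the compiler.
    val viaRdd = spark.sparkContext.parallelize(people)
      .filter(p => p.age > 21)
      .map(p => p.name)
      .collect().toSeq

    // DataFrame: columns are referenced by name, so a misspelled column
    // only fails at run time; the result rows are untyped Row objects.
    val viaDataFrame = people.toDF()
      .filter($"age" > 21)
      .select($"name")
      .collect().map(_.getString(0)).toSeq

    // Dataset: typed lambdas again (p is a Person), but on top of the
    // same Spark SQL engine that backs DataFrames.
    val viaDataset = people.toDS()
      .filter(p => p.age > 21)
      .map(p => p.name)
      .collect().toSeq

    (viaRdd, viaDataFrame, viaDataset)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: a local run, no cluster
      .appName("three-apis")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

All three variants should select the same names; what differs is where mistakes surface (compile time for the RDD and Dataset versions, run time for the DataFrame version) and how much of the job is expressed in a form Spark SQL can analyze.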

This talk will briefly recap RDDs and DataFrames, introduce the Datasets API, and then, through a live demonstration, compare the performance of all three against the same non-trivial data source.

Talk by Brian Clapper
March 4th, 2016

Produced by NewCircle - Spark Training & Resources:
Comments

This was 4 years ago, but it still helped a ton. Datasets are now an integral part of Spark.

apetiteful

I was confused by Datasets and DataFrames; this video cleared up my confusion!

yonglelyu

Awesome explanation. Thanks for uploading.

prabhubentick

He commented about "lambdas" a lot. I know what lambda functions are, but can somebody explain the context in which he is talking about "lambdas" in this video? For instance, when he starts on Datasets at 18:12.

prateekgautam

This was really helpful. Thanks a ton!!

RahulChaudharyy

Good lecture. Please let me ask one thing: if your hair is raw data, your beard is structured data, and your clothes are semi-structured data, which technique should be used: RDD, DataFrame, or Dataset? Please explain with an example.

nasreenmohsin

I had to google UTSL, I'm glad I did

FaraazAhmad

Does anyone know the answer to the question asked at the end? Are they going to use Datasets in the MLlib libraries?

AmitKumarGrrowingSlow

For me, all these presentations are the same and, unfortunately, very high-level.

Ayoub-adventures

What, has Kolomoisky gone into Big Data now too? ;)

EugenePetrash