Apache Spark - Computerphile

Analysing big data stored on a cluster is not easy. Spark allows you to do so much more than just MapReduce. Rebecca Tickle takes us through some code.

This video was filmed and edited by Sean Riley.

Comments

Note to the editor: please stop cutting away from the code so quickly. We're trying to follow along in the code based on what she's saying; at that moment, we don't need to cut back to the shot of her face. We can still hear her voice in the voiceover.

notangryjustdismayed

The RDD API is outmoded as of Spark 2.0 and in almost every use case you should be using the Dataset API. You lose out on a lot of improvements and optimizations using RDDs instead of Datasets.

Hourai
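
To make the comparison concrete, here is a minimal word-count sketch using the Dataset API, in Scala. This is not the code from the video; the SparkSession settings and the "input.txt" path are placeholder assumptions.

```scala
// Minimal sketch: word count with the Dataset API (Spark 2.x+).
// Assumes a local SparkSession and a plain-text file at the placeholder path "input.txt".
import org.apache.spark.sql.SparkSession

object DatasetWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read lines as a Dataset[String], split into words, drop empties, count per word.
    val lines  = spark.read.textFile("input.txt")
    val words  = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
    val counts = words.groupBy("value").count()   // a Dataset[String]'s single column is named "value"

    counts.show()
    spark.stop()
  }
}
```

Because this goes through the Dataset/DataFrame layer, the query is planned by Spark's Catalyst optimizer, which is the kind of improvement the comment above refers to that plain RDD code misses.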

Pretty sure there's a typo in that code. "splitLines" doesn't exist and is probably supposed to be words.map(...) instead.

Bolt
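
For reference, the classic RDD word count that the typo comments seem to be describing usually looks something like the sketch below, in Scala. This is a reconstruction, not the exact code from the video; "input.txt" is a placeholder path.

```scala
// Sketch of the classic RDD word count (not the video's exact code).
import org.apache.spark.{SparkConf, SparkContext}

object RddWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddWordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines = sc.textFile("input.txt")
    val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
    // The step in question: the map is over `words`, not some `splitLines` value.
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```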

Can you do Apache Kafka next? How do they compare?

Technomancr

Ahh, so refreshing after taking a week's break from dev work and sticking to non-dev topics. Lol, I love our field. Like music to my ears.

xIAMROOT

Is there any meta-analysis on the usefulness of big-data analysis? How often do jobs get run that either produce no meaningful results or none that are statistically significant?

recklessroges

Brady, please make a video on Kubernetes.

mm

feels like this video is four years too late ... :-/

Xakriss

Thank you for teaching an old man new things.

williamwurthmann

She refers to an earlier example. Did I miss that video? Otherwise, nicely done. Love learning about distributed computing.

KurtSchwind

Wow, congrats on the content. You were able to explain it in a concise yet logical and detailed way. Nice.

tablit.

A great example of how programming languages are a reasonably efficient mechanism for communicating sections of a program, and how natural language really is not.

tackline

These data ones are really good! Keep them coming!

alexkompos

She's damn good at explaining and easy to listen to. Any plans of having her host other episodes?

(Sorry for "her", I don't know her name.)

Mmouse_

For anyone interested: although the documentation for Apache Flink is awful and it doesn't support Java versions beyond 8, it at least lets you run setup on each node. Spark has no functionality for running one-time setup on each node, which makes it infeasible for many use cases. These distributed processing frameworks are quite opinionated, and if you're not doing word count, or streaming data from one input stream to another with very simple stateless transformations in between, you'll find little help in the documentation or the built-in functionality. They're not really designed for use cases where you have a parallel program with a fixed-size data source known in advance and want to scale it up as you would by adding more threads; they're more for continuous data processing.

nO_dNAL
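
A common workaround for the per-node setup issue mentioned above is to put the expensive initialisation behind a lazily initialised singleton, so each executor JVM runs it once. A rough sketch, with hypothetical names (`Setup`, `client`) and local-mode settings chosen purely for illustration:

```scala
// Sketch: per-executor one-time setup via a lazily initialised singleton.
// `Setup` and `client` are hypothetical; in practice the resource would be a
// non-serialisable client (DB connection, model, etc.) created once per JVM.
import org.apache.spark.sql.SparkSession

object Setup {
  lazy val client: String = {
    println("one-time setup on this executor")
    "initialised"
  }
}

object PerExecutorSetupExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PerExecutorSetup")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val data = sc.parallelize(1 to 100, numSlices = 4)
    val results = data.mapPartitions { iter =>
      val c = Setup.client            // initialised at most once per executor JVM
      iter.map(n => s"$c -> $n")
    }

    results.take(5).foreach(println)
    spark.stop()
  }
}
```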

Typo in line 32: `splitLines` used instead of `word`?

PaulSukys

It's so clear and easy after the explanation! I'll be waiting for more vids about clustering and distributed computing.

xakkep

More of these, please. More big data.

MJ-em_jay

Computerphile will be excited to learn that tripods exist.

michaelebbs

I wish she had also talked a little about Spark's ability to deal with data streams.

Alex
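
For anyone curious about the streaming side mentioned above, a minimal Structured Streaming word count looks something like the sketch below, assuming a local socket source on port 9999 (e.g. fed by `nc -lk 9999`). This is illustrative only, not from the video.

```scala
// Sketch: running word count over a socket stream with Structured Streaming.
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Each incoming line becomes a row in an unbounded DataFrame.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val words  = lines.as[String].flatMap(_.split("\\s+")).filter(_.nonEmpty)
    val counts = words.groupBy("value").count()

    // Print the running counts to the console after every micro-batch.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```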