Apache Spark Week Day 2 | #101


On day two of Apache Spark Week we look at the major Apache Spark concepts: RDDs, transformations and actions, caching, and broadcast variables.

Also check out Jacek Laskowski's book on Spark:

Check out my free 100+ page Data Engineering Cookbook on GitHub:

Please SUPPORT WHAT YOU LIKE:

(Send a message and I'll read it on the stream)

- YouTube SuperChats while live streaming

- As an Amazon Associate I earn from qualifying purchases from Amazon. Just use this link:

- I get asked a lot about my podcast gear. This is a list of all the equipment I currently use to create this Podcast:

#ApacheSpark #DataEngineering #PlumbersofDataScience #bigdata


Comments

Hi Andreas! It's worth mentioning that when you use Spark caching (or memory-only persistence) for a significant amount of data, it's almost inevitable that you'll run into OOM problems, and solving them will require changing Spark's memory management settings (see spark.memory.fraction and
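A minimal PySpark sketch of the caching point above, under stated assumptions. The comment is cut off after spark.memory.fraction, so only that setting is used here. The helper names (memory_settings, build_session, cache_with_spill), the local[2] master, and the choice of MEMORY_AND_DISK as a softer alternative to memory-only persistence are illustrative assumptions, not something taken from the stream.

```python
def memory_settings(memory_fraction=0.6):
    """Return Spark config overrides as a plain dict.

    0.6 is the documented default of spark.memory.fraction in recent
    Spark releases. The range check is only a sanity check for this sketch.
    """
    if not 0.0 < memory_fraction < 1.0:
        raise ValueError("memory_fraction must be between 0 and 1")
    return {"spark.memory.fraction": str(memory_fraction)}


def build_session(settings, app_name="caching-sketch"):
    """Create a local SparkSession with the given settings (hypothetical helper)."""
    # Imported lazily so memory_settings() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local[2]")
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def cache_with_spill(rdd):
    """Persist an RDD so partitions that do not fit in memory go to disk.

    rdd.cache() uses a memory-only storage level, where partitions that do
    not fit are recomputed when needed instead of being stored.
    """
    from pyspark import StorageLevel

    return rdd.persist(StorageLevel.MEMORY_AND_DISK)
```

On a machine with PySpark installed, something like `spark = build_session(memory_settings(0.7))` followed by `cache_with_spill(spark.sparkContext.parallelize(range(1000)))` would exercise it. spark.memory.fraction has to be set before the application starts. Whether raising it or spilling to disk is the better fix depends on the workload.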

And regarding broadcast variables and accumulators: a broadcast variable is just a way to distribute data needed for the computation among the executors. It's not writable on the workers. An accumulator is like a counter in the MapReduce framework: it passes information from the workers to the driver. So it's writable from the executors but can be read only from the driver.
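The read-only versus write-only split described above can be sketched in PySpark roughly as follows. The country-code lookup, the function names (resolve, run_job, demo), and the sample data are hypothetical and chosen only for illustration.

```python
def resolve(code, names, on_missing):
    """Look up a code; call on_missing() and return 'unknown' if it is absent."""
    name = names.get(code)
    if name is None:
        on_missing()
        return "unknown"
    return name


def run_job(sc, codes, names):
    """Resolve codes on the executors and count the misses on the driver."""
    lookup = sc.broadcast(names)   # shipped to executors, read via .value
    unknown = sc.accumulator(0)    # executors can only add to it

    def to_name(code):
        return resolve(code, lookup.value, lambda: unknown.add(1))

    # collect() is the action that actually runs the map on the executors.
    resolved = sc.parallelize(codes).map(to_name).collect()
    # Only the driver reads the accumulated total.
    return resolved, unknown.value


def demo():
    """Run the sketch on a local SparkContext (requires PySpark)."""
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "broadcast-accumulator-sketch")
    try:
        return run_job(sc, ["DE", "FR", "XX"], {"DE": "Germany", "FR": "France"})
    finally:
        sc.stop()
```

One caveat worth keeping in mind: accumulator updates made inside a transformation such as map can be applied more than once if a task is re-executed, so counts gathered this way are best treated as diagnostics.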

dmitryamosov

Hi Andreas, thank you for such a great initiative. Are you streaming daily? A question: what is the difference between SparkContext.addFile() and a broadcast variable? The first adds a file to a distributed cache; the latter also puts a small file into the executors' memory.
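One way to see the difference asked about here is to do the same job both ways. This is a rough PySpark sketch, not an answer given on the stream: the stop-word file, the function names, and the filtering task are all assumptions made for illustration. With addFile() each task gets a path to a node-local copy of the file and has to read it itself, while a broadcast variable ships an already-parsed Python object.

```python
import os


def load_lines(path):
    """Read a small text file into a list of stripped, non-empty lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def filter_with_added_file(sc, words, stopwords_path):
    """Ship the file itself; tasks locate and parse their local copy."""
    from pyspark import SparkFiles

    sc.addFile(stopwords_path)
    file_name = os.path.basename(stopwords_path)

    def keep(partition):
        # SparkFiles.get() returns the path of the copy on this node.
        stopwords = set(load_lines(SparkFiles.get(file_name)))
        return (word for word in partition if word not in stopwords)

    return sc.parallelize(words).mapPartitions(keep).collect()


def filter_with_broadcast(sc, words, stopwords_path):
    """Parse once on the driver; ship the resulting set to the executors."""
    stopwords = sc.broadcast(set(load_lines(stopwords_path)))
    return (
        sc.parallelize(words)
        .filter(lambda word: word not in stopwords.value)
        .collect()
    )
```

Both functions expect an existing SparkContext as sc. The file variant suits data that tasks need as a file on disk, while the broadcast variant suits small lookup data that is used as an in-memory object.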

sumityadav

It would be a great kindness to me.

KK-lrjq
