filmov
tv
Holden Karau: A brief introduction to Distributed Computing with PySpark
Показать описание
PyData Seattle 2015
Apache Spark is a fast and general engine for distributed computing & big data processing with APIs in Scala, Java, Python, and R. This tutorial will briefly introduce PySpark (the Python API for Spark) with some hands-on-exercises combined with a quick introduction to Spark's core concepts. We will cover the obligatory wordcount example which comes in with every big-data tutorial, as well as discuss Spark's unique methods for handling node failure and other relevant internals. Then we will briefly look at how to access some of Spark's libraries (like Spark SQL & Spark ML) from Python. While Spark is available in a variety of languages this workshop will be focused on using Spark and Python together.
Materials available here:
00:10 Help us add time stamps or captions to this video! See the description for details.
Apache Spark is a fast and general engine for distributed computing & big data processing with APIs in Scala, Java, Python, and R. This tutorial will briefly introduce PySpark (the Python API for Spark) with some hands-on-exercises combined with a quick introduction to Spark's core concepts. We will cover the obligatory wordcount example which comes in with every big-data tutorial, as well as discuss Spark's unique methods for handling node failure and other relevant internals. Then we will briefly look at how to access some of Spark's libraries (like Spark SQL & Spark ML) from Python. While Spark is available in a variety of languages this workshop will be focused on using Spark and Python together.
Materials available here:
00:10 Help us add time stamps or captions to this video! See the description for details.
Holden Karau: A brief introduction to Distributed Computing with PySpark
A very brief introduction to extending Spark ML for custom models - Holden Karau
Introduction to Spark Datasets by Holden Karau
Berlin Buzzwords 2017: Holden Karau - A Brief Tour of the Magic Behind Apache Spark #bbuzz
Interview - Holden Karau
Data Science in 30 Minutes - A Quick Introduction to PySpark with Holden Karau
Kubeflow for Machine Learning • Holden Karau & Adi Polak • GOTO 2022
Holden Karau - The Magic Behind PySpark, how it impacts perf & the 'future'
Scaling Python for Machine Learning: Beyond Data Parallelism • Holden Karau • GOTO 2023
Holden Karau and Paco Nathan discuss PySpark and the future of Python
The magic of distributed systems: when it all breaks and why / Holden Karau
Holden Karau - Interview Engineering and Data Tools
Kubeflow for Machine Learning (Teaser) • Holden Karau & Adi Polak • GOTO 2022
scala.bythebay.io: Holden Karau Interview
BeeScala 2016: Holden Karau - Ignite your data with Spark 2.0
Holden Karau - Keynote: Distributed Computing 4 Kids -- with Spark | PyData Seattle 2023
Scaling Machine Learning with Spark (Teaser) • Adi Polak & Holden Karau • GOTO 2023
Holden Karau and Cheburashka interview at JOTB2018
Keynote: Making the Big Data ecosystem work together with Python - Holden Karau
OSCON 2016 - Getting started contributing to Apache Spark by Holden Karau (IBM)
Holden Karau at Beyond the Code
Holden Karau, IBM | BigDataNYC 2016
Simplifying Training Deep & Serving Learning Models...using Tensorflow - Holden Karau
Extending Spark ML for Custom Models - Holden Karau
Комментарии