SF Scala: Marek Kolodziej, Spark and Databricks Notebook at Nitro


-----

Notebooks are not new: they have been around since Mathematica and the IPython Notebook. In this talk, I claim that they are just as important for the Big Data community, and I show how useful they can be for reporting findings from algorithms executed at scale on Spark. I showcase Databricks Cloud, which provides hosted, scalable Spark deployments without any devops effort on the user's end, together with a notebook UI for running experiments and persisting results to enable reproducible research. I demo the notebook using Nitro's version of the Big Data "hello world" problem: counting fonts across a Spark RDD of PDF documents.

Marek Kolodziej is a Sr. Research Engineer at Nitro, Inc. He has worked on a diverse set of machine learning, distributed computing, and big data problems for the past 4 years, and on statistics and econometrics for the past 9. He is passionate about functional programming and static typing in general, and about Scala in particular. He is obsessed with production-quality data science: insights are only useful if the deployment is rock-solid, hence his focus on the JVM. Marek received his PhD in Energy and Environmental Economics from Boston University.

FunctionalTV


Comments

BTW, when discussing the fragmentation of the Hadoop ecosystem, I meant Giraph, not GraphX, for graph processing (GraphX is part of the Spark ecosystem).

mkolod

