SF Scala: Marek Kolodziej, Spark and Databricks Notebook at Nitro
Notebooks are not new: they have been around since Mathematica and the IPython Notebook. In this talk, I argue that they are just as important for the Big Data community, and show how useful they can be for reporting findings from algorithms executed at scale on Spark. I showcase the Databricks Cloud, which provides hosted, scalable Spark deployments without any devops effort on the user's end, together with a Notebook UI for running experiments and persisting results to enable reproducible research. I demo the notebook with Nitro's version of the Big Data "hello world" problem: the font count across a Spark RDD of PDF documents.
Marek Kolodziej is a Sr. Research Engineer at Nitro, Inc. He has been working on a diverse set of machine learning, distributed computing and big data problems for the past 4 years, and on statistics and econometrics for the past 9. He is passionate about functional programming and static typing in general, and about Scala in particular. He is obsessed with production-quality data science: the insights are only useful if the deployment is rock-solid, hence his focus on the JVM. Marek got his PhD in Energy and Environmental Economics from Boston University.
SBTB 2015, SF Scala @Nitro: Marek Kolodziej, Scala, FP and Spark - the Perfect Combo for ML
Vehicle Tracking
BDSBTB 2015: Marek Kolodziej, Unsupervised NLP Using Word Embeddings, Scala and Apache Spark
Advanced Lane Finding
Prezentacja - Marek Kołodziej i Sandra Banasiak
Text By the Bay 2015: Marek Kolodziej, Unsupervised NLP Tutorial using Apache Spark
SF Scala, Marek Kolodziej: Integrating Non-Reactive Legacy Code with Akka -- the Case of R
DBTB INT Marek Kolodziej r
data.bythebay.io: Marek Kolodziej, Hidden GEMMs: How Optimized Math Libraries Work
Scala for Spark - Introduction
Tempo La So : L'alcool: fête et défaite
Spark Notebook with Dynamic and Reactive SQL
TTOW - Spark Notebook Walkthrough
SF Scala: Andy Petrella, Spark Notebook: beefed-up REPL for reproducible distributed data analysis
PNWS 2014 - Apache Spark I: From Scala Collections to Fast Interactive Big Data with Spark
Adam Gibson of Skymind, Q&A with Alexy Khrabrov of SF Spark
An Update on Distributed Computing with Spark, Reza Zadeh 20141025
Adam Gibson, DeepLearning4j on Spark and Data Science on JVM with nd4j, SF Spark @Galvanize 20150212
Deep Learning and NLP with Spark by Andy Petrella and Melanie Warrick
SB 20150209 'PredictionIO - A Machine Learning Server for Scala Developers'
SF Scala, Peter Potts: Cake Pattern in Practice
Chris Yang, Scala Notebook -- SF Scala @Nitro 20150205
ND4J: A scientific computing framework for the JVM