SF Scala: Marek Kolodziej, Spark and Databricks Notebook at Nitro

-----

Notebooks are not a new thing - they've been around since Mathematica and the IPython Notebook. In this talk, I argue that they are just as important for the Big Data community, and show how useful they can be for reporting findings from algorithms executed at scale on Spark. I showcase the Databricks Cloud, which provides hosted, scalable Spark deployments without any devops effort on the user's end, together with a Notebook UI for running experiments and persisting results to enable reproducible research. I demo the notebook with Nitro's version of the Big Data "hello world" problem: counting fonts across a Spark RDD of PDF documents.

Marek Kolodziej is a Sr. Research Engineer at Nitro, Inc. He's been working on a diverse set of machine learning, distributed computing and big data problems for the past 4 years, and on statistics and econometrics for the past 9. He is passionate about functional programming and static typing in general, and about Scala in particular. He is obsessed with production-quality data science - the insights are only useful if the deployment is rock-solid, hence his focus on the JVM. Marek got his PhD in Energy and Environmental Economics from Boston University.
Comments
Author

BTW, when discussing the fragmentation of the Hadoop ecosystem, I meant Giraph, not GraphX, for graph processing (GraphX is part of the Spark ecosystem).

mkolod