SBTB 2015, SF Scala @Nitro: Marek Kolodziej, Scala, FP and Spark - the Perfect Combo for ML


While FP and Scala have already become mainstays of middleware, web development and big data stacks (Akka, Play, Kafka, Spark), they tend not to have a big presence in the machine learning and NLP communities. For instance, the emerging deep learning toolkits are mostly Python-based (Pylearn2, Theano, etc.). The same goes for general-purpose machine learning (Python's scikit-learn, countless R libraries). Performance seekers dissatisfied with slow scripting languages write typed Cython code, contorted C++ libraries bound to scripting language wrappers, or resort to exotic solutions such as Lua. Some even dispense with all abstraction and write incomprehensible CUDA kernels. There has to be a better way. As a machine learning engineer, I want to write strongly typed functional code. Math has no place for side effects, and I don't want to waste time running a simulation for hours, only to find that I made a typo in my "stringly-typed" script. Unbeknownst to most, Scala's machine learning and NLP ecosystem is growing rapidly, from numeric processing (Spire, Breeze) to big data machine learning (MLlib, Mahout) to GPU-based text parsing (Puck) to general-purpose probabilistic programming (FACTORIE). In this talk, I'll give a quick overview of Scala's machine learning ecosystem and show how easy it is to re-use existing components to build a new, scalable algorithm implementation. If you'd like to see how to write vectorized linear regression that runs native BLAS code, is based on an SGD/Adagrad implementation written from scratch, and can run at scale on petabytes of data using Spark, this talk is for you.
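To give a rough idea of what an Adagrad-based linear regression looks like, here is a minimal sketch in plain Scala. It is not the code from the talk: it uses only the standard library (no Breeze, BLAS or Spark), computes full-batch gradients rather than stochastic mini-batches, and every name in it (`AdagradLinearRegression`, `fit`, `learningRate`, `epochs`, `eps`) is a hypothetical choice made for illustration.

```scala
object AdagradLinearRegression {

  // Dot product of two equally sized vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Gradient of the loss 1/(2n) * sum_i (x_i . w - y_i)^2 with respect to w.
  def gradient(xs: Array[Array[Double]],
               ys: Array[Double],
               w: Array[Double]): Array[Double] = {
    val g = new Array[Double](w.length)
    var i = 0
    while (i < xs.length) {
      val err = dot(xs(i), w) - ys(i)
      var j = 0
      while (j < w.length) { g(j) += err * xs(i)(j); j += 1 }
      i += 1
    }
    g.map(_ / xs.length)
  }

  // Fits the weights with Adagrad: each coordinate gets its own step size,
  // scaled down by the square root of its accumulated squared gradients.
  // Assumption: full-batch gradients; an SGD variant would sample a
  // mini-batch of rows on every step instead.
  def fit(xs: Array[Array[Double]],
          ys: Array[Double],
          learningRate: Double = 0.5,
          epochs: Int = 2000,
          eps: Double = 1e-8): Array[Double] = {
    val w = new Array[Double](xs.head.length)
    val sumSq = new Array[Double](w.length) // running sum of squared gradients
    for (_ <- 0 until epochs) {
      val g = gradient(xs, ys, w)
      var j = 0
      while (j < w.length) {
        sumSq(j) += g(j) * g(j)
        w(j) -= learningRate * g(j) / (math.sqrt(sumSq(j)) + eps)
        j += 1
      }
    }
    w
  }
}
```

With a leading column of ones for the intercept and noiseless data generated from y = 1 + 2x, for example `AdagradLinearRegression.fit(xs, ys)` on x = 0, 1, 2, 3, 4, the fitted weights should come out close to 1.0 and 2.0. In a vectorized version, the inner loops would be replaced by matrix-vector products from a numeric library, and the gradient sum is the part that would be distributed across a cluster.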

Marek Kolodziej is a Senior Research Engineer at Nitro.