Data Microservices in Apache Spark using Apache Arrow Flight

Machine learning pipelines are a hot topic at the moment. Moving data through the pipeline efficiently and predictably is one of the most important aspects of running machine learning models in production. In this talk, we'll break down the modern machine learning pipeline and demonstrate how it can be improved with a modern transport mechanism. First, we will introduce Apache Arrow and Arrow Flight. We will review the motivation, architecture, and key features of the Arrow Flight protocol, with an example of a simple Flight server and client. Second, we'll introduce an Arrow Flight Spark data source. We will examine the key features of this data source and show how one can build microservices for and with Spark. We will look at benchmarks and the benefits of Flight compared with other common transport protocols. Finally, we'll show a demo of a toy machine learning pipeline running in Spark with data microservices powered by Arrow Flight. We will highlight how much faster and simpler the Flight interface makes this example pipeline. The audience will leave this session with an understanding of how Apache Arrow Flight can enable more efficient machine learning pipelines in Spark.
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Connect with us: