Apache Beam: using cross-language pipeline to execute Python code from Java SDK

Alexey Romanenko

A presentation from ApacheCon @Home 2020

There are many reasons why we might need to execute Python code in Java data processing pipelines (and vice versa), e.g. machine learning libraries, IO connectors, or users' own Python code, and there are several different ways to do it. With Python 2 reaching its end of life this year, this is becoming more challenging, since not all of the old solutions work well with Python 3. One potential option is to use a cross-language pipeline and the Portable Runner in Apache Beam. In this talk I will cover what a cross-language pipeline in Beam is, how to create a mixed Java/Python pipeline, how to set it up and run it, and what requirements and pitfalls to expect. I will also demo a use case in which a user's custom Python 3 code has to be executed in the middle of a Java SDK pipeline that is run with the Portable Spark Runner.

Alexey Romanenko is a Principal Software Engineer at Talend France with more than 18 years of experience in software development. Over his career he has worked on a wide range of projects, such as high-load web services, a web search engine, and cloud storage. He is an Apache Beam PMC member and committer and has contributed to various Beam IO components and to the Spark Runner.
Comments

Hi, is there a GitHub repo where I can find the code?

shubhanshusonkar