Analytics Fest 2020 - Data Transformation and Image Analysis with PySpark

PySpark is the Python API for Spark. It lets us use Spark's data transformation and analysis capabilities with the convenience of Python and its ecosystem. In this talk we explore a simple project that covers basic data transformation and loading into SQL schemas, as well as image analysis that uses K-Means to detect dominant colors and stores the results in HDFS.
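
To make the dominant color idea concrete, here is a minimal sketch of K-Means applied to RGB pixels. It is not the talk's code: it uses only the Python standard library so the algorithm is visible, whereas a PySpark version would more likely rely on `pyspark.ml.clustering.KMeans` over a DataFrame of pixel vectors. The function name `dominant_colors`, the sample pixels, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_colors(pixels, k=2, iterations=10):
    """Cluster RGB pixels with K-Means; return rounded centroids, largest cluster first.

    Hypothetical helper for illustration only. Assumes iterations >= 1 and
    at least k distinct pixels.
    """
    # Farthest-first initialization (an assumption, chosen to keep the sketch
    # deterministic): start from the first pixel, then repeatedly add the
    # pixel that is farthest from every centroid picked so far.
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(
            max(pixels, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(channel) / len(members) for channel in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]

    # Rank centroids by cluster size so the most dominant color comes first.
    ranked = sorted(zip(clusters, centroids), key=lambda pair: len(pair[0]), reverse=True)
    return [tuple(round(v) for v in centroid) for _, centroid in ranked]


# Made-up sample "image": six reddish pixels and three bluish ones.
sample_pixels = [
    (250, 10, 10), (240, 20, 20), (255, 0, 0),
    (245, 15, 5), (250, 5, 15), (248, 12, 8),
    (10, 10, 250), (20, 20, 240), (0, 0, 255),
]

print(dominant_colors(sample_pixels, k=2))  # [(248, 10, 10), (10, 10, 248)]
```

On a real image the pixel list would come from decoding the file, and the number of clusters `k` controls how many dominant colors are reported.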
