Distributed Computing Workflows with FugueSQL | Orlando Python

In recent years, the role of data analysts working with big data has been expanding. Hive allows analysts to manipulate big datasets. dbt allows data analysts to take ownership of some data engineering processes. More and more tools are coming out that extend the capabilities of SQL and allow it to be applied in other parts of the data ecosystem. In this interactive workshop, we'll introduce participants to FugueSQL, a language that allows analysts to work on distributed computing problems. FugueSQL lets users express computation workflows in a SQL-like language, so they can operate on Pandas, Spark, and Dask DataFrames with syntax they already know. We'll demo in a Jupyter Notebook how to use FugueSQL alongside native Python for end-to-end Extract, Transform, Load (ETL) pipelines.