AWS Glue PySpark: Flatten Nested Schema (JSON)

This is a technical tutorial on how to flatten (unnest) JSON arrays into columns in AWS Glue with PySpark. The video walks through how to use the Relationalize transform and how to join the resulting dynamic frames back together for further analysis or for writing to another location.
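To make the idea concrete without a Glue job, here is a minimal pure-Python sketch of what relationalizing and re-joining does conceptually. It is not the Glue API: the `relationalize` and `join` functions, the sample records, and the table and column naming (`root`, `root_orders`, `id`, `index`, `orders.val.sku`) are assumptions that only loosely imitate the output of Glue's Relationalize and Join transforms.

```python
from itertools import count


def relationalize(records, root="root"):
    """Split nested records into flat tables linked by generated ids (illustrative only)."""
    tables = {root: []}
    ids = count(1)

    def flatten(value, prefix, table, row):
        # Scalars are copied into the row, structs are flattened into
        # dotted column names, and arrays are moved to a child table.
        for key, item in value.items():
            column = f"{prefix}.{key}" if prefix else key
            if isinstance(item, dict):
                flatten(item, column, table, row)
            elif isinstance(item, list):
                link = next(ids)
                row[column] = link  # the parent keeps only a link id
                child = f"{table}_{column}"
                rows = tables.setdefault(child, [])
                for index, element in enumerate(item):
                    child_row = {"id": link, "index": index}
                    if isinstance(element, dict):
                        flatten(element, f"{column}.val", child, child_row)
                    else:
                        child_row[f"{column}.val"] = element
                    rows.append(child_row)
            else:
                row[column] = item

    for record in records:
        row = {}
        flatten(record, "", root, row)
        tables[root].append(row)
    return tables


def join(left, right, left_key, right_key):
    """Inner join two lists of row dicts where left[left_key] == right[right_key]."""
    lookup = {}
    for r in right:
        lookup.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in lookup.get(l.get(left_key), [])]


# Hypothetical sample data: one struct column and one array-of-structs column.
records = [
    {"customer": "ana", "address": {"city": "Lima"},
     "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"customer": "raj", "address": {"city": "Pune"},
     "orders": [{"sku": "C3", "qty": 5}]},
]

tables = relationalize(records)
# The root table's "orders" column holds the link id; the child table's "id" matches it.
flat = join(tables["root"], tables["root_orders"], "orders", "id")
for row in flat:
    print(row)
```

Each array element becomes its own row in the child table, so the joined result has one row per order with the customer fields repeated. In a Glue job the equivalent steps would be done on dynamic frames rather than lists of dicts.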

#aws #awsglue
Comments

This is a truly amazing channel, helping people understand and learn more about ETL and cloud computing.

Thanks so much!

oggyoggyoggyy

Hi, can we reposition or change the column order when transforming a JSON file while loading the metadata?

Streampax

Hi, I have a completely nested JSON file. When I run a crawler on it, I get only one schema, with a column named array of data type array, and the actual column names and data types appear inside that array data type. Is that correct?

meghanayerramsetti

Amazing. These kinds of scenarios come up more often than not as a data engineer.

Can we run Python code within the same interactive session notebook?

So, for instance, could we run Python code to pull JSON data from an API (using requests) or from an SQS destination queue, and then relationalize it with PySpark code?

saad

Hi, this is a great video. I have a question: what happens if we don't have a common column to join the different datasets on? Is there any workaround?

najwanabdulkareem

Hi, your channel is really awesome and helpful. I was wondering whether it is possible to join two different JSON files stored in separate S3 buckets.

claytonvanderhaar

I don't know the exact schema of the input table. Would you have a dynamic approach for the same scenario, instead of hard-coding the column names?

offersononlineshopping

Hi, please provide the dataset you used. That would be great.

shrikantpandey