Create and Insert Data Into a Parquet Governed Table In Python

This step-by-step tutorial uses the AWS Data Wrangler library in Python to create a Parquet governed table and insert data into it with AWS Lake Formation transactions. It also covers how to query governed tables by a specific Lake Formation transaction.

Timeline:
00:00 Overview
00:34 Create Governed Table (metadata only)
02:46 Read CSV data from S3 into a DataFrame
03:28 Filter dataframe by date
04:32 Create a new transaction
05:36 Write to Governed Table (parquet)
07:18 Commit Transaction
08:38 Append data to the existing governed table
10:40 Read data from governed table
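
The steps in the timeline can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes the awswrangler 2.x API (`wr.catalog.create_parquet_table`, `wr.lakeformation.start_transaction`, `wr.s3.to_parquet` with `table_type="GOVERNED"`, `wr.lakeformation.commit_transaction`, `wr.lakeformation.read_sql_query`), and the three helper function names are hypothetical. Newer releases of the library may differ, so check the signatures against the version you have installed.

```python
def create_governed_table(path, database, table, columns_types):
    """Register a governed Parquet table in the Glue catalog (metadata only)."""
    # Imported lazily so this module loads even where awswrangler is absent.
    import awswrangler as wr  # assumption: awswrangler 2.x

    wr.catalog.create_parquet_table(
        database=database,
        table=table,
        path=path,
        columns_types=columns_types,  # e.g. {"id": "bigint", "day": "string"}
        table_type="GOVERNED",
    )


def write_in_transaction(df, path, database, table, mode="append"):
    """Write df to the governed table inside one Lake Formation transaction."""
    import awswrangler as wr  # assumption: awswrangler 2.x

    transaction_id = wr.lakeformation.start_transaction(read_only=False)
    try:
        wr.s3.to_parquet(
            df=df,
            path=path,
            dataset=True,
            mode=mode,
            database=database,
            table=table,
            table_type="GOVERNED",
            transaction_id=transaction_id,
        )
        # The written files become visible to readers only after the commit.
        wr.lakeformation.commit_transaction(transaction_id=transaction_id)
    except Exception:
        # Cancel the transaction so a failed write is not left open.
        wr.lakeformation.cancel_transaction(transaction_id=transaction_id)
        raise
    return transaction_id


def read_in_transaction(database, table):
    """Read the whole governed table under a new read-only transaction."""
    import awswrangler as wr  # assumption: awswrangler 2.x

    transaction_id = wr.lakeformation.start_transaction(read_only=True)
    return wr.lakeformation.read_sql_query(
        sql=f'SELECT * FROM "{table}"',
        database=database,
        transaction_id=transaction_id,
    )
```

A typical run would call `create_governed_table` once, load and filter a DataFrame (for example with `wr.s3.read_csv` and a pandas date filter), then call `write_in_transaction` for the initial load and again with `mode="append"` for later batches. Wrapping the write in `try`/`except` is a design choice of this sketch: it cancels the transaction on failure instead of leaving it open.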

#AWS
Comments

Thanks a ton. It's very well-organized content on governed tables!

hoijongjung

Thanks for the video. How do you handle upserts? I see only object-level ACID, not row-level. Is it handled in the query?

mayjoec

Thank you so much! I was wondering why I couldn't query my data directly through Athena until I came across this video. Also, it would be nice if you could share the code through something like a Git repository for lazy people :'D. Fantastic video though!

anesanreddy

Have you ever tried writing to a governed table from a Lambda triggered by the S3 event of a new file being dumped by Firehose, in order to do an incremental update?

jorgegoldman

Do governed tables support record-level updates, like an upsert in Delta tables from Databricks?

cory

Good one. Is it possible to query this governed table in Athena with a time travel filter? I tried querying the governed table as shown below, but no luck.
SELECT * FROM governed_table
FOR SYSTEM_TIME AS OF TIMESTAMP '2021-12-10 10:00:00'

parthasar