AWS Tutorials - Using AWS Glue ETL Job with Streaming Data

Recently AWS announced streaming data support for AWS Glue ETL jobs, which helps in setting up continuous ingestion pipelines that process streaming data on the fly. Streaming ETL jobs consume data from streaming sources like Amazon Kinesis and Apache Kafka, clean and transform those data streams in flight, and continuously load the results into Amazon S3 data lakes, data warehouses, or other data stores.

In this workshop, you create an ETL job that reads streaming data from a Kinesis data stream, transforms it from JSON to CSV format, and uploads the results to an Amazon S3 bucket. The data is published to the Kinesis stream by an MQTT client through AWS IoT Core.
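
A minimal sketch of what such a streaming ETL script can look like. The catalog database `iot_db`, table `iot_stream`, and bucket `my-output-bucket` are placeholder names, not necessarily those used in the video.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Streaming sources are read as a Spark DataFrame (not a DynamicFrame); the catalog
# table "iot_stream" is assumed to point at the Kinesis data stream.
source_df = glue_context.create_data_frame.from_catalog(
    database="iot_db",
    table_name="iot_stream",
    additional_options={"startingPosition": "TRIM_HORIZON", "inferSchema": "true"},
)

def process_batch(data_frame, batch_id):
    """Convert each micro-batch of JSON records to CSV files in S3."""
    if data_frame.count() > 0:
        dyf = DynamicFrame.fromDF(data_frame, glue_context, "batch")
        glue_context.write_dynamic_frame.from_options(
            frame=dyf,
            connection_type="s3",
            connection_options={"path": "s3://my-output-bucket/output/"},
            format="csv",
        )

# forEachBatch applies process_batch to every micro-batch and checkpoints progress.
glue_context.forEachBatch(
    frame=source_df,
    batch_function=process_batch,
    options={
        "windowSize": "100 seconds",
        "checkpointLocation": "s3://my-output-bucket/checkpoint/",
    },
)
job.commit()
```
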
Comments

Thank you for this very helpful video, especially the steps to clean up the AWS Glue job. I had a hard time figuring that out and got a big bill from AWS.

robindong
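
A hedged boto3 sketch of the cleanup idea above (not necessarily the video's exact steps): a streaming job run keeps incurring charges until it is stopped, so stop any running job runs and then delete the job. The job name is a placeholder.

```python
import boto3

glue = boto3.client("glue")
job_name = "kinesis-to-s3-streaming-job"  # placeholder job name

# Stop any job runs that are still running (streaming runs do not stop on their own).
runs = glue.get_job_runs(JobName=job_name)["JobRuns"]
running_ids = [r["Id"] for r in runs if r["JobRunState"] == "RUNNING"]
if running_ids:
    glue.batch_stop_job_run(JobName=job_name, JobRunIds=running_ids)

# Delete the job definition itself once its runs are stopped.
glue.delete_job(JobName=job_name)
```
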

Hi sir, I am facing an issue creating the same project through the CDK. Can you please share a resource for creating a streaming ETL Glue job with the CDK?

ishitagoel
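
A hedged CDK (Python) sketch for the question above, using the low-level CfnJob construct; the role ARN, script location, and sizing values are placeholders.

```python
from aws_cdk import Stack, aws_glue as glue
from constructs import Construct

class StreamingGlueJobStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # "gluestreaming" is the command name for a streaming ETL job
        # (as opposed to "glueetl" for batch jobs).
        glue.CfnJob(
            self, "StreamingEtlJob",
            name="kinesis-to-s3-streaming-job",  # placeholder
            role="arn:aws:iam::123456789012:role/GlueStreamingJobRole",  # placeholder
            command=glue.CfnJob.JobCommandProperty(
                name="gluestreaming",
                python_version="3",
                script_location="s3://my-scripts-bucket/streaming_etl.py",  # placeholder
            ),
            glue_version="4.0",
            worker_type="G.025X",
            number_of_workers=2,
        )
```
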

Hi Sir,
The job failed with the following error:
"An error occurred while calling o93.getDynamicFrame. Streaming data source doesn't support dynamic frame"
Can you please advise on this?

vivekta
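
A hedged guess at the cause of the error above: a streaming catalog table has to be read with create_data_frame rather than create_dynamic_frame/getDynamicFrame. A minimal sketch, using the same placeholder names as the job sketch earlier:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Returns a Spark DataFrame; convert to a DynamicFrame per micro-batch inside
# forEachBatch if DynamicFrame transforms are needed.
source_df = glue_context.create_data_frame.from_catalog(  # not create_dynamic_frame
    database="iot_db",
    table_name="iot_stream",
    additional_options={"startingPosition": "TRIM_HORIZON", "inferSchema": "true"},
)
```
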

Nice explanation, sir. Kindly create a video with everything parameterized, such as source files, target files, and an incremental parameter from a target lookup.

Cricketshaala
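
A small hedged sketch related to the request above: custom job arguments (for example --SOURCE_PATH, --TARGET_PATH, and --INCREMENTAL_VALUE passed when the job is started) can be read with getResolvedOptions instead of hard-coding paths; the argument names here are made up for illustration.

```python
import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(
    sys.argv, ["JOB_NAME", "SOURCE_PATH", "TARGET_PATH", "INCREMENTAL_VALUE"]
)
source_path = args["SOURCE_PATH"]              # e.g. s3://my-source-bucket/input/
target_path = args["TARGET_PATH"]              # e.g. s3://my-output-bucket/output/
incremental_value = args["INCREMENTAL_VALUE"]  # e.g. a watermark from a target lookup
```
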

Feels like I am sitting and learning in front of you, good one!
I have a simple use case and need your input: the source side is a Kinesis stream and the data is in byte-stream format, and that same data needs to be ingested into S3 using Glue.
Input needed from your side: first, is it mandatory to have the stream data in JSON format?
If it is not in JSON, do we need to convert it to JSON?
Second, do we really need a database and catalog?
Third, I'm getting a little bit confused:
could you please let me know the correct pipeline/data lake from Kinesis to S3, i.e., what steps need to be executed?

sachinupadhyay
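
A hedged sketch related to the questions above: Glue can read a Kinesis stream directly with from_options, so a Data Catalog database/table is not strictly required, and the "classification" option tells Glue how to parse the record payload (JSON here). The stream ARN is a placeholder.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the Kinesis stream without a catalog table; records are parsed per the
# "classification" option.
source_df = glue_context.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "streamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/my-iot-stream",
        "classification": "json",
        "startingPosition": "TRIM_HORIZON",
        "inferSchema": "true",
    },
)
```
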

Hi all,

We are using AWS Glue + PySpark to perform ETL into a destination RDS PostgreSQL DB. The destination tables have primary and foreign key columns with the UUID data type, and we are failing to populate these destination UUID columns. How can we achieve this? Please suggest.

alokanand
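
A hedged sketch for the UUID question above (not something covered in the video): a common workaround when writing with plain Spark JDBC is to append stringtype=unspecified to the PostgreSQL JDBC URL, so the server casts string values into uuid columns. Host, database, table, and credentials are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny example DataFrame; "id" is a UUID string that maps to a uuid column.
df = spark.createDataFrame(
    [("123e4567-e89b-12d3-a456-426614174000", "example-name")],
    ["id", "name"],
)

(df.write
   .format("jdbc")
   .option("url", "jdbc:postgresql://my-host:5432/mydb?stringtype=unspecified")
   .option("dbtable", "public.my_table")
   .option("user", "my_user")
   .option("password", "my_password")
   .option("driver", "org.postgresql.Driver")
   .mode("append")
   .save())
```
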

Nice one. A question: can I read from a Kafka topic in AWS MSK (managed Kafka), do some transformation or filtering, and then write back to a different topic in Kafka?

clover
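
A hedged sketch for the MSK question above, using plain Spark structured streaming from inside a Glue streaming job (assuming the Kafka integration is available to the job); bootstrap servers, topic names, and the checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper

spark = SparkSession.builder.getOrCreate()

# Read from the source topic in MSK.
source = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "b-1.my-msk-cluster:9092")
    .option("subscribe", "input-topic")
    .option("startingOffsets", "earliest")
    .load())

# Example transform: uppercase the message value; any filter/transform works here.
transformed = source.select(
    col("key"),
    upper(col("value").cast("string")).alias("value"),
)

# Write the transformed records to a different topic.
query = (transformed.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "b-1.my-msk-cluster:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "s3://my-output-bucket/kafka-checkpoint/")
    .start())

query.awaitTermination()
```
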