AWS Tutorials - Working with Data Sources in AWS Glue Job


When writing an AWS Glue ETL job, the question arises whether to fetch data from the data source directly or via the Glue Data Catalog entry for that source. The video explains why using the Data Catalog is recommended. It also includes a demo showing data access from S3, PostgreSQL, and Redshift through the Glue Data Catalog.
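The catalog-based read shown in the demo can be sketched as follows. This is a minimal, hedged example assuming a Glue job with a catalog database `demo_db` whose tables were crawled from S3, PostgreSQL, and Redshift; all database, table, and bucket names here are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Reading via the Data Catalog: schema, format, and connection details
# come from the catalog, so the job code looks the same regardless of
# the backing store (S3, PostgreSQL, or Redshift).
s3_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="s3_employees")

pg_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="postgres_employees")

# Redshift reads additionally need a temporary S3 staging directory.
rs_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="redshift_employees",
    redshift_tmp_dir="s3://my-temp-bucket/redshift/")
```

The payoff of going through the catalog is that the job never hard-codes paths, JDBC URLs, or schemas; changing the source only requires re-crawling.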
Comments

Slowly this is getting tough. No doubt it's a detailed, awesome video, but confusion keeps building around Lake Formation — between Lake Formation and the data lake itself.

shamstabrez

Thank you so much for the video; I think it is very useful. I have a few questions — apologies if you have already covered them in other videos. 1. On Glue crawler jobs: assume an ETL ingests a particular data source into the data lake, producing a data file each run. What is your recommendation on the crawler's schedule: run it every time an ETL output file appears, or once a day (if the ingestion frequency is high) to keep the cost low?
2. On Glue crawler connections such as the Redshift connection and the JDBC connection in the tutorial: can a single connection be used by multiple Glue jobs simultaneously, i.e., does each Glue job create its own instance of the connection?
3. In the video, at 38:54, a Glue job populated a table "employmentmini" in the RDS database, but I did not see a primary key created on the table in Postgres anywhere in the notebook code. Does this mean Postgres doesn't enforce a primary key on a table created by a Glue job via a Glue connection?

hsz

Hi AWS-Tutorials, can you please help me with the situation below?
Expected output: S3 to SQL Server (hosted on a Windows server).

Question 1) If multiple files arrive in S3 daily, will the crawler create the same number of tables in the catalog?

Question 2) To store all the daily data arriving in S3, do we need as many ETL jobs as there are catalog tables?

I know it's hard to reply to every question; I'm just hoping you will reply to mine! Thanks in advance ❤️

deepakbhardwaj

I have one question: if using the Data Catalog is the recommended approach, how do you handle a daily load arriving at the data source using a crawler? I'm finding it difficult to handle daily loads with a Glue crawler.

deepakbhutekar

Glue does not provide a way to supply a SQL query when extracting data, so for that we need to use Spark. I'm getting a communications link failure when I try to read MySQL with Spark.
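For the query-pushdown part of this comment, a plain Spark JDBC read can be sketched as below. This is a hedged example: the hostname, database, credentials, and query are placeholders, and it assumes the MySQL Connector/J driver is on the Spark classpath. A "communications link failure" usually means the job cannot reach the database at the network level — worth checking the Glue connection's VPC/subnet, a self-referencing inbound rule on the security group, and that port 3306 is open.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-query-read").getOrCreate()

# The "query" option pushes the SQL down to MySQL, so only the
# selected rows and columns cross the network.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://mydb.example.com:3306/sales")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("query", "SELECT id, amount FROM orders WHERE amount > 100")
      .option("user", "etl_user")
      .option("password", "***")
      .load())
```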

SheetalPandrekar

Hi,
if cross-account access is provided but the classification of the table is unknown (it is supposed to be Parquet), how do we handle this?
Without a classification, the job throws the error "No classification for table".

swapnilkulkarni

Thank you for the amazing content. My data source is an RDS PostgreSQL database, and I want to create a connection to it from another AWS account — how can I do this? The data source lives in a different AWS account, and when I try to connect to it from my account it doesn't work. Recommendations would be highly appreciated.

imransadiq

Hi, I am trying to consume a Data Catalog from a different AWS account into the current account, write a transformation that joins both catalogs on a common ID field, and store the resulting catalog in the current AWS account. Here is an example:
AWSAccount1 has DataCatalog1, and AWSAccount2 (the current AWS account) has DataCatalog2.
I want to write a transformation with a join such as
DataCatalog1.Table1.empid = DataCatalog2.Table2.empid
and store the merged catalog as DataCatalog3.Table3 in the current account.
Basically, I want to merge the two data catalogs into a single, bigger Data Catalog.
AWSAccount1 only shares its Data Catalog; we do not know much about the data internals.
Is it possible to do it this way? I hope we can. What steps do I need to achieve this requirement? Your quick help is greatly appreciated. We can do this in Athena, but we want to perform this activity in Glue Studio.

sankarsb

Hi, how about SharePoint as a source? Is that possible with an AWS Glue job? And would it be a JDBC connection or an API?

quezobars

And please elaborate: what are a DynamicFrame and a DataFrame?
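In brief, and sketched under the assumption of a Glue job context (the catalog names below are hypothetical): a DynamicFrame is Glue's schema-flexible record collection — each record carries its own schema, so inconsistent source types surface as "choice" types that can be resolved later — while a Spark DataFrame has one fixed schema and the full Spark SQL API. The two convert both ways:

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a catalog table as a DynamicFrame (hypothetical names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="employees")

# Convert to a Spark DataFrame for SQL-style transformations...
df = dyf.toDF()

# ...and back to a DynamicFrame for Glue writers and transforms.
dyf2 = DynamicFrame.fromDF(df, glue_context, "dyf2")
```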

shamstabrez