AWS Glue 101 | Lesson 1: The Glue Data Catalog And Crawlers


00:00 - Intro
00:24 - What is the AWS Glue Data Catalog?
00:36 - What is a metadata repository?
00:53 - What is metadata information?
01:18 - How do we collate the metadata?
01:43 - AWS Crawler
02:01 - When do we use the Data Catalog?
03:32 - Interacting With The Glue Data Catalog
04:12 - What the tutorial will cover
04:34 - Hands on Tutorial
04:52 - S3 configuration
08:16 - Creating a database
08:56 - Setting up a crawler
12:28 - Recap
12:59 - Bonus: Athena

In this series of videos we take a look at AWS Glue. We mix theory with practice as we build a functioning ETL application using the Glue Data Catalog, Crawlers, Glue ETL, Triggers, Workflows, and Dev Endpoints.

In this video we configure our S3 bucket to act as our data repository, ingest data, register that data in the Glue Data Catalog using a crawler, and finally use Athena to query the newly ingested data.
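The crawler step above can be sketched programmatically as well as through the console. Below is a minimal, hypothetical sketch of the configuration a Glue crawler registration takes; the bucket name, database name, and IAM role ARN are placeholders, not values from the video, and the actual boto3 calls are shown commented out since they require AWS credentials:

```python
# Hypothetical Glue crawler configuration for an S3 data store.
# All names (bucket, database, role) are placeholders.
crawler_config = {
    "Name": "customer-data-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    "DatabaseName": "tutorial_db",  # the catalog database the tables land in
    "Targets": {
        # The crawler scans this prefix and infers a schema per table it finds.
        "S3Targets": [{"Path": "s3://my-tutorial-bucket/customers/"}]
    },
}

# With boto3 installed and credentials configured, the calls would be roughly:
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_config)
# glue.start_crawler(Name=crawler_config["Name"])
```

Once the crawler run completes, the inferred table appears in the named catalog database and can be queried from Athena or used by Glue ETL jobs.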

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users that each use different products.

AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows with a few clicks in AWS Glue Studio. Data analysts and data scientists can use AWS Glue DataBrew to visually enrich, clean, and normalize data without writing code. With AWS Glue Elastic Views, application developers can use familiar Structured Query Language (SQL) to combine and replicate data across different data stores.
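Since the bonus section of the video queries the crawled table with Athena, here is a hedged sketch of what that looks like programmatically. The database, table, bucket, and result prefix are placeholders; note that Athena requires a query result location because it writes every query execution's output to S3:

```python
# Hypothetical Athena query against the table the crawler created.
# Database/table/bucket names are placeholders, not from the video.
query = "SELECT * FROM tutorial_db.customers LIMIT 10;"

athena_request = {
    "QueryString": query,
    # Athena writes each query execution's results as objects under this
    # prefix, which is why a result location must be configured up front.
    "ResultConfiguration": {
        "OutputLocation": "s3://my-tutorial-bucket/athena-results/"
    },
}

# With boto3 and credentials configured, the call would be roughly:
# import boto3
# athena = boto3.client("athena")
# response = athena.start_query_execution(**athena_request)
```

Results accumulate at the output location (one result set per query execution), so it is common to point it at a dedicated prefix with a lifecycle rule.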

😎 About me
I have spent the last decade immersed in the world of big data, working as a consultant for some of the globe's biggest companies. My journey into data was not the most conventional: I started my career as a performance analyst in professional sport at the top levels of both rugby and football, then transitioned into a career in data and computing. That journey culminated in a Master's degree in software development, alongside a number of professional certifications in AWS and MS SQL Server.
Comments

best video of Glue on Youtube. Thanks Johnny

josephattabenninjr

Keep up the good work, you'll be viral soon.

adityarajora

Johnny, following along step by step with your detailed tutorial. Super helpful :)

jadenguyen

Fantastic 101 Glue session! Good Job Johnny

karamveerhooda

Thanks for the explanation! Something is not yet clear to me: at min 13:29, why is there a need to set the query result location? Does this mean that every Athena query performed on this catalog table is saved at this location, or only the latest query results? Thanks in advance!

Incognitowil

Awesome video, to the point and with a clear explanation.

The_Bold_Statement

Great video and channel! Keep'em coming, buddy!

tommysera

Ty so much for sharing your experience 💜 your insightful content!

nikozerk

So, what happens if, in the same folder (for example your customer folder), we have two CSVs, each with a different schema? Will the crawler create two tables?

katsouranis

Hey Johnny, I have a question. If you produce new data and want new tables with each crawler run, is that possible, or would you need to create a new crawler per external table you want produced?

lotannanweke

Can we read from the catalog, use Glue ETL (Spark), and save into a new Glue Catalog table without using S3 and a crawler? Glue Catalog -> Glue ETL -> Glue Catalog?

mehedeehassan

So if you have multiple csv files that contain different data, do you set up different data stores for each file, or will one data store handle the different schemas?

ws

How do you add column names in camel case in the Glue Catalog?

saibaba