How to Build a Delta Live Table Pipeline in Python

Delta Live Tables are a new and exciting way to develop ETL pipelines. In this video, I'll show you how to build a Delta Live Table Pipeline and explain the gotchas you need to know about.
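
For reference, a minimal Python DLT pipeline looks roughly like this; the table names and the sample source path are placeholders of mine, not taken from the video:

import dlt

# Raw layer: ingest a sample JSON dataset (placeholder path).
@dlt.table(comment="Raw ingested data")
def raw_data():
    return spark.read.format("json").load("/databricks-datasets/iot/iot_devices.json")

# Cleaned layer: read the table above and enforce a simple expectation.
@dlt.table(comment="Cleaned data")
@dlt.expect_or_drop("valid_device", "device_id IS NOT NULL")
def clean_data():
    return dlt.read("raw_data")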

Patreon Community and Watch this Video without Ads!

Useful Links:

What is Delta Live Tables?

Tutorial on Developing a DLT Pipeline with Python

Python DLT Notebook

DLT Costs

Python Delta Live Table Language Reference

See my Pre Data Lakehouse training series at:
Comments

Great video. I like how you dive into other topics, like: should we use it? What does it cost? It's running extra nodes in the background, etc. Lots of useful info in your explanations. Just wanted to mention, on the expectations not having a splitter to an error table: we had a demo from Databricks recently, and their approach was to create a copy of the function with the expectation, but pointed at the error table and with the inverse expectation of the main function. I mentioned this wasn't ideal since you would have to run the full job twice, and they didn't have much to say. We have a different approach to dealing with errors, so it's not a huge deal from our standpoint, but still not great in general.

gatorpika
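
A sketch of the approach described above (and the workaround jeanchindeko mentions below): duplicate the function, point it at an error table, and invert the expectation. The rule and table names here are hypothetical:

import dlt

RULE = "amount > 0"  # hypothetical quality rule

# Main table: keep only rows that pass the rule.
@dlt.table(name="orders_valid")
@dlt.expect_or_drop("valid_amount", RULE)
def orders_valid():
    return dlt.read("orders_raw")  # hypothetical upstream table

# Copy of the function with the inverse expectation, pointed at an error table.
@dlt.table(name="orders_errors")
@dlt.expect_or_drop("invalid_amount", f"NOT ({RULE})")
def orders_errors():
    return dlt.read("orders_raw")

Note that this reads the source twice, which is exactly the drawback raised above.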

Thanks for this video, Bryan.
13:27 If you want to quarantine some data based on a given rule, the workaround is to create another table with an expectation that drops all the good records and keeps only the bad ones.

jeanchindeko

Great job as always, Bryan. Keep it up; you are helping us all!

VeroneLazio

2:40 It seems like Premium is required for most features now, as everything is based on Unity Catalog, which in turn is a premium feature.

MariusS-hp

Really great content for understanding in detail how DLT works. Thanks @Bryan for your effort in making this video.

balanm

The new way is to use streaming tables or materialized views, no more live tables. Also, the implementation I am trying with cloud_files doesn't seem to be working at all:

CREATE OR REPLACE MATERIALIZED VIEW mat_tst
AS
SELECT *
FROM cloud_files("/Volumes/main/bronze/csv",
                 "csv",
                 map('schema', 'ID INT, Name STRING, Shortcode STRING, Category STRING',
                     'header', 'true',
                     'mergeSchema', 'true'))

frag_it
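
As far as I know, the cloud_files (Auto Loader) source is only supported for streaming tables, not materialized views, which would explain the failure above. A minimal Python sketch of the same ingest as a streaming table, reusing the path and schema from the comment:

import dlt

@dlt.table(name="mat_tst")
def mat_tst():
    # Returning a streaming DataFrame makes this a streaming table.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("mergeSchema", "true")
        .schema("ID INT, Name STRING, Shortcode STRING, Category STRING")
        .load("/Volumes/main/bronze/csv")
    )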

Hey Bryan, thanks for the video. Just curious, do we know the list of decorators we can use in DLT pipelines? I looked into the documentation but was unable to find it.

wrecker-XXL
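
The decorators I'm aware of in the dlt Python module (my own reading of the docs, not an exhaustive list) fall into two groups: @dlt.table and @dlt.view define datasets, and the @dlt.expect* family attaches data quality rules. A sketch with hypothetical table and rule names:

import dlt

@dlt.view
def raw_orders():
    return spark.read.table("orders_source")  # hypothetical source table

@dlt.table
@dlt.expect("id_not_null", "order_id IS NOT NULL")   # warn: log violations, keep rows
@dlt.expect_or_drop("positive_qty", "quantity > 0")  # drop violating rows
def clean_orders():
    # Also available: @dlt.expect_or_fail, plus the dict-based variants
    # @dlt.expect_all, @dlt.expect_all_or_drop, @dlt.expect_all_or_fail.
    return dlt.read("raw_orders")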

Another awesome tutorial, thank you Bryan.

stu

Hey Bryan, great video. I have a quick question: when you create DLT tables for RAW, PREPARED, and the last layer, are those tables created in the lakehouse as BRONZE, SILVER, and GOLD?

ezequielchurches
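
To illustrate the question: DLT doesn't create bronze/silver/gold layers for you; each layer is just another table you define, and the naming is your own convention. A hypothetical three-layer sketch:

import dlt

@dlt.table(name="bronze_events")  # raw layer (placeholder source path)
def bronze_events():
    return spark.read.format("json").load("/mnt/raw/events")

@dlt.table(name="silver_events")  # prepared/cleaned layer
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")
def silver_events():
    return dlt.read("bronze_events").dropDuplicates(["event_id"])

@dlt.table(name="gold_event_counts")  # aggregated, consumption-ready layer
def gold_event_counts():
    return dlt.read("silver_events").groupBy("event_type").count()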

Hi Bryan, is it possible to use a standard cluster to create Delta Live Tables, instead of creating a new cluster every time?

hariprasad-nr

Hi, just wanted to confirm something. I am using Azure Databricks, where I already have two clusters in production. Now, if I want to create a DLT pipeline (assuming that's the only way to use Delta Live Tables), would that create a new cluster/compute resource?

JustBigdata

From what I have observed, the materialized view is recomputing everything from scratch. What can we do to get incremental ingestion into the materialized view, based on the GROUP BY clause if we provide one?

ShubhamSingh-ovye

Thanks for the awesome video! A question, if you could help: how do you do CI/CD with Delta Live Tables?

krishnakoirala
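
One common pattern (my own sketch, not from the video): keep the pipeline notebook in git and have the CI job create or update the pipeline through the Databricks Pipelines REST API (Databricks Asset Bundles are a newer alternative). Endpoint and payload are shown as I understand them; verify against the current API docs:

import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-123.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]

payload = {
    "name": "my_dlt_pipeline",                                       # hypothetical
    "development": True,                                             # False in prod
    "libraries": [{"notebook": {"path": "/Repos/ci/dlt_notebook"}}], # hypothetical repo path
    "target": "my_schema",
}

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())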

Really confused about whether to use DLT for my project or the old way of doing things for a medallion architecture.
Now, watching your video, DLT costs a lot more than normal PySpark ingestion pipelines? :(

TheDataArchitect

Would it be possible to create unmanaged tables with a location in the data lake using DLT pipelines?

mateen
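
A sketch assuming a Hive-metastore pipeline: the @dlt.table decorator accepts a path argument that pins the table's storage location (Unity Catalog pipelines manage storage through the catalog instead). The ABFSS path and source here are hypothetical:

import dlt

@dlt.table(
    name="events_external",
    path="abfss://lake@mystorage.dfs.core.windows.net/dlt/events",  # hypothetical location
)
def events_external():
    return spark.read.format("json").load("/mnt/raw/events")  # hypothetical source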

Hello Bryan Sir,
Thanks for your amazing videos.

IbrahimNagori-lc

Is it possible to create tables under multiple schemas using a DLT pipeline? I have tried a few approaches, but it looks to me like DLT can only work with a single schema.

Satyajeet-tj

Nice info! Is it a bad design to have the bronze, silver, and gold layers in the same schema? I believe DLT doesn't work with multiple schemas.

MOHITJ

Hi, I am also trying to build a DLT pipeline manually. I have done everything the same way, but it shows "Waiting for resources" for a very long time.

shreyasd

Hi Bryan, I'm unable to import the dlt module using the import command.
I also tried magic commands and other solutions from Stack Overflow.
Can you help me import the dlt module?

sumukhds
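
The dlt module is provided by the DLT runtime, so the import only succeeds when the notebook runs as part of a pipeline, not on an ordinary interactive cluster. A small guard (my own sketch) keeps the notebook from failing when opened interactively:

try:
    import dlt
except ImportError:
    dlt = None  # not running inside a DLT pipeline
    print("dlt is only available when this notebook runs as part of a DLT pipeline")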