Data Warehouse Ingestion Patterns with Apache NiFi

This video talks through the pros and cons of three patterns you can use in Apache NiFi to ingest data into a table created with the Iceberg format.

- 1st option: PutIceberg
Simply push data using the PutIceberg processor. Super efficient but really only does inserts of new data into the table. It may not be a fit in all cases.

- 2nd option: PutDatabaseRecord
A more generic option than the previous one, and it also works when the destination is not an Iceberg table. In this case the data is sent over JDBC. Great for small datasets, but it won't be very efficient for huge ones.

- 3rd option: Staging area with external temporary tables
A bit more involved in terms of flow design, but more reliable, very flexible, and very efficient because it delegates most of the work to the query engine. In this case you push the data into a staging area of the object store, create an external table on top of it, merge the data from that external table into your final table, and then do some cleanup.
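
As a rough illustration of the third option, here is a minimal Python sketch that builds the three SQL statements for one staging cycle: create the external table, merge, and drop. It is not taken from the video. The table names, the bucket path, the column list, the Parquet file format, and the Hive/Impala-style DDL are all assumptions, and the exact CREATE EXTERNAL TABLE and MERGE syntax depends on your query engine.

```python
from typing import Dict, List


def staging_merge_statements(
    target: str,
    staging: str,
    location: str,
    columns: Dict[str, str],
    key: str,
    file_format: str = "PARQUET",  # assumption: staged files are Parquet
) -> List[str]:
    """Build the SQL for one staging cycle: external table, merge, drop.

    Illustrative only; adjust the syntax to the query engine you use.
    """
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    non_key = [name for name in columns if name != key]
    set_clause = ", ".join(f"{c} = s.{c}" for c in non_key)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return [
        # 1. Expose the staged files as an external table.
        f"CREATE EXTERNAL TABLE {staging} ({col_defs}) "
        f"STORED AS {file_format} LOCATION '{location}'",
        # 2. Let the query engine merge the staged rows into the final table.
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
        # 3. Drop the external table. In many engines this leaves the staged
        #    files in place, so they still need to be deleted separately.
        f"DROP TABLE {staging}",
    ]


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    for statement in staging_merge_statements(
        target="sales",
        staging="sales_stage",
        location="s3a://bucket/stage/",
        columns={"id": "BIGINT", "amount": "DOUBLE"},
        key="id",
    ):
        print(statement + ";")
```

In a NiFi flow, each of these statements would typically be run in order against the query engine after the files have landed in the staging location, with the final cleanup of the staged files handled as a separate step.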

Thanks for watching the video! As always, feel free to ask questions and share your feedback. And let me know what you'd like to see in the next video!
Comments

Hi, and thanks for the video.
I have a question, though: would there be a way to handle transactions in a scenario where I'm upserting into multiple tables and I'd like the whole process to succeed or fail as a unit?
Coming from Talend, I usually have a pre-job that starts a transaction on a db connection, all "processors" will use the transaction, and in the post-job I will commit or rollback, depending on whether there is an error or not.

franckroutier

Thanks for sharing! Insightful content.
I am a beginner, and I am wondering whether NiFi can handle cross-team collaboration. If so, I would be glad if you could share some useful links.
At the same time, I doubt it is really a good choice for heavy ETL/ELT or even CDC (even though it is possible to implement them).
I see it as good only as a mediation and routing tool. Am I mistaken?

Thank you for your feedback!

nasrinidhal