Incrementally Load Files in Azure Data Factory by Looking Up Latest Modified Date in Destination

This is a common business scenario, but it turns out that you have to do quite a bit of work in Azure Data Factory to make it work. The goal is to look at the destination folder, find the file with the latest modified date, and then use that date as the starting point for copying new files from the source folder. I did not come up with this approach myself; unfortunately, I misplaced the link to the original post, so I cannot properly credit the author.


The details are in the video, but at a high level the steps are the following (the key expressions are sketched after the list):
1. Use a Get Metadata activity to make a list of all files in the destination folder
2. Use a ForEach activity to iterate over this list and compare each file's modified date with the value stored in a variable
3. If the file's modified date is later than the variable's value, update the variable with that date
4. Use the variable in the Copy activity's Filter by Last Modified field to filter out all files that have already been copied
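The exact expressions are not written out in this description, so here is a minimal sketch of what each piece could look like. The activity and variable names (Get Metadata Destination, Get File Metadata, LatestModifiedDate) are placeholders and may differ from the pipeline shown in the video; the sketch assumes a second Get Metadata activity inside the ForEach that returns lastModified for the current file.

    ForEach Items:
        @activity('Get Metadata Destination').output.childItems
    If Condition expression (is this file newer than the value stored so far?):
        @greater(ticks(activity('Get File Metadata').output.lastModified), ticks(variables('LatestModifiedDate')))
    Set Variable value in the True branch:
        @activity('Get File Metadata').output.lastModified
    Copy activity source, Filter by last modified, Start time:
        @variables('LatestModifiedDate')

For a sketch like this, the variable would need a default value far in the past (for example 1900-01-01T00:00:00Z) so the first run copies everything, and because Set Variable is used inside the ForEach, the loop generally has to run sequentially so parallel iterations don't overwrite each other.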

Comments

Thank you for this, and thank you for the reminder that there are limitations on the number and size of files... it really helps a lot!

jeffersonbabuyo

Super helpful. I was struggling to load delta files in ADF using Last Modified Date. This helped me complete my project. Thank you so much and keep such videos coming!

yogitanesargi

It's very helpful, though you can't debug if you haven't added 'Set fileName' in the If Condition; maybe there have been new changes in Azure, but I have sorted that out. All in all, much appreciated; this is the best real-world example on incremental loading so far. Would you mind explaining the source and sink in the Copy Data step, since the two Get Metadata activities point at the source and destination respectively?

OnlineForward

What did you put in the SetFileName activity?

koeld

Hi, great video. Two questions. 1) You don't show what value you set in the SetFileName Set Variable activity inside the If Condition; what is it? 2) You mention the limitation on the number of files; we are looking at a large dataset in blob storage, so do you have any videos/blog posts on doing something similar for large datasets? Thanks!

timblack

Thank you for this video, but I have two questions: 1. It is not working for the JSON file format, and 2. could you please share something on how we can do the same with a large number of files, maybe more than 100 thousand, in blob storage?

vivekgarg

Hi, if I have a folder inside a folder with files in it, reading lastModified throws an error because I can't pass the subfolder name in the source dataset the way it's done in this video (since the video uses only one root folder). Could you please help me figure out how to fix that?

lsantoshkumard

Thanks for the great video. Can you please tell me what value you specified in the SetFileName activity inside the If Condition?

nikitachaturvedi

I'm finding that after the Copy Data activity, the Last Modified of the Sink files is the date they were copied, so it doesn't correspond to the Source last modified. Is there a way to fix that?

jcbeck