Airflow Tutorial: Automate SFTP Operations using Apache Airflow

While not necessary, if you enjoyed this video, buying me a coffee is greatly appreciated!

I do so much work with SFTP servers that I have been looking for solutions to help automate some of this repetitive work. One possible solution that I have been looking into is Apache Airflow. In this video, I give a tour of Apache Airflow's features and show how you can go about writing Airflow workflows (DAGs) that download files from and upload files to SFTP servers.

Code from this tutorial:

Introduction: 00:00
Installing Airflow: 00:55
First Airflow DAG/Workflow: 01:29
Configuring SFTP Connections: 03:25
Triggering the DAG/Workflow: 07:28
Problems with this Workflow: 10:19
Using Airflow Hooks: 11:53
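
For readers who want a concrete starting point, here is a minimal sketch of the kind of DAG described above, assuming Airflow 2.x with the apache-airflow-providers-sftp package installed and an SFTP connection named "sftp_default" configured in the UI. Task IDs and paths are illustrative, not the exact code from the video.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.sftp.hooks.sftp import SFTPHook


def download_file():
    # SFTPHook reads the host, credentials and key settings from the
    # "sftp_default" connection configured in the Airflow UI.
    # (Older provider versions use ftp_conn_id instead of ssh_conn_id.)
    hook = SFTPHook(ssh_conn_id="sftp_default")
    hook.retrieve_file(
        remote_full_path="/upload/input.csv",  # path on the SFTP server
        local_full_path="/tmp/input.csv",      # path on the Airflow worker
    )


def upload_file():
    hook = SFTPHook(ssh_conn_id="sftp_default")
    hook.store_file(
        remote_full_path="/upload/output.csv",
        local_full_path="/tmp/output.csv",
    )


with DAG(
    dag_id="sftp_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # trigger manually from the UI (Airflow 2.4+; use schedule_interval on older versions)
    catchup=False,
) as dag:
    download = PythonOperator(task_id="download_file", python_callable=download_file)
    upload = PythonOperator(task_id="upload_file", python_callable=upload_file)
    download >> upload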
Comments

Amazing video, just what I needed, thank you!

alecmather

EXCELLENT VIDEO!
I have a question. If I want to make AWS Managed Apache Airflow look at a certain location and retrieve the private key file, how would I do that? It looks like the "private_file" key does not accept an AWS S3 path.

Any suggestions here would be very helpful.

prasannasundarajan

Thank you, informative video. I wish you had explained how to get to this Airflow dashboard screen.

eljangoolak

Priceless, straightforward, easy-to-understand example. A question about 'big files': you said you would save the file to a temp database. What exactly do you mean by that, and how exactly would you do it?

CookieMonsteeerrr
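
On the 'big files' question above: the video's exact approach is not reproduced here, but one common pattern is to stream the remote file to a temporary file on the worker rather than holding it in memory. A hedged sketch, assuming SFTPHook and illustrative paths:

import tempfile

from airflow.providers.sftp.hooks.sftp import SFTPHook


def download_large_file(remote_path: str = "/upload/big_file.csv") -> str:
    # Create a temporary file whose path outlives this function, so a
    # downstream task can process it and delete it afterwards.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
        local_path = tmp.name

    hook = SFTPHook(ssh_conn_id="sftp_default")
    hook.retrieve_file(remote_full_path=remote_path, local_full_path=local_path)
    return local_path  # e.g. pass this path to the next task via XCom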

@productivityforprogrammers can you please give some more details on how you generated the host_key? I am using an SFTP server which requires a .ppk key.

prasannasundarajan

Thanks for the video. I wish you had put the link to the next video here too.

RezaRahmati

Sir, is it possible to move data from Postgres as CSV files into an SFTP server?
Please answer if you know. Thank you, sir.

djamier
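
On the Postgres-to-SFTP question above: this is not covered in the video, but a rough sketch combining PostgresHook and SFTPHook could look like the following. The connection IDs, query, and paths are assumptions.

from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.providers.sftp.hooks.sftp import SFTPHook


def export_table_to_sftp():
    pg_hook = PostgresHook(postgres_conn_id="postgres_default")
    local_path = "/tmp/orders.csv"

    # COPY ... TO STDOUT streams the query result straight into the local
    # CSV file without loading the whole table into memory.
    pg_hook.copy_expert(
        "COPY (SELECT * FROM orders) TO STDOUT WITH CSV HEADER",
        filename=local_path,
    )

    sftp_hook = SFTPHook(ssh_conn_id="sftp_default")
    sftp_hook.store_file(
        remote_full_path="/upload/orders.csv",
        local_full_path=local_path,
    )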

For some reason, when I try to test my FTP connection in the Connections UI of Airflow, I get "Errno 104 Connection reset by peer". What could be the possible causes of this? I have been having this trouble for quite some time now.

shivangikulshrestha

In your case, how can I check for multiple input files arriving from the FTP server at the same time, and how do I download the new files into the same folder?

huyle-veqi
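
On the question above about multiple incoming files: one possible approach (not shown in the video) is to list the remote folder with SFTPHook and download every file that has not been fetched before. The folder paths and connection ID below are assumptions.

import os

from airflow.providers.sftp.hooks.sftp import SFTPHook


def download_new_files(remote_dir: str = "/upload", local_dir: str = "/tmp/incoming"):
    hook = SFTPHook(ssh_conn_id="sftp_default")
    os.makedirs(local_dir, exist_ok=True)

    already_downloaded = set(os.listdir(local_dir))
    for name in hook.list_directory(remote_dir):
        if name in already_downloaded:
            continue  # skip files fetched in a previous run
        hook.retrieve_file(
            remote_full_path=f"{remote_dir}/{name}",
            local_full_path=os.path.join(local_dir, name),
        )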

I have a small doubt: is the host key mandatory to add in the Extra field? I ask because I am not able to connect to SFTP.

pdtalikoti
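
On the host-key question above (and the key_file and "Bad host key from server" comments further down): the Extra field of an Airflow SSH/SFTP connection takes a JSON object, and the host key is optional if host key checking is disabled. A sketch of plausible Extra contents; every value is a placeholder.

import json

extra = {
    # Path, on the machine or container running the Airflow worker, to the
    # client's private key used to authenticate against the SFTP server.
    "key_file": "/opt/airflow/keys/id_rsa",
    # Skip host key verification (convenient for testing; for production,
    # set "host_key" to the server's public host key instead).
    "no_host_key_check": True,
    # "host_key": "AAAAB3NzaC1yc2EAAA...",  # base64 host key, if you do verify it
}

print(json.dumps(extra))  # paste the printed JSON into the connection's Extra field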

Hi, regarding the key_file: is it the private key of the target server? I tried putting the exact path to my private key (on the Airflow server), but I'm getting "file does not exist".

benjamincabalona

Thank you for the video, but after I followed this, I got the error "SSHException: Bad host key from server". Please explain the steps to solve this problem (I used the official Docker image).

miljang

I've got this error, sir: "FileNotFoundError: [Errno 2] No such file or directory: '~/.ssh/id_rsa'",
but when I tested the connection in the Airflow UI it succeeded. Can you help me, sir?

djamier

Hello, brother.
In the SFTP sensor, how do I set the path? Is it the path where the sensor will be looking?

MScFabianoBriao
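
On the sensor question above: in the SFTPSensor shipped with the SFTP provider, path is the full remote path the sensor keeps polling for. A minimal sketch with illustrative values (declared inside a with DAG(...) block in a real DAG file):

from airflow.providers.sftp.sensors.sftp import SFTPSensor

wait_for_file = SFTPSensor(
    task_id="wait_for_file",
    sftp_conn_id="sftp_default",
    path="/upload/input.csv",  # remote path the sensor keeps checking for
    poke_interval=60,          # seconds between checks
    timeout=60 * 60,           # give up after an hour
)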

Thank you for this very useful video. A similar task that I am having difficulty implementing: instead of SFTP, I need to GET files from SharePoint and then PUT them into an AWS S3 bucket. Is this doable in AWS MWAA? I appreciate any help in this regard.

islauddinmohammed

Hi,
First of all, thanks a lot for the video; it has been a great help during my internship.

I just don't understand what the output_file in process_file refers to...
Is it a path in my own space or a space inside Airflow?

Could you enlighten me on this point?

Thanks in advance.

yasminsahli

Is this the same for Cloud Composer also?

ayushmandloi

I have an issue with a Fernet key "invalid token" error.

FrancoisChannel

This is a really good video, thank you so much for it. I wanted to know, if you happen to know, how we can write a sensor that waits for a new file in a known remote folder (meaning only the filename is unknown, as it can be dynamic). In your example, I understand the directory is dynamic, but the filename is constant.

imlazy
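
On the last question (waiting for any new file in a known remote folder when the filename is dynamic): recent releases of apache-airflow-providers-sftp add a file_pattern argument to SFTPSensor that matches filenames with shell-style wildcards; whether it is available depends on your provider version. A sketch with illustrative values (again, inside a with DAG(...) block):

from airflow.providers.sftp.sensors.sftp import SFTPSensor

wait_for_any_csv = SFTPSensor(
    task_id="wait_for_any_csv",
    sftp_conn_id="sftp_default",
    path="/upload",         # known remote folder
    file_pattern="*.csv",   # dynamic filename, matched with a wildcard
    poke_interval=120,
)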