Why Choose Airflow, and Why Not?
Author: Omid Vahdaty 26.5.2020
Top 5 Reasons to choose Apache Airflow
No Vendor Lock-in and Cloud Agnostic
Yes, switching from cloud to cloud as a data engineer is non-trivial. As a data engineer, I want to learn one tool that connects to everything I need.
As a data engineer, I want to contribute my experience to the Airflow community by writing custom operators. Using operators from the community is also super useful.
Easy to scale, and easy to get the performance you want from the ETL. No missing features, as everything can be coded easily in Python. Above all, you can define highly complicated dependencies between the steps in the ETL logic.
Imagine a cutting-edge data department: the gap between data engineering and data science is huge. The only way to bridge that gap nowadays is via SQL and Python. Airflow, as a Python-based ETL tool, is a natural choice. An Airflow cluster based on a Dask cluster can serve both data engineers (DE) and data scientists (DS).
Top 3 Reasons Not to Choose Airflow
Sometimes your organization cannot afford to let you climb the learning curve on company time, so there is really no time to implement the tool properly. Airflow is more useful when you have a team of data engineers and the amount of ETL is growing month over month. However, most SMBs are not scaling their data teams that quickly, so Airflow may be overkill.
Time to market
Writing an ETL in Airflow sometimes takes a bit longer than with modern tools.
Debugging is a challenge
Debugging an ETL in Airflow is a bit more challenging, as the errors are not clear.
Not everything is supported out of the box
Airflow connects well with the GCP ecosystem, i.e. there is an Airflow operator for all the basic services in GCP. However, in AWS the list of Airflow operators is shorter, so you may need to rely on AWS Boto3 for the basics.
I put a lot of thought into these blogs so that I can share the information in a clear and useful way.
If you have any comments, thoughts, questions, or you need someone to consult with,
feel free to contact me via LinkedIn: