airflow, AWS, Cloud SQL, Data Engineering, GCP

Airflow MySQL Integration – how to?

The default installation of Airflow come with SQLlite as backend. this mode does not allow concurrency in your DAG’s.

In this blog we will upgrade an Airflow vanilla installation to work with localExceutor and GCP Cloud SQL (MySQL).

  1. Stop Airflow and change the airflow configuration file: airflow.cfg to contain “LocalExecutor”, Note: SequentialExecutor is the default. 🙂
#stop server:  Get the PID of the service you want to stop 
ps -eaf | grep airflow
# Kill the process 
kill -9 {PID}
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor

2. If you are using cloud, setup AWS RDS Aurora (MySQL) or GCP CloudSQL with minimal resources. Point SQL Alchemy to MySQL (if using MySQL)

sql_alchemy_conn = mysql://{USERNAME}:{PASSWORD}@{MYSQL_HOST}:3306/airflow
  • before you continue – confirm the security group allow incoming network access to your DB from airflow machine (And your MySQL client)
  • make sure user/pass are created to match your needs. also set the password for the root user.
  • Noice the user/pass are in plain text. this is merely a temporary workaround

3. Setup MySQL (if using MySQL) , creating a new db called airflow and grant the above user permissions.

 CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci; 
 
 grant all on airflow.* TO ‘USERNAME'@'%' IDENTIFIED BY ‘{password}'; 
 

if You are using GCP Cloud SQL, You can use the GUI to create the DB. you can also do it from the airflow machine via installing a mysql client (assuming ubuntu18.04), Note I recommed using both way to connect to the MySQL DB as one is easy, and the other is to test network access from Airflow Cluster to Cloud SQL

sudo apt install -y mysql-client-core-5.7

mysql -u MyUser -h 35.192.167.163 -pMyPassword

if you are using GCP , you can connect via cloud shell and the following cli cmd:

gcloud sql connect airflow --user=root --quiet

4. Initiate the Airflow tables via

airflow initdb

#confirm there are no errors.if you are getting "explicit_defaults_for_timestamp" #error see the end of this document for a suggested solution how to resolve.

5. Start Airflow

airflow webserver -p 8080

  • Cant connect to MySQL/ unable to connect mysql – check network is allowed, check user/password. test user/pass via Cloud Shell – this will bypass network problem for u. and test network via MySQK CLI once user/pass are confirmed.
  • Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
  1. In the Google Cloud Platform Console, create a new GCP Console project, or open an existing project by selecting the project name. …see naming guidelines
  2. Open the instance and click Edit.
  3. Scroll down to the Flags section.
  4. To set a flag that has not been set on the instance before, click Add item, choose the flag “explicit_defaults_for_timestamp” from the drop-down menu, and set its value.
  5. Click Save to save your changes.
  6. Confirm your changes under Flags on the Overview page.
  • Notice: all DAGs will be turned off

1 thought on “Airflow MySQL Integration – how to?”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s