Airflow MySQL Integration – how to?

Author: Omid Vahdaty 03.10.2020​

The default installation of Airflow come with SQLlite as backend. This mode does not allow concurrency in your DAG’s.

In this blog, we will upgrade an Airflow vanilla installation to work with localExceutor and GCP Cloud SQL (MySQL).

How to connect Airflow to MySQL:

  1. Stop Airflow and change the airflow configuration file: airflow.cfg to contain “LocalExecutor”, Note: SequentialExecutor is the default. 🙂

#stop server:  Get the PID of the service you want to stop 
ps -eaf | grep airflow
# Kill the process 
kill -9 {PID}

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor

2. If you are using cloud, setup AWS RDS Aurora (MySQL) or GCP CloudSQL with minimal resources. Point SQL Alchemy to MySQL (if using MySQL)

sql_alchemy_conn = mysql://{USERNAME}:{PASSWORD}@{MYSQL_HOST}:3306/airflow

  • Make sure user/pass are created to match your needs. also set the password for the root user. make sure you also set the password of the root user.
  • Notice the user/pass are in plain text. This is merely a temporary workaround

3. This section is the less recommened approach to setup mysql in GCP. If you are using GCP cloud SQL, skip to t section 3.1. To Setup MySQL (if using MySQL traditional approach), creating a new db called airflow and grant the above user permissions.

CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci; 
CREATE USER 'airflow'@'34.68.35.27' IDENTIFIED BY 'airflow';
grant all on airflow.* TO 'airflow'@'34.68.35.27' IDENTIFIED BY 'airflow'; 

If You are using GCP Cloud SQL, You can use the GUI to create the DB. You can also do it from the airflow machine via installing a MySQL client (assuming ubuntu18.04), Note I recommend using both ways to connect to the MySQL DB as one is easy, and the other is to test network access from Airflow cluster to Cloud SQL

# AWS
sudo apt install -y mysql-client-core-5.7 #AWS
sudo apt-get install python-dev libmysqlclient-dev pip install MySQL-python

 

#GCP 
sudo apt install -y default-mysql-client #GCP debian
mysql -u airflow -h 31.2.3.4 -pairflow

3.1 If you are are using GCP CloudSQL console , you can do the Airflow DB setup as follows:

  • Create airflow user via the console (allow any host)
  • In the console, connect via cloud shell to the DB using airflow user and create airflow db: “CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  • Set via the console: explicit_defaults_for_timestamp to on.

If you are using GCP , you can connect via cloud shell and the following cli cmd (remember to set the password for airflow user):

gcloud sql connect airflow --user=airflow --quiet

4. Initiate the Airflow tables via the below CMD, Notice – this CMD  is used only when setting up an environment.

airflow initdb

Confirm there are no errors. If you are getting “explicit_defaults_for_timestamp” error, see the end of this document for the resolution.

5. Start Airflow

airflow webserver -p 8080

Common issues / errors

 

  • Can’t connect to MySQL/ unable to connect mysql – Check network route is allowed, Check user/password. Test user/pass via Cloud Shell. This will bypass network problem for you and test network via MySQL CLI once user/pass are confirmed.

  • Global variable explicit_defaults_for_timestamp needs to be on (1) for MySQL.

  1. In the Google Cloud Platform Console, create a new GCP Console project, or open an existing project by selecting the project name. 
  2. Open the instance and click Edit.
  3. Scroll down to the Flags section.
  4. To set a flag that has not been set on the instance before, click Add item, choose the flag “explicit_defaults_for_timestamp” from the drop-down menu, and set its value.
  5. Click Save to save your changes.
  6. Confirm your changes under Flags on the Overview page.

  • Notice: all DAGs will be turned off

——————————————————————————————————————————
I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way.
If you have any comments, thoughts, questions, or you need someone to consult with,

feel free to contact me via LinkedIn:

1 thought on “Airflow MySQL Integration – how to?”

Leave a Reply