
Airflow installation manual, workflow example and basic commands

It took me several attempts to get Airflow installed, so here is the list of commands that worked for me.

Prerequisites

sudo apt-get update --fix-missing

sudo apt-get -y install build-essential autoconf libtool pkg-config python-opengl python-imaging python-pyrex python-pyside.qtopengl idle-python2.7 qt4-dev-tools qt4-designer libqtgui4 libqtcore4 libqt4-xml libqt4-test libqt4-script libqt4-network libqt4-dbus python-qt4 python-qt4-gl libgle3 python-dev

sudo apt-get -y install python-setuptools python-dev build-essential


sudo apt install -y python3-pip

 sudo pip3 install apache-airflow


pip3 install pystan
sudo apt install -y libmysqlclient-dev 

sudo apt install -y python-boto3

sudo -H pip3 install apache-airflow[all_dbs]

sudo -H pip3 install apache-airflow[devel]

sudo pip3 install apache-airflow[all]
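
To verify the install before moving on (a quick sanity check; the exact version printed will depend on what pip pulled in):

# confirm the airflow binary is on the PATH and which version was installed
airflow version
pip3 show apache-airflow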




export AIRFLOW_HOME=~/airflow
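
The export above only lives for the current shell session; to make it stick (a small sketch, assuming the default bash shell and the ~/airflow path used here):

# make AIRFLOW_HOME persistent across sessions (optional)
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
source ~/.bashrc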

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

#start the scheduler
airflow scheduler
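
A quick way to confirm the web server actually came up (assuming the default port 8080 used above):

# the UI should answer on the default port
curl -I http://localhost:8080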

A common problem when starting the web server (a gunicorn error) is discussed here:
https://stackoverflow.com/questions/41922412/error-with-gunicorn-server-start/51432068

# stop the server: get the PID of the process you want to stop
ps -eaf | grep airflow
# kill the process
kill -9 {PID}

# or in one command (Ubuntu):
pkill airflow
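
Depending on how the web server was started, it may also leave a pid file under AIRFLOW_HOME that lets you stop just that service (this assumes the default ~/airflow location and that the file actually exists, so check first):

# if present, this file holds the PID of the web server's gunicorn master
cat ~/airflow/airflow-webserver.pid
kill $(cat ~/airflow/airflow-webserver.pid)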

Advanced commands to start / stop the Airflow services (a combined start-script sketch follows the list):

  1. Start the web server
     nohup airflow webserver $* >> ~/airflow/logs/webserver.logs &
  2. Start the Celery workers
     nohup airflow worker $* >> ~/airflow/logs/worker.logs &
  3. Start the scheduler
     nohup airflow scheduler >> ~/airflow/logs/scheduler.logs &
  4. Navigate to the Airflow UI (http://localhost:8080 by default)
  5. Start Flower (optional)
     • Flower is a web UI built on top of Celery for monitoring your workers.
     • nohup airflow flower >> ~/airflow/logs/flower.logs &
  6. Navigate to the Flower UI (http://localhost:5555 by default)
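
If you prefer to wrap the three services in a single helper, a minimal start script could look like the sketch below. It assumes AIRFLOW_HOME=~/airflow and that the CeleryExecutor is already configured (otherwise the worker is not needed); the file name start-airflow.sh is just an example:

#!/bin/bash
# start-airflow.sh - example helper that starts the Airflow services in the background
mkdir -p ~/airflow/logs
nohup airflow webserver >> ~/airflow/logs/webserver.logs 2>&1 &
nohup airflow worker >> ~/airflow/logs/worker.logs 2>&1 &
nohup airflow scheduler >> ~/airflow/logs/scheduler.logs 2>&1 &
echo "Airflow web server, worker and scheduler started, logs under ~/airflow/logs/"

Make it executable and run it with: chmod +x start-airflow.sh && ./start-airflow.sh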

Example path of Airflow Dags folder:

/usr/local/lib/python3.6/dist-packages/airflow/example_dags/
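
Once the scheduler and web server are running you can try the bundled examples straight from the CLI (the commands below are for the Airflow 1.x CLI used in this post; example_bash_operator and its runme_0 task are part of the stock example DAGs, and load_examples must be left enabled in airflow.cfg):

# list the DAGs the scheduler can see
airflow list_dags
# list the tasks of one of the bundled example DAGs
airflow list_tasks example_bash_operator
# run a single task locally, without the scheduler
airflow test example_bash_operator runme_0 2019-01-01
# trigger a full run of the DAG
airflow trigger_dag example_bash_operator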

More examples can be found in another post on this website:

https://big-data-demystified.ninja/2019/02/18/air-flow-example-of-job-data-composer-gcp/

Another good manual:

http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/
