Airflow SequentialExecutor Installation manual and basic commands

Author: Omid Vahdaty 15.8.2018

It took me several attempts to get this working, so I list here the commands that worked for me.


  1. Install a new machine via GCE or EC2, with minimal resources, preferably free tier. If you are using GCE, make sure the Cloud API access scopes have the BQ and GCS APIs enabled. If you are using AWS, make sure the machine has a role with the required permissions. [I highly recommend using a disk size of at least 200GB, as the Airflow logs folder fills up quickly, causing Airflow to crash.] A CLI sketch for the GCE case follows below.
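For GCE, a minimal sketch of creating such a machine from the command line might look like the following. The instance name, machine type, and zone are placeholders of mine, not from the original setup; adjust to your project:

# hypothetical example: small GCE instance with BQ + GCS scopes and a 200GB disk
gcloud compute instances create airflow-server \
  --zone=us-central1-a \
  --machine-type=n1-standard-1 \
  --boot-disk-size=200GB \
  --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/devstorage.read_write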
  2. Install git
sudo apt-get -y install git

3. Clone the Big Data Demystified git repository

git clone https://github.com/omidvd79/Big_Data_Demystified.git

4. Copy the Airflow installation script to the home folder and run it

cd Big_Data_Demystified/airflow/setup/
cp * ~/
cd ~/
sudo sh Airflow_Ubuntu18_aws_signle_machine.sh
#this script will also work on GCP Debian OS
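If you want to see what such a script does before running it, here is a minimal pip-based sketch of a single-machine install. This is my own approximation, not the contents of the repo script, and the pinned version is an assumption matching the 1.10-era initdb commands used later in this post:

# hypothetical sketch of a single-machine Airflow install (not the repo script itself)
sudo apt-get update
sudo apt-get -y install python3-pip
export AIRFLOW_HOME=~/airflow
sudo pip3 install apache-airflow==1.10.3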

5. Create an optional logs folder and the dags folder in the home folder.

mkdir gs_logs
mkdir -p airflow/dags

6. Start Airflow for the first time by running initdb, the webserver and the scheduler:

# initialise the database, notice it is only used ONCE! at setup time 🙂
airflow initdb
# start the web server, default port is 8080
airflow webserver -p 8080
#start the scheduler
airflow scheduler

You can check out our sh script to start Airflow; note that it was customized to our needs.

sh start_airflow.sh
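For reference, a start script along those lines could be as small as the sketch below (assuming the logs folder from step 5 exists; this is not necessarily the script from the repo):

#!/bin/sh
# minimal sketch: run the webserver and scheduler in the background
nohup airflow webserver -p 8080 >> ~/airflow/logs/webserver.logs 2>&1 &
nohup airflow scheduler >> ~/airflow/logs/scheduler.logs 2>&1 &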

Common problem starting the web server:
https://stackoverflow.com/questions/41922412/error-with-gunicorn-server-start/51432068

To stop airflow

# stop server: get the PID of the service you want to stop
ps -eaf | grep airflow
# Kill the process 
kill -9 {PID}
#or in one command (ubuntu):
pkill airflow
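A slightly more surgical alternative for the webserver, assuming the default PID file Airflow writes under AIRFLOW_HOME:

# kill only the webserver, via its PID file
kill $(cat ~/airflow/airflow-webserver.pid)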

Go over the config file

nano airflow/airflow.cfg

Notice where the LOGS & DAGS folders are located:

base_log_folder = /home/omid/airflow/logs
dags_folder = /home/omid/airflow/dags
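A quick way to confirm what your own install is actually using (the paths above are from my machine and will differ per user):

# print the current dags/logs locations from the config
grep -E '^(dags_folder|base_log_folder)' ~/airflow/airflow.cfg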

Make sure HTTP port 8080 is open on the machine via GCP/AWS. Instructions on GCE:

Enter GCE and choose your instance
Click the 3 dots (at the right corner of the instance row on GCE)
View Network Details
Edit
Add your IP and remove 0.0.0.0/0
Change the HTTP port to 8080
Check connectivity (based on the external IP):
http://1.2.3.4:8080/admin/
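The same rule can also be created from the CLI; a sketch with gcloud (the rule name and source IP are placeholders):

# hypothetical example: allow only your own IP to reach the Airflow UI on 8080
gcloud compute firewall-rules create allow-airflow-ui \
  --allow=tcp:8080 \
  --source-ranges=1.2.3.4/32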

On an AWS EC2 machine

Change the IP/port in the security groups of the instance on EC2
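Or from the AWS CLI, a sketch (the security group ID and IP are placeholders):

# hypothetical example: open 8080 to your IP only, on the instance's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8080 \
  --cidr 1.2.3.4/32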

To avoid permission problems between different Linux users, you might want to consider GCS Fuse on GCP machines; I assume the dags are located in the bucket named below. It will also decouple your dags from the instance and, generally speaking, make the process of uploading new dags easy.

gs://airflow-fuse-bucket
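A minimal sketch of mounting that bucket as the dags folder with gcsfuse (assuming gcsfuse is installed and the instance's scopes allow GCS access):

# bucket name is passed without the gs:// prefix
gcsfuse airflow-fuse-bucket ~/airflow/dags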

 

After the Airflow web UI is up, don't forget to add the GCP-related variables, in Airflow web >> Admin >> Variables:

gce_zone      us-central1-a
gcp_project   myProjectID
gcs_bucket    gs://airflow_gcs_bucket
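These can also be set from the CLI; on the 1.10-era CLI used in this post, -s sets a variable:

airflow variables -s gce_zone us-central1-a
airflow variables -s gcp_project myProjectID
airflow variables -s gcs_bucket gs://airflow_gcs_bucket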

Another thing to remember on GCP: you need to specify the default project in the BigQuery connection. In Airflow web >> Admin >> Connections >> bigquery_default >> Project Id, add the value of your project ID:

myProjectID
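If you prefer the CLI, a hedged sketch on the 1.10-era CLI is below. The extra__google_cloud_platform__project key is my assumption about how the Project Id form field is stored, so verify it against your Airflow version (you may also need to delete the existing bigquery_default first):

# hypothetical sketch: recreate bigquery_default with a default project
airflow connections -a --conn_id bigquery_default \
  --conn_type google_cloud_platform \
  --conn_extra '{"extra__google_cloud_platform__project": "myProjectID"}'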

Create an Airflow user for login:

airflow users create \
  --username admin \
  --firstname FIRST_NAME \
  --lastname LAST_NAME \
  --role Admin \
  --email admin@example.org
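Note that airflow users create is the newer (2.x-style) form of the command; on the 1.10-era RBAC CLI the equivalent was:

# Airflow 1.10.x equivalent (use -p to set a password)
airflow create_user -r Admin -u admin -e admin@example.org -f FIRST_NAME -l LAST_NAME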

 

Advanced commands to start / stop Airflow services

  1. Start Web Server
    nohup airflow webserver $* >> ~/airflow/logs/webserver.logs &
  2. Start Celery Workers (only relevant with the CeleryExecutor; not needed for the SequentialExecutor)
    nohup airflow worker $* >> ~/airflow/logs/worker.logs &
  3. Start Scheduler
    nohup airflow scheduler >> ~/airflow/logs/scheduler.logs &
  4. Navigate to the Airflow UI
  5. Start Flower (Optional)
    • Flower is a web UI built on top of Celery, to monitor your workers.
    • nohup airflow flower >> ~/airflow/logs/flower.logs &
  6. Navigate to the Flower UI (Optional)

Example path of the bundled Airflow example DAGs folder:

/usr/local/lib/python3.6/dist-packages/airflow/example_dags/
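To try one of them, copy an example into your own dags folder (the dist-packages path depends on your Python version and install method):

# copy the bundled tutorial DAG so the scheduler picks it up
cp /usr/local/lib/python3.6/dist-packages/airflow/example_dags/tutorial.py ~/airflow/dags/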

More examples can be found in another post on this website:

https://big-data-demystified.ninja/2019/02/18/air-flow-example-of-job-data-composer-gcp/

 

Another good manual:

http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/

——————————————————————————————————————————

I put a lot of thought into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with,

feel free to contact me via LinkedIn:
