Airflow SequentialExecutor Installation manual and basic commands
Author: Omid Vahdaty 15.8.2018
It took me several attempts to get this working, so here is the list of commands that worked for me.
1. Install a new machine via GCE or EC2 with minimal resources, preferably free tier. If you are using GCE, make sure the Cloud API access scopes have the BQ and GCS APIs enabled. If you are using AWS, make sure the machine has a role attached with the required permissions. [I highly recommend using a disk size of at least 200GB, as the Airflow logs folder fills up quickly, causing Airflow to crash.]
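For GCE, a hypothetical gcloud one-liner that covers the scopes and disk-size advice above (the instance name, zone, and machine type are placeholders, not values from this setup):
# Hypothetical example -- adjust name, zone, and machine type to your needs
gcloud compute instances create airflow-server \
    --zone=us-central1-a \
    --machine-type=n1-standard-1 \
    --boot-disk-size=200GB \
    --scopes=bigquery,storage-rw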
2. Install git
sudo apt-get -y install git
3. Clone the Big Data Demystified git repository
git clone https://github.com/omidvd79/Big_Data_Demystified.git
4. Copy the Airflow installation script to the home folder and run it:
cd Big_Data_Demystified/
cd airflow/
cd setup/
cp * ~/
cd ~/
sudo sh Airflow_Ubuntu18_aws_signle_machine.sh  # this script will also work on GCP Debian OS
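If you prefer not to use the repo script, the rough equivalent (an assumption about its contents, based on a standard pip install of that era, not the script's actual code) is:
# Hypothetical minimal equivalent of the setup script
sudo apt-get update
sudo apt-get -y install python3-pip
sudo pip3 install apache-airflow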
5. Create an optional logs folder in the home folder, and the dags folder:
mkdir gs_logs
mkdir -p airflow/dags
6. Start Airflow for the first time by running initdb, the webserver, and the scheduler:
# initialise the database, notice it is only used ONCE! on setup time 🙂
airflow initdb
# start the web server, default port is 8080
airflow webserver -p 8080
# start the scheduler
airflow scheduler
You can check out our sh script to start Airflow; note that it was customized to our needs:
sh start_airflow.sh
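If you don't have the script handy, a minimal sketch of what such a start script might contain (assuming the default ~/airflow home and an existing logs folder; this is not the exact contents of start_airflow.sh):
#!/bin/sh
# Sketch of a start script -- adjust paths and ports to your needs
nohup airflow webserver -p 8080 >> ~/airflow/logs/webserver.logs 2>&1 &
nohup airflow scheduler >> ~/airflow/logs/scheduler.logs 2>&1 &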
A common problem when starting the web server is covered here:
https://stackoverflow.com/questions/41922412/error-with-gunicorn-server-start/51432068
To stop Airflow:
# stop server: get the PID of the service you want to stop
ps -eaf | grep airflow
# kill the process
kill -9 {PID}
# or in one command (Ubuntu):
pkill airflow
Go over the config file
nano airflow/airflow.cfg
Notice where the logs and dags folders are located:
base_log_folder = /home/omid/airflow/logs
dags_folder = /home/omid/airflow/dags
Make sure port 8080 (HTTP) is open on the machine via GCP/AWS. Instructions for GCE:
- Enter GCE and choose your instance.
- Click the 3 dots (on the right corner of the instance row on GCE).
- View Network Details.
- Edit: add your IP and remove 0.0.0.0/0.
- Change the HTTP port to 8080.
- Check connectivity (based on the external IP): http://1.2.3.4:8080/admin/
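If you prefer the gcloud CLI over the console, a sketch of an equivalent firewall rule (the rule name and YOUR_IP are placeholders):
# Hypothetical example -- replace YOUR_IP with your own address
gcloud compute firewall-rules create allow-airflow-8080 \
    --allow=tcp:8080 \
    --source-ranges=YOUR_IP/32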
On an AWS EC2 machine:
Change the IP/port on the security groups of the instance in EC2.
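A sketch of the AWS CLI equivalent (the security group ID and YOUR_IP are placeholders):
# Hypothetical example -- replace the group ID and YOUR_IP with your own values
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 8080 \
    --cidr YOUR_IP/32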
To avoid permission problems between different Linux users, you might want to consider GCS Fuse on GCP machines; I assume the dags are located on the bucket named below. It will also decouple your dags from the instance and, generally speaking, make the process of uploading new dags easy.
gs://airflow-fuse-bucket
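A sketch of mounting that bucket as the dags folder with gcsfuse (this assumes gcsfuse is already installed and the instance has the GCS scope):
# gcsfuse takes the bucket name without the gs:// prefix
mkdir -p ~/airflow/dags
gcsfuse airflow-fuse-bucket ~/airflow/dags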
After the Airflow web UI is up, don't forget to add the GCP-related variables under Airflow >> Admin >> Variables:
gce_zone      us-central1-a
gcp_project   myProjectID
gcs_bucket    gs://airflow_gcs_bucket
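The same variables can also be set from the CLI (Airflow 1.10-era syntax; on Airflow 2.x it is airflow variables set KEY VALUE):
airflow variables --set gce_zone us-central1-a
airflow variables --set gcp_project myProjectID
airflow variables --set gcs_bucket gs://airflow_gcs_bucket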
Another thing to remember in GCP: you need to specify the default project in the BigQuery connection. Airflow web >> Admin >> Connections >> bigquery_default >> Project Id, and add the value of your project ID:
myProjectID
Create an Airflow user for login (Airflow 2.x syntax; on Airflow 1.10 with RBAC enabled the equivalent command is airflow create_user):
airflow users create \
    --username admin \
    --firstname FIRST_NAME \
    --lastname LAST_NAME \
    --role Admin \
    --email admin@example.org
Advanced commands to start/stop Airflow services
- Start the web server
nohup airflow webserver $* >> ~/airflow/logs/webserver.logs &
- Start Celery workers
nohup airflow worker $* >> ~/airflow/logs/worker.logs &
- Start the scheduler
nohup airflow scheduler >> ~/airflow/logs/scheduler.logs &
- Navigate to the Airflow UI
- http://{HOSTNAME}:8080/admin/
- Start Flower (Optional)
- Flower is a web UI built on top of Celery, to monitor your workers.
nohup airflow flower >> ~/airflow/logs/flower.logs &
- Navigate to the Flower UI (Optional)
- http://{HOSTNAME}:5555/
Example path of Airflow's built-in example DAGs folder:
/usr/local/lib/python3.6/dist-packages/airflow/example_dags/
More examples can be found in another blog post on this website:
https://big-data-demystified.ninja/2019/02/18/air-flow-example-of-job-data-composer-gcp/
Another good manual:
http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/
——————————————————————————————————————————
I put a lot of thought into these blogs to share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me via LinkedIn.