Airflow Use Case: Improving success rate of API calls | Google Search Console

Author: Omid Vahdaty 3.4.2020

Sometimes your ETL needs to call third-party APIs, and success is not guaranteed. For example, calling the API with too many requests in parallel results in failures.

For example, the Search Console API has many limits and quotas.

Types of API quotas in Google Search Console

  1. Short term, per query: if a query consumes too many resources from the API, it fails. Try again in 15 minutes.

  2. Long term: if it fails more than 2 times, you need to wait 24 hours, as you may have exceeded the daily quota.

  3. Quotas per time slot:

     QPD: queries per day

     QPM: queries per minute, 1200

     QPS: queries per second per user, 20

     QPS: queries per second for all users, 50
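The first two rules above can be written down as a small helper. This is only a sketch: the function and constant names are hypothetical, and the wait times are the ones listed above, not values read from the API.

```python
import datetime

# Waits taken from the quota rules listed above
SHORT_TERM_WAIT = datetime.timedelta(minutes=15)
LONG_TERM_WAIT = datetime.timedelta(hours=24)


def wait_before_retry(failed_attempts):
    """Return how long to wait before calling the API again.

    failed_attempts is the number of consecutive failures seen so far.
    """
    if failed_attempts <= 0:
        return datetime.timedelta(0)  # nothing failed yet, call right away
    if failed_attempts > 2:
        return LONG_TERM_WAIT  # probably the daily quota, wait a full day
    return SHORT_TERM_WAIT  # probably a short-term limit


for failures in (0, 1, 3):
    print(failures, wait_before_retry(failures))
```

In practice Airflow's retry settings do this waiting for you; the helper is only meant to make the rules explicit.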

Airflow options to slow down, retry, and control concurrency in Airflow operators:

The first thing to look at is the default arguments:

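import datetime
from airflow import models

# Assumed definition: midnight of the previous day, so the first run starts right away
yesterday = datetime.datetime.combine(
    datetime.date.today() - datetime.timedelta(days=1), datetime.time.min)

# Task-level defaults, applied to every operator in the DAG
default_dag_args = {
    'start_date': yesterday,
    # If a task fails, retry it once after waiting at least 5 minutes
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    # Grow the wait between retries exponentially, capped at 20 minutes
    'retry_exponential_backoff': True,
    'max_retry_delay': datetime.timedelta(minutes=20),
    'project_id': models.Variable.get('gcp_project')
}

# DAG-level settings: pass these to the DAG constructor, they are not read from default_args
#   concurrency=12      # no more than 4 times the CPU cores, no more than your API concurrency limits
#   max_active_runs=2   # concurrent runs of the same DAG
#   catchup=False       # usually good for my use case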
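One thing to watch: in Airflow, default_args is forwarded to each operator, while concurrency, max_active_runs and catchup are parameters of the DAG itself (newer Airflow releases rename concurrency to max_active_tasks). A minimal sketch of keeping the two groups apart, in plain Python so it runs without Airflow; the helper name, the key set and the DAG id are made up for illustration.

```python
# DAG constructor parameters named in this post; everything else is a task default.
DAG_LEVEL_KEYS = {'concurrency', 'max_active_runs', 'catchup'}


def split_dag_args(mixed_args):
    """Return (dag_kwargs, default_args) from one mixed settings dict."""
    dag_kwargs = {k: v for k, v in mixed_args.items() if k in DAG_LEVEL_KEYS}
    default_args = {k: v for k, v in mixed_args.items() if k not in DAG_LEVEL_KEYS}
    return dag_kwargs, default_args


dag_kwargs, default_args = split_dag_args({
    'retries': 1,
    'concurrency': 12,
    'max_active_runs': 2,
    'catchup': False,
})

# With Airflow installed, the two groups would then be used like this
# ('search_console_etl' is a made-up DAG id):
#   dag = models.DAG('search_console_etl', default_args=default_args, **dag_kwargs)
```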

You can also override these parameters in the operator itself:

run_report_remotly_status = BashOperator(
    task_id='run_report_remotly_' + temp_date,
    retries=2,
    retry_delay=datetime.timedelta(seconds=30),
    retry_exponential_backoff=True,
    max_retry_delay=datetime.timedelta(minutes=20),
    bash_command=bash_run_report_remotly_cmd,
    trigger_rule="all_done")
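To get a feel for what these retry settings mean in waiting time, here is a rough sketch of the schedule: each wait doubles, and max_retry_delay caps it. It is an approximation only; Airflow's own calculation also adds some jitter, so the real waits will differ somewhat. The function name is hypothetical.

```python
import datetime


def approximate_retry_delays(retries, retry_delay, max_retry_delay):
    """Approximate the wait before each retry: doubling, capped at max_retry_delay."""
    delays = []
    for attempt in range(retries):
        delay = retry_delay * (2 ** attempt)  # 1x, 2x, 4x, ... the base delay
        delays.append(min(delay, max_retry_delay))
    return delays


# Values from the operator above: 2 retries, 30 seconds base, 20 minutes cap
operator_delays = approximate_retry_delays(
    retries=2,
    retry_delay=datetime.timedelta(seconds=30),
    max_retry_delay=datetime.timedelta(minutes=20))
print([int(d.total_seconds()) for d in operator_delays])  # [30, 60]
```

With only 2 retries the cap is never reached. It starts to matter with more retries: from a 30 second base, the seventh wait would be 1920 seconds uncapped and is cut to 20 minutes.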

The full example is committed here in our GitHub.

To read more about Search Console, read this Search Console blog.
——————————————————————————————————————————

I put a lot of thought into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me via LinkedIn:
