Airflow Use Case: Improving success rate of API calls | Google Search Console
Author: Omid Vahdaty 3.4.2020
Sometimes your ETL needs to call third-party APIs, and success is not guaranteed; calling the API too often in parallel results in failures.
The Search Console API, for example, has many limits and quotas.
Four types of API quotas in Google Search Console
1. Short term – per query – if a query consumes too much of the API's resources, it fails – try again in 15 minutes.
2. Long term – if it fails more than 2 times, you need to wait 24 hours, as you may have exceeded the DAILY quota.
3. Quotas per time slot:
QPD – Quota per day
QPM – Quota per minute – 1200.
QPS – Quota per second per user – 20
QPS – Quota per second across all users – 50
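One way to stay under the per-second and per-minute limits above is to throttle calls on the client side before they reach the API. The following is a minimal sketch and not part of the original DAG: `SlidingWindowThrottle` and `throttled_call` are hypothetical names, and the limits (20 per second, 1200 per minute) are simply the per-user numbers listed above.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Hypothetical helper: allow at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have already left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())


# Limits taken from the quota list above (per user)
per_second = SlidingWindowThrottle(max_calls=20, period=1.0)
per_minute = SlidingWindowThrottle(max_calls=1200, period=60.0)


def throttled_call(api_call, *args, **kwargs):
    """Wait for both windows, then run `api_call` (any callable that hits the API)."""
    per_minute.acquire()
    per_second.acquire()
    return api_call(*args, **kwargs)
```

Note that this only limits calls made from a single process. Parallel Airflow tasks each keep their own counters, which is why the concurrency settings discussed next still matter.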
Airflow options to slow down, retry, and control concurrency in Airflow operators:
First thing to notice, default arguments:
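    import datetime
    from airflow import models

    # `yesterday` is defined elsewhere in the full example
    default_dag_args = {
        'start_date': yesterday,
        # If a task fails, retry it once after waiting at least 5 minutes
        'retries': 1,
        'retry_delay': datetime.timedelta(minutes=5),
        # Increase the wait between retries exponentially, up to 20 minutes
        'retry_exponential_backoff': True,
        'max_retry_delay': datetime.timedelta(minutes=20),
        # The next three are DAG-level settings; if they do not take effect
        # from the default arguments, pass them to the DAG constructor instead.
        # No more than 4 times the CPU cores, no more than your API concurrency limits
        'concurrency': 12,
        # Maximum simultaneous runs of the same DAG
        'max_active_runs': 2,
        # Usually good for my use case
        'catchup': False,
        'project_id': models.Variable.get('gcp_project'),
    }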
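To get a feel for what `retry_exponential_backoff` and `max_retry_delay` do together, the waits can be approximated in a few lines of plain Python. This is a simplified model, not Airflow's own code: Airflow's real calculation also adds jitter to spread retries out, so actual waits will differ somewhat. The function name `approx_retry_delays` is hypothetical.

```python
import datetime


def approx_retry_delays(retry_delay, max_retry_delay, retries):
    """Approximate wait before each retry: the base delay doubles on every
    retry and is capped at max_retry_delay. Simplified model (no jitter)."""
    delays = []
    for attempt in range(retries):
        delay = retry_delay * (2 ** attempt)
        delays.append(min(delay, max_retry_delay))
    return delays


# Delay values from the default arguments above, with extra retries to show the cap
schedule = approx_retry_delays(
    retry_delay=datetime.timedelta(minutes=5),
    max_retry_delay=datetime.timedelta(minutes=20),
    retries=4,
)
for number, wait in enumerate(schedule, start=1):
    print(f"retry {number}: wait about {wait}")
```

Under this model, a 5-minute base delay with a 20-minute cap gives waits of roughly 5, 10, 20, and 20 minutes, so the cap keeps later retries from drifting far past the 15-minute short-term window mentioned earlier.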
You can also override these parameters on an individual operator:
    run_report_remotly_status = BashOperator(
        task_id='run_report_remotly_' + temp_date,
        retries=2,
        retry_delay=datetime.timedelta(seconds=30),
        retry_exponential_backoff=True,
        max_retry_delay=datetime.timedelta(minutes=20),
        bash_command=bash_run_report_remotly_cmd,
        # Run once all upstream tasks have finished, whether they succeeded or failed
        trigger_rule="all_done",
    )
The full example is committed in our GitHub repository.
To read more about Search Console, read this Search Console blog.