How to debug a BigQuery query failure in an Airflow DAG?

Author: Omid Vahdaty, 3.11.2019

The problem is that sometimes the complete list of errors is not presented in Airflow.

The log only shows the following error:

Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'}. The job was: {'kind': 'bigquery#job', 'etag': 'ldp8TiJ9Ki0q3IiaMV8p5g==', 'id': 'MyProjectName:US.job_R2ig2EmIBy0-wD76zcNOCcuuyVX4',

The trick is to understand that there are more errors, which you can list by running the following in the terminal:

 bq show -j US.job_R2ig2EmIBy0-wD76zcNOCcuuyVX4

This will show the full list of failure details:

Failure details:
 - gs://myBucket/reviews_utf8/c_201804.csv: Error while reading data,
   error message: CSV table encountered too many errors, giving up.
   Rows: 1; errors: 1. Please look into the errors[] collection for
   more details.
 - gs://myBucket/reviews_utf8/c_201801.csv: Error while reading
   data, error message: Could not parse '3.9.58' as double for field
   App_Version_Name (position 2) starting at location 342
 - gs://myBucket/reviews_utf8/c_201804.csv: Error while reading data,
   error message: Could not parse '2.0.1' as double for field
   App_Version_Name (position 2) starting at location 342
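The same details are also available as JSON via `bq show --format=prettyjson -j <job_id>`, under the job's `status.errors` collection. If you want to surface them programmatically (for example, from a script near your DAG), a minimal sketch that parses that JSON output looks like this; the sample payload below is abridged and hypothetical, shaped like the errors above:

```python
import json

def extract_failure_details(job_json: str):
    """Pull the per-file errors out of a BigQuery job's prettyjson output."""
    job = json.loads(job_json)
    errors = job.get("status", {}).get("errors", [])
    # Each entry carries 'location' (the offending file) and 'message'.
    return [f"{e.get('location', '?')}: {e.get('message', '')}" for e in errors]

# Abridged, hypothetical job JSON shaped like the failure details above:
sample = json.dumps({
    "status": {
        "errors": [
            {"location": "gs://myBucket/reviews_utf8/c_201801.csv",
             "message": "Could not parse '3.9.58' as double for field "
                        "App_Version_Name (position 2) starting at location 342"}
        ]
    }
})
print(extract_failure_details(sample))
```

This avoids copy-pasting job IDs into the console when the same check runs repeatedly.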

From there, debugging the problem was simply a matter of understanding that there is a mismatch between the values in the CSV and the data types of the table's columns.
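The mismatch here is concrete: values like '3.9.58' are version strings, not doubles, so a column like App_Version_Name should be declared as STRING rather than FLOAT. A quick local check, sketched below with a made-up inline CSV, can flag such rows before you even attempt the load:

```python
import csv
import io

def find_non_numeric(rows, field_index):
    """Return (row_number, value) pairs for values that won't parse as a double."""
    bad = []
    for i, row in enumerate(rows, start=1):
        try:
            float(row[field_index])
        except ValueError:
            bad.append((i, row[field_index]))
    return bad

# Hypothetical sample; App_Version_Name sits at position 2, as in the error above.
sample_csv = "pkg,review,App_Version_Name\ncom.app,good,3.9.58\ncom.app,ok,2.0\n"
reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row
print(find_non_numeric(list(reader), 2))  # → [(1, '3.9.58')]
```

The alternative fix is to change the table schema so the column is STRING, which is usually the right call for version numbers.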

——————————————————————————————————————————

I put a lot of thought into these blogs, so I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me via LinkedIn:
