How to debug a BigQuery query failure in an Airflow DAG?
Author: Omid Vahdaty, 3.11.2019
The problem is that sometimes the complete list of errors is not presented in Airflow.
The log only shows the following error:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'}. The job was: {'kind': 'bigquery#job', 'etag': 'ldp8TiJ9Ki0q3IiaMV8p5g==', 'id': 'MyProjectName:US.job_R2ig2EmIBy0-wD76zcNOCcuuyVX4',
The trick is to realize that there are more errors than the log shows; you can retrieve them by running the following command in a terminal:
bq show -j US.job_R2ig2EmIBy0-wD76zcNOCcuuyVX4
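If you prefer to inspect the job programmatically, `bq show -j --format=json <job_id>` emits the job resource as JSON. A small helper can then list every error in the job's `status.errors` array. This is a sketch: the field layout (`status.errors`, each entry with `location` and `message`) follows the BigQuery REST API job resource format, and the sample values below are taken from the failure details in this post.

```python
import json


def extract_failure_details(job_resource: dict) -> list:
    """Pull every error out of a BigQuery job resource dict.

    The Airflow log often surfaces only the first error; the job's
    status.errors list holds them all.
    """
    status = job_resource.get("status", {})
    details = []
    for err in status.get("errors", []):
        location = err.get("location", "<no location>")
        details.append("%s: %s" % (location, err.get("message", "")))
    return details


# Trimmed job resource for illustration; in practice you would do:
#   job_resource = json.loads(bq_show_json_output)
sample_job = {
    "status": {
        "errors": [
            {"location": "gs://myBucket/reviews_utf8/c_201801.csv",
             "message": "Could not parse '3.9.58' as double for field App_Version_Name"},
            {"location": "gs://myBucket/reviews_utf8/c_201804.csv",
             "message": "Could not parse '2.0.1' as double for field App_Version_Name"},
        ]
    }
}

for line in extract_failure_details(sample_job):
    print(line)
```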
This will display the full list of failure details:
Failure details:
- gs://myBucket/reviews_utf8/c_201804.csv: Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
- gs://myBucket/reviews_utf8/c_201801.csv: Error while reading data, error message: Could not parse '3.9.58' as double for field App_Version_Name (position 2) starting at location 342
- gs://myBucket/reviews_utf8/c_201804.csv: Error while reading data, error message: Could not parse '2.0.1' as double for field App_Version_Name (position 2) starting at location 342
From there, debugging was simply a matter of understanding that there is a mismatch between the values in the CSV and the data types of the table columns.
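One way to catch such mismatches before the load job runs is a quick type check over the CSV. The sketch below is illustrative only: the column names are taken from the error messages above, but the schema and sample rows are assumptions, not the actual table definition.

```python
import csv
import io


def find_type_mismatches(csv_text, schema):
    """Return (row_number, column_name, value) for every cell that
    cannot be parsed as the expected type.

    schema: list of (column_name, parser) pairs, e.g. (name, float)
    for a column BigQuery expects to be FLOAT/double.
    """
    reader = csv.reader(io.StringIO(csv_text))
    mismatches = []
    for row_num, row in enumerate(reader, start=1):
        for (name, parser), value in zip(schema, row):
            try:
                parser(value)
            except ValueError:
                mismatches.append((row_num, name, value))
    return mismatches


# Hypothetical schema: App_Version_Name declared as double in the table,
# which is exactly what made values like '3.9.58' fail to load.
schema = [
    ("Package_Name", str),
    ("Reviewer_Language", str),
    ("App_Version_Name", float),
]
sample = "com.example.app,en,3.9.58\ncom.example.app,en,4\n"

print(find_type_mismatches(sample, schema))
```

In this case the fix was conceptual rather than code: version strings like '3.9.58' can never be doubles, so the column should be declared as STRING in the table schema.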
——————————————————————————————————————————
I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with,
feel free to contact me via LinkedIn: