The use case? We wanted to analyze the Airflow MySQL database with BigQuery, so we could build a dashboard of cost per DAG.
In order to run a query from BigQuery on a table in Cloud SQL, you need to perform the following steps.

First, create a new connection from BigQuery to the Cloud SQL instance. In your BigQuery console:
- press “+ADD DATA” –> “Create connection” –> MySQL
- Connection ID –> take it from the Cloud SQL instance, where it is called “Connection name” and has the following format: MyProject:us-central1:my-cloud-sql-instance-name
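Before pasting the “Connection name” anywhere, it is worth sanity-checking that it really has the PROJECT:REGION:INSTANCE shape shown above. A minimal sketch — the `parse_connection_name` helper is my own, not part of any Google library:

```python
def parse_connection_name(connection_name: str) -> dict:
    """Split a Cloud SQL "Connection name" (PROJECT:REGION:INSTANCE)
    into its parts, raising if the format looks wrong."""
    parts = connection_name.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected PROJECT:REGION:INSTANCE, got {connection_name!r}")
    project, region, instance = parts
    return {"project": project, "region": region, "instance": instance}

# Example with the format shown above:
print(parse_connection_name("MyProject:us-central1:my-cloud-sql-instance-name"))
```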
Once the connection to the Cloud SQL instance has been created, you can write new queries in BigQuery that retrieve data from a database in that instance. Bear in mind that the connection ID obtained in the first step is what you enter in the query in the second step.
Example of a federated query (note: this differs slightly from what the BigQuery documentation shows):
SELECT * FROM EXTERNAL_QUERY("<YOUR PROJECT ID>.<YOUR LOCATION>.<YOUR CONNECTION ID>", "SELECT * FROM DATASET.TABLE")
create or replace view `gap---all-sites-1245.DATA_LAKE.airflow_dag` as
SELECT * FROM EXTERNAL_QUERY("myProject.us.connection-name", "SELECT * FROM airflow.dag")
Notice that “<YOUR CONNECTION ID>” is the ID you supplied while creating the external connection in BigQuery.
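To avoid typos when assembling the EXTERNAL_QUERY call by hand, you can build it programmatically. A minimal sketch — the `build_federated_query` helper is my own, and actually executing the result would require the google-cloud-bigquery client and real credentials, so that part is left commented out:

```python
def build_federated_query(connection_id: str, inner_sql: str) -> str:
    """Build a BigQuery federated query against an external connection.

    connection_id is "<PROJECT>.<LOCATION>.<CONNECTION ID>", e.g.
    "myProject.us.connection-name"; inner_sql is the query that runs
    on the Cloud SQL (MySQL) side.
    """
    escaped = inner_sql.replace('"', '\\"')  # keep the outer quoting valid
    return f'SELECT * FROM EXTERNAL_QUERY("{connection_id}", "{escaped}")'

sql = build_federated_query("myProject.us.connection-name",
                            "SELECT * FROM airflow.dag")
print(sql)
# prints: SELECT * FROM EXTERNAL_QUERY("myProject.us.connection-name", "SELECT * FROM airflow.dag")

# To actually run it (requires google-cloud-bigquery and GCP credentials):
# from google.cloud import bigquery
# client = bigquery.Client()
# rows = client.query(sql).result()
```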
Common errors: connections

“Invalid table-valued function EXTERNAL_QUERY Failed to connect to MySQL database. Error: MysqlErrorCode(2013): Lost connection to MySQL server during query at”

If you hit this error, check the connection configuration against the two requirements below (matching regions and a public IP).
Common errors: regions

The BigQuery dataset and the Cloud SQL instance must be in the same region, or in the same location if the dataset is in a multi-region location such as US or EU. Check that your region is supported.
Common errors: public IP

The Cloud SQL instance must have a public IP for BigQuery to connect to it.
I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me: