Big Query, GCP

Parsing the last 4 days of google analytics table GA_sessions in google BigQuery

The problems with querying google BigQuery:

  1. The table is sharded not partitioned – this means each table has it owen table name e.g ga_sessions_20190202
  2. The GA sharded table also has some temporary data on a table called ga_sessions_intraday_20190505
  3. so when you select * from ga_sessions_* you will also get some unexpected ga_sessions_intraday_* shards

Below is example of querying the last 4 days in any GA_sessions table

select * from myProject.MyDataset.ga_sessions_*
where TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) OR (_TABLE_SUFFIX = CONCAT('intraday',CAST(FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY)) AS string) ) OR TABLE_SUFFIX = CONCAT('intraday',CAST(FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AS string) ))

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

Leave a Reply