Parsing the last 4 days of the Google Analytics ga_sessions table in Google BigQuery

The problems with querying Google Analytics data in BigQuery:

  1. The table is sharded, not partitioned – each day has its own table name, e.g. ga_sessions_20190202
  2. GA also writes temporary data into intraday shards, e.g. ga_sessions_intraday_20190505
  3. So when you select * from ga_sessions_* you will also match the unexpected ga_sessions_intraday_* shards

Below is an example of querying the last 4 days of any ga_sessions table. The _TABLE_SUFFIX pseudo-column restricts the scan to the date range; since the intraday suffixes start with “intraday_” they fall outside the numeric range and are excluded automatically:

select *
from `myProject.MyDataset.ga_sessions_*`
where _TABLE_SUFFIX between
  format_date('%Y%m%d', date_sub(current_date(), interval 4 day))
  and format_date('%Y%m%d', current_date())
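If you build the query string from a script, the same suffix window can be computed with the standard library. A minimal sketch (nothing here is BigQuery-specific; the dates are just YYYYMMDD strings):

```python
from datetime import date, timedelta

def suffix_range(end_date, days):
    """Return the _TABLE_SUFFIX values (YYYYMMDD) covering `days` days up to end_date."""
    start = end_date - timedelta(days=days)
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]

suffixes = suffix_range(date(2019, 2, 5), 4)
print(suffixes[0], suffixes[-1])  # prints: 20190201 20190205
```

The first and last elements are the bounds you would interpolate into the BETWEEN clause.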

Load CSV from AWS S3 into AWS RDS Aurora

I understand that you would like to connect to your Aurora instance from your EC2 instance.

To check whether a MySQL service is already installed and running on the EC2 instance, use the following command:

sudo service mysqld status

Alternatively, install the mysql client directly from the MySQL repositories (please note that these are MySQL community repositories and are not maintained/supported by AWS).

Once the service is running you can use the following connection string:

mysql -h <cluster-endpoint> -P 3306 -u <your-user> -p

For further information, please follow the documentation on Connecting to an Amazon Aurora MySQL DB Cluster:

Shortcut to install the mysql CLI:

sudo yum install mysql
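Before connecting, a quick sanity check that the client is actually on the PATH (a generic sketch, not AWS-specific):

```shell
# verify the mysql client is installed before attempting a connection
if command -v mysql >/dev/null 2>&1; then
  mysql --version
else
  echo "mysql client not found - run: sudo yum install mysql"
fi
```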

Connect to RDS, after confirming your security group allows access from your EC2 instance to your RDS instance, for example (note there is no space after -p, otherwise mysql treats the password as a database name):

mysql -u MyUser -pMyPassword -h MyClusterEndpoint

Create a role that allows Aurora access to S3:

Set your cluster to use the above role (with access to S3).

Assuming your DB is on a private subnet, set up a VPC endpoint to S3 (it is good practice to set up an endpoint anyway):

In the Aurora parameter group, set aurora_load_from_s3_role to the ARN of the role with S3 access, and reboot the Aurora DB.
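The console steps above can also be scripted. Below is a dry-run sketch of the AWS CLI calls involved; the cluster identifier, parameter-group name and role ARN are placeholders, and the commands are echoed rather than executed:

```shell
# placeholder ARN of the IAM role that grants Aurora access to S3
ROLE_ARN="arn:aws:iam::123456789012:role/aurora-s3-access"

# attach the role to the cluster (echoed dry-run; drop the echo to execute)
echo aws rds add-role-to-db-cluster \
  --db-cluster-identifier my-cluster \
  --role-arn "$ROLE_ARN"

# point aurora_load_from_s3_role at the role; takes effect after the reboot
echo aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-cluster-params \
  --parameters "ParameterName=aurora_load_from_s3_role,ParameterValue=$ROLE_ARN,ApplyMethod=pending-reboot"
```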

Grant your user S3 load access via:

GRANT LOAD FROM S3 ON *.* TO 'user'@'domain-or-ip-address';

Create a table in MySQL, for example:

use myDatabase;

create table engine_triggers(
today_date date,
event_prob double
);

LOAD DATA examples (ignoring the header, or using quoted fields):

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"';

Ignoring the header (multi-line form):

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv'
INTO TABLE engine_triggers
FIELDS TERMINATED BY ','
IGNORE 1 LINES;
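For reference, the sample_triggers.csv assumed above has two columns matching the engine_triggers table. A quick local sketch that generates such a file (the header names and sample values are assumptions):

```python
import csv

# write a small CSV matching engine_triggers(today_date date, event_prob double)
rows = [("2019-02-02", 0.87), ("2019-02-03", 0.12)]
with open("sample_triggers.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["today_date", "event_prob"])  # header row - hence IGNORE 1 LINES
    w.writerows(rows)

print(open("sample_triggers.csv").read())
```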

You can automate it via hourly crontab runs as follows (notice the extra escape character before \n and inside the ENCLOSED BY clause):

0 * * * * mysql -u User -pPassword -hClusterDomainName -e "use myDatabase; truncate myDatabase.engine_triggers; LOAD DATA FROM S3 's3://bucket/file.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n' IGNORE 1 LINES"

Detailed manual:

And some documentation about MySQL data types:

Running queries on RDS Aurora from AWS Lambda

You can find the relevant set of steps for accessing your Amazon Aurora instance using Lambda in the following documentation:

[+] Tutorial: Configuring a Lambda Function to Access Amazon RDS in an Amazon VPC –

I also tested connecting to my Aurora instance from Lambda. Here are the steps I took:

Create an Aurora cluster and connect to the writer instance using the cluster endpoint. Create a sample database and table. (Make sure the correct source IP addresses are allowed in the instance's security group so the connection succeeds.)

Now coming to creating a Lambda function to access the Aurora instance:

Creating Role

To start with, we first need to create an execution role that gives your lambda function permission to access AWS resources. 

Please follow these steps to create an execution role:

1. Open the roles page in the IAM console:
2. Choose Roles from the left dashboard and select Create role.
3. Under the tab “Choose the service that will use this role” select Lambda and then Next:Permissions
4. Search for “AWSLambdaVPCAccessExecutionRole”. Select this and then Next:Tags
5. Provide a Tag and then a Role Name (ex. lambda-vpc-role) and then Create Role.

The AWSLambdaVPCAccessExecutionRole has the permissions that the function needs to manage network connections to a VPC. 

Creating Lambda Function

Please follow the below steps to create a Lambda function:

1. Open the Lambda Management Console:
2. Choose Create a function
3. Choose Author from scratch, and then do the following: 
    * In Name*, specify your Lambda function name.
    * In Runtime*, choose Python 2.7.
    * In Execution Role*, choose “Use an existing role”.
    * In Role name*, enter the name of the role created earlier, “lambda-vpc-role”.
4. Choose create function.
5. Once you have created the Lambda function, navigate to the function page.
6. On the function page, under the Network section, do the following:
    * In VPC, choose default VPC
    * In Subnets*, choose any two subnets
    * In Security Groups*, choose the default security group
7. Click on Save

Setting up Lambda Deployment Environment

Next, you will need to set up a deployment environment to deploy Python code that connects to the RDS database.
To connect to Aurora using Python you need the pymysql module, so we will install the dependencies with pip and create a deployment package. Please execute the following commands in your local environment.

1. Creating a local directory which will be the deployment package:
$ mkdir rds_lambda;

$ cd rds_lambda/

$ pwd

2. Install the pymysql module into the package directory:
$ pip install pymysql -t .

The above command installs the pymysql module into the current directory.

3. Next, create a Python file which contains the code to connect to the RDS instance:
$ sudo nano

I have attached the file “” which has the Python code to connect to the RDS instance.
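Since the attached file is not included here, below is a minimal sketch of what such a handler could look like. The endpoint, credentials and table name are placeholders, and pymysql is assumed to be vendored into the deployment package (hence the import inside the handler, which keeps the sketch importable without the module installed):

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# connection details - placeholders, replace with your cluster endpoint and credentials
RDS_HOST = "mycluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"
DB_USER = "MyUser"
DB_PASSWORD = "MyPassword"
DB_NAME = "myDatabase"

def main(event, context):
    """Lambda entry point: open a connection, run a trivial query, return the result."""
    # pymysql ships inside the deployment package, so this import resolves at runtime
    import pymysql
    conn = pymysql.connect(host=RDS_HOST, user=DB_USER,
                           password=DB_PASSWORD, database=DB_NAME,
                           connect_timeout=5)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM engine_triggers")
            (count,) = cur.fetchone()
        logger.info("engine_triggers has %d rows", count)
        return {"rows": count}
    finally:
        conn.close()
```

The function name `main` matches the Handler value used later (connectdb.main).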

4. Next, we need to zip the contents of the current directory and upload the archive to the Lambda function:
$ zip -r function.zip .

The above command creates a zip file (function.zip here, as an example name) which we will need to upload to the Lambda function.
Navigate to the newly created Lambda function console page:

1. In the Function Code section, under Code entry type, select “Upload a .zip file” from the drop-down.
2. Browse to the zip file in your local directory.
3. Next, in the Function Code section, change the Handler to pythonfilename.function (e.g. connectdb.main).
4. Click Save.
5. Next, add the security group of the Lambda function to your RDS security group.
6. After that, test the connection by creating a test event.

If the execution is successful, then the connection has been made.

You may also go through the video link below, which gives a detailed explanation of how to connect to an RDS instance using a Lambda function.

After successfully establishing the connection, you can modify the Python file to query the databases inside the Aurora instance.

AWS Big Data Demystified #1.2 | Big Data architecture lessons learned

A while ago I entered the challenging world of Big Data. As an engineer, at first, I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article; it covers only a small fraction of the technologies in the Big Data industry…

Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I use AWS, GCP and data-center infrastructure to answer the basic questions of anyone starting out in the big data world.

How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC or AVRO? Which technology should we use to model the data: EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL? GCS? BigQuery? Dataflow? Datalab? TensorFlow? How to handle streaming? How to manage costs? Performance tips? Security tips? Cloud best-practice tips?

In this meetup we present lecturers working on several cloud vendors, various big data platforms such as Hadoop and data warehouses, and startups working on big data products. Basically, if it is related to big data, this is THE meetup.

Some of our online materials (mixed content from several cloud vendors):

Website: (under construction)


Big Data Demystified
Tel Aviv-Yafo, IL, 494 members

Next meetup: Big Data Demystified | From Redshift to SnowFlake
Sunday, May 12, 2019, 6:00 PM, 23 attending

AWS Big Data Demystified
Tel Aviv-Yafo, IL, 635 members

YouTube channels:


Data Engineers
Data Science
DevOps Engineers
Big Data Architects
Solution Architects

Alluxio Demystified | Unify Data Analytics Any Stack Any Cloud

Personally, I have been waiting for over a year to host this lecture at our meetup. Back at Walla News, I wanted to test drive their solution to accelerate Hive and Spark SQL over S3 and external tables. If you are into caching, performance, and unifying your multiple storage solutions (GCS, S3, etc.), you might want to hear the wonderful lecturer Bin Fan, PhD, Founding Engineer and VP of Open Source at Alluxio.

This post will be updated soon, stay tuned. For now, you are welcome to join our meetup.

Unify Data Analytics: Any Stack Any Cloud | Webinar | Big Data Demystified

Tuesday, Mar 19, 2019, 7:00 PM

Kikar Halehem
Kdoshei HaShoa St 63 Herzliya, IL

22 Members Went

**** This is the first webinar at this meetup, please be patient **** The webinar will be broadcast via YouTube; if you want to join and become active in the webinar via Hangouts, or if there is a problem, join our meetup group for last minute …
