AWS, AWS Athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Lambda, AWS Redshift, Hive, meetup, Uncategorised

200KM/h overview on Big Data in AWS | Part 2

In this lecture we cover the AWS Big Data PaaS technologies used to model and visualize data, using a suggested architecture and some basic big data architecture rules of thumb.

For more meetups:
https://www.meetup.com/Big-Data-Demystified/

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS, AWS Athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Lambda, AWS Redshift, Hive

200KM/h overview on Big Data in AWS | Part 1

In this lecture we cover the AWS Big Data PaaS technologies used to ingest and transform data. Moreover, we demonstrate a business use case, a suggested architecture, and some basic big data architecture rules of thumb.

For more meetups:
https://www.meetup.com/Big-Data-Demystified/

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS, AWS Aurora

AWS Aurora | MySQL Cheat Sheet

show databases;
show tables;
connect MyDatabaseName;

Check table sizes

SELECT
    table_schema AS `Database`,
    table_name AS `Table`,
    ROUND(((data_length + index_length) / 1024 / 1024), 2) AS `Size in MB`
FROM information_schema.TABLES
ORDER BY (data_length + index_length) DESC;
SHOW FULL PROCESSLIST;

SHOW VARIABLES LIKE "general_log%";

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' AND TIME >= 5;

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' AND INFO LIKE '%UPDATE %';

MySQL query log

https://tableplus.io/blog/2018/10/how-to-show-queries-log-in-mysql.html

Audit Aurora logs in Athena

https://aws.amazon.com/blogs/database/audit-amazon-aurora-database-logs-for-connections-query-patterns-and-more-using-amazon-athena-and-amazon-quicksight/

RDS Performance Insights console:
https://console.aws.amazon.com/rds/home?region=us-east-1#performance-insights:

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS Aurora

Load CSV from AWS S3 into AWS RDS Aurora

I understand that you would like to connect to your Aurora instance from your EC2 instance.

In order to achieve this, please make sure that the MySQL service is running, using the following command:

sudo service mysqld status

Alternatively, install the MySQL client directly from the MySQL repositories (https://dev.mysql.com/downloads/). Please note that these are MySQL community repositories and are not maintained or supported by AWS.
https://dev.mysql.com/doc/refman/5.7/en/linux-installation-yum-repo.html

Once the service is running you can use the following connection string:

mysql -h <cluster-endpoint> -P 3306 -u <your-user> -p

For further information, see Connecting to an Amazon Aurora MySQL DB Cluster:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Connecting.html

Shortcut to install the MySQL CLI:

sudo yum install mysql

Connect to RDS after confirming your security group allows access from your EC2 instance to your RDS instance, for example:

mysql -u MyUser -pMyPassword -h MyClusterEndpoint (note: no space between -p and the password)

Create an IAM role to allow Aurora access to S3:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.IAM.CreateRole.html

Set your cluster to use the above role (with access to S3).

Assuming your DB is on a private subnet, set up a VPC endpoint to S3 (it is always good practice to set up an endpoint anyway):

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.Network.html

In the Aurora cluster parameter group, set aurora_load_from_s3_role to the ARN of the role with S3 access, and reboot the Aurora DB.
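If you prefer to script the role association and the parameter change, here is a minimal boto3 sketch; the cluster, instance, parameter group, and role ARN below are placeholders, not values from this post:

import boto3

rds = boto3.client("rds")

# Placeholder identifiers - replace with your own.
CLUSTER_ID = "my-aurora-cluster"
PARAM_GROUP = "my-aurora-cluster-params"
ROLE_ARN = "arn:aws:iam::123456789012:role/aurora-s3-access"

# Associate the IAM role with the cluster so Aurora can reach S3.
rds.add_role_to_db_cluster(DBClusterIdentifier=CLUSTER_ID, RoleArn=ROLE_ARN)

# Point the aurora_load_from_s3_role cluster parameter at the same role.
rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName=PARAM_GROUP,
    Parameters=[{
        "ParameterName": "aurora_load_from_s3_role",
        "ParameterValue": ROLE_ARN,
        "ApplyMethod": "pending-reboot",
    }],
)

# Reboot the writer instance so the pending parameter takes effect.
rds.reboot_db_instance(DBInstanceIdentifier="my-aurora-writer")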

Grant your user S3 access via:

GRANT LOAD FROM S3 ON *.* TO 'user'@'domain-or-ip-address';

Create a table in MySQL, for example:

use myDatabase;

create table engine_triggers(
today_date date,
event_prob double
);

LOAD DATA examples (ignoring the header, or using quoted fields):

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"';

Ignoring the header and using quoted fields:

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv'
INTO TABLE engine_triggers
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

You can automate it via hourly crontab runs as follows (notice the extra escape character before \n and the escaped quote inside ENCLOSED BY):

0 * * * * mysql -u User -pPassword -hClusterDomainName -e "use myDatabase; truncate myDatabase.engine_triggers; LOAD DATA FROM S3 's3://bucket/file.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n' IGNORE 1 LINES"

Detailed manual:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html

And some documentation about MySQL data types:

https://www.w3schools.com/sql/sql_datatypes.asp

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS Aurora, AWS Lambda

Running queries on RDS Aurora from AWS Lambda

You can find the relevant set of steps for accessing your Amazon Aurora instance using Lambda in the following documentation:

Tutorial: Configuring a Lambda Function to Access Amazon RDS in an Amazon VPC – https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds.html

I also tested connecting to my Aurora instance from Lambda. The following are the steps I took:

Create an Aurora cluster and connect to the writer instance using the cluster endpoint. Create a sample database and table. (Make sure the instance's security group allows connections from your source IP addresses.)

Now, to create a Lambda function that accesses the Aurora instance:


Creating Role

To start with, we first need to create an execution role that gives your lambda function permission to access AWS resources. 

Please follow these steps to create an execution role:

1. Open the roles page in the IAM console: https://console.aws.amazon.com/iam/home#/role
2. Choose Roles from the left dashboard and select Create role.
3. Under the tab "Choose the service that will use this role", select Lambda and then Next: Permissions.
4. Search for "AWSLambdaVPCAccessExecutionRole". Select it and then Next: Tags.
5. Provide a tag and a role name (e.g. lambda-vpc-role), and then choose Create role.

The AWSLambdaVPCAccessExecutionRole has the permissions that the function needs to manage network connections to a VPC. 
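If you prefer to create the role programmatically, here is a minimal boto3 sketch of the same steps (the role name matches the example above; the trust policy and managed policy ARN are the standard ones):

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Create the execution role.
iam.create_role(
    RoleName="lambda-vpc-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policy that grants VPC networking permissions.
iam.attach_role_policy(
    RoleName="lambda-vpc-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
)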


Creating Lambda Function

Please follow the below steps to create a Lambda function:

1. Open the Lambda Management Console : https://console.aws.amazon.com/lambda
2. Choose Create a function
3. Choose Author from scratch, and then do the following:
    * In Name*, specify your Lambda function name.
    * In Runtime*, choose Python 2.7.
    * In Execution Role*, choose “Use an existing role”.
    * In Role name*, choose the previously created role, "lambda-vpc-role".
4. Choose Create function.
5. Once you have created the Lambda function, navigate to the function page.
6. On the function page, under the Network section, do the following:
    * In VPC, choose default VPC
    * In Subnets*, choose any two subnets
    * In Security Groups*, choose the default security group
7. Click Save.
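For reference, the same function can be created in a single boto3 call. This sketch assumes the deployment package built in the next section already exists; the function name, account ID, subnet IDs, and security-group ID are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Read the deployment package built in the next section.
with open("rds_lambda.zip", "rb") as f:
    package = f.read()

lambda_client.create_function(
    FunctionName="rds-connect-test",   # placeholder name
    Runtime="python2.7",               # matches the console steps above
    Role="arn:aws:iam::123456789012:role/lambda-vpc-role",
    Handler="connectdb.main",
    Code={"ZipFile": package},
    VpcConfig={
        "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "SecurityGroupIds": ["sg-cccc3333"],
    },
)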

Setting up Lambda Deployment Environment

Next, you will need to set up a deployment environment for the Python code that connects to the RDS database.
To connect to Aurora using Python you will need the pymysql module, so we need to install the dependencies with pip and create a deployment package. Please execute these commands in your local environment:

1. Create a local directory which will become the deployment package:
$ mkdir rds_lambda

$ cd rds_lambda/

$ pwd
/Users/user/rds_lambda

2. Install the pymysql module:
$ pip install pymysql -t /Users/user/rds_lambda

Executing the above command installs the pymysql module in your current directory.

3. Next, create a Python file which contains the code to connect to the RDS instance:
$ sudo nano connectdb.py

I have attached the file "connectdb.py", which has the Python code to connect to the RDS instance.
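A minimal sketch of such a file; the endpoint, user, password, and database name are placeholders, not values from this post:

# connectdb.py - minimal Lambda handler that connects to Aurora.
import sys
import logging

import pymysql

# Placeholder connection details - replace with your own.
RDS_HOST = "mycluster.cluster-xxxxxxxxxxxx.us-east-1.rds.amazonaws.com"
DB_USER = "MyUser"
DB_PASSWORD = "MyPassword"
DB_NAME = "myDatabase"

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Connect outside the handler so warm invocations reuse the connection.
try:
    conn = pymysql.connect(host=RDS_HOST, user=DB_USER,
                           password=DB_PASSWORD, db=DB_NAME,
                           connect_timeout=5)
except pymysql.MySQLError as e:
    logger.error("Could not connect to the Aurora instance: %s", e)
    sys.exit()

def main(event, context):
    # A trivial query to prove the connection works.
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION()")
        version = cur.fetchone()
    return "Connected. MySQL version: %s" % (version,)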

4. Next, we need to zip the current directory and upload it to the Lambda function:
$ zip -r rds_lambda.zip `ls` 

The above command creates a zip file "rds_lambda.zip", which we will need to upload to the Lambda function.
Navigate to the newly created Lambda function's console page:

1. In the Function code section, under Code entry type, select "Upload a .zip file" from the drop-down.
2. Browse to the zip file in your local directory.
3. Next, in the Function code section, change the Handler to pythonfilename.function (e.g. connectdb.main).
4. Click Save.
5. Next, you will need to allow the Lambda function's security group in your RDS security group.
6. After that, test the connection by creating a test event.

If the execution is successful, then the connection has been made.

You may also watch the video below, which gives a detailed explanation of how to connect to an RDS instance using a Lambda function:
https://www.youtube.com/watch?v=-CoL5oN1RzQ&vl=en

After successfully establishing the connection, you can modify the Python file to query databases inside the Aurora instance, as in the sketch below.
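For example, building on the connectdb.py sketch above, the handler could be changed to read from the engine_triggers sample table used earlier on this page:

# Replacement for main() in connectdb.py - runs a real query.
def main(event, context):
    rows = []
    with conn.cursor() as cur:
        # engine_triggers is the sample table created in the S3 load post above.
        cur.execute("SELECT today_date, event_prob FROM engine_triggers LIMIT 10")
        for row in cur.fetchall():
            rows.append(str(row))
    return rows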

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/