architecture, AWS, AWS athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Redshift, AWS S3, EMR, SageMaker, Security

AWS Demystified – A comprehensive training program suggestion for newbies at 200KM/h

This blog assumes prior AWS knowledge; it is meant to help the reader design a training program for AWS newbies. Naturally, my big data perspective is applied here.

Learn the following topics in ascending order of importance (in my humble opinion).

General lightweight introduction to AWS & AWS Big Data:

  1. Create an AWS user on an AWS account

Start with this and get the obvious out of the way.

  2. AWS S3 (the AWS equivalent of GCS), AWS S3 CLI

Be sure to understand the basics: upload, download, copy, move, and rsync-style sync, from both the GUI and the AWS CLI. Only then move on to advanced features such as lifecycle policies, storage tiers, encryption, etc.
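
A minimal AWS CLI sketch (the bucket and file names here are hypothetical):

# upload, download, move, rsync-style sync and list (hypothetical bucket/paths)
aws s3 cp ./local_file.csv s3://my-demo-bucket/raw/local_file.csv
aws s3 cp s3://my-demo-bucket/raw/local_file.csv ./downloaded.csv
aws s3 mv s3://my-demo-bucket/raw/local_file.csv s3://my-demo-bucket/archive/local_file.csv
aws s3 sync ./local_dir s3://my-demo-bucket/backup/
aws s3 ls s3://my-demo-bucket/ --recursive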

  3. AWS EC2 (Elastic Compute Cloud): how to create a machine and how to connect via SSH

Be sure to use a T-type instance to play around, and choose Amazon Linux or Ubuntu. Notice the different OS user names required to SSH into each machine.
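
A minimal SSH sketch, assuming a hypothetical key pair and host name:

# Amazon Linux AMIs expect the ec2-user account, Ubuntu AMIs expect ubuntu
ssh -i my-key.pem ec2-user@ec2-1-2-3-4.compute-1.amazonaws.com
ssh -i my-key.pem ubuntu@ec2-1-2-3-4.compute-1.amazonaws.com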

Be sure to understand what each of the following is:

  1. SSH
  2. SSH tunnel
  3. AWS security groups: how to add an IP and a port

Without this section you won't be able to access machines over the web or via SSH.
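
A minimal sketch of an SSH tunnel and of opening a port in a security group (the key, host, security group ID, and source CIDR are hypothetical):

# forward local port 8888 to port 8888 on the remote machine over SSH
ssh -i my-key.pem -N -L 8888:localhost:8888 ec2-user@ec2-1-2-3-4.compute-1.amazonaws.com

# allow SSH (port 22) from a single source IP in a security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 203.0.113.7/32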

  4. AWS VPC (Virtual Private Cloud), only if you feel comfortable around network architecture; otherwise skip this topic.
  5. AWS RDS (MySQL, Aurora)

Create a MySQL instance, log in, create a table, load data from S3, and export data back to S3.

Understand the difference between AWS RDS Aurora and AWS RDS MySQL.
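
A minimal connection sketch, assuming a hypothetical Aurora/MySQL endpoint and an admin user (loading data from S3 is covered in detail in a later post):

# connect to the RDS endpoint and create a demo database and table
mysql -h my-aurora.cluster-abc123.us-east-1.rds.amazonaws.com -P 3306 -u admin -p -e "CREATE DATABASE IF NOT EXISTS demo; CREATE TABLE IF NOT EXISTS demo.t (id INT, val VARCHAR(255));"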

  6. AWS Redshift: learn how to create a cluster, connect to it, and run queries, and understand the basic architecture.
  7. AWS Athena: how to create an external table, how to insert data, partitions, MSCK REPAIR TABLE, Parquet, Avro, and querying nested data (see the CLI sketch below).
  8. AWS Glue

Be sure to understand the two roles of AWS Glue: a shared metastore (the Glue Data Catalog) and auto-generated ETL features.
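
A minimal Athena sketch via the AWS CLI, as referenced in the Athena item above (the database, table, and S3 locations are hypothetical):

# create a partitioned external table over CSV files in S3
aws athena start-query-execution --query-execution-context Database=demo_db --result-configuration OutputLocation=s3://my-athena-results/ --query-string "CREATE EXTERNAL TABLE IF NOT EXISTS events (id string, ts string) PARTITIONED BY (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://my-demo-bucket/events/'"

# discover the partitions that already exist under the table location
aws athena start-query-execution --query-execution-context Database=demo_db --result-configuration OutputLocation=s3://my-athena-results/ --query-string "MSCK REPAIR TABLE events"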

  9. AWS Kinesis (Streams, Firehose, Analytics): you need to understand messaging and streaming. I also cover off-topic subjects here, such as Flume and Kafka.
  10. AWS EMR

This is hard-core material: highly advanced content for enterprise-grade lectures.

Be sure to understand when to use Hive, when to use SparkSQL, and when to use Spark Core. Moreover, understand the difference between the node types: master node, core node, and task node.
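
A minimal EMR sketch via the AWS CLI, just to get a feel for the moving parts (the release label, key name, and sizing are hypothetical, and --use-default-roles assumes the default EMR roles already exist):

# spin up a small Spark + Hive cluster: one master node and two core nodes
aws emr create-cluster --name "demo-cluster" --release-label emr-6.9.0 --applications Name=Spark Name=Hive --instance-type m5.xlarge --instance-count 3 --use-default-roles --ec2-attributes KeyName=my-key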

  11. AWS IAM: groups, users, role-based architecture, encryption at rest and in transit, resource-based policies, and identity-based policies. By now you should have a basic understanding of what IAM is about: identity, authentication, and authorization. Still, there are many fine-grained security issues you need to understand. At the very minimum, be sure to understand the difference between a role and a user, how to write a custom policy, and what a resource-based policy is versus an identity-based policy (see the policy sketch after this list).
  12. AWS Cloud best practices, lightweight
  13. AWS ELB, ALB, Auto Scaling [Advanced]
  14. AWS Route53, TBD
  15. AWS security
  16. AWS Lambda
  17. AWS SageMaker
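
A minimal sketch of writing a custom identity-based policy and attaching it to a role, as referenced in the IAM item above (the policy name, role name, bucket, and account ID are hypothetical):

# a read-only S3 policy scoped to a single bucket
cat > s3-read-only.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::my-demo-bucket", "arn:aws:s3:::my-demo-bucket/*"]
    }
  ]
}
EOF

# create the policy and attach it to an existing role
aws iam create-policy --policy-name my-demo-s3-read-only --policy-document file://s3-read-only.json
aws iam attach-role-policy --role-name my-demo-role --policy-arn arn:aws:iam::123456789012:policy/my-demo-s3-read-only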

TBD subjects:

API Gateway

AWS Cognito


——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS, AWS athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Lambda, AWS Redshift, Hive, meetup, Uncategorised

200KM/h overview on Big Data in AWS | Part 2

In this lecture we are going to cover AWS Big Data PaaS technologies used to model and visualize data, with a suggested architecture and some basic big data architecture rules of thumb.

For more meetups:
https://www.meetup.com/Big-Data-Demystified/

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS, AWS athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Lambda, AWS Redshift, Hive

200KM/h overview on Big Data in AWS | Part 1

In this lecture we are going to cover AWS Big Data PaaS technologies used to ingest and transform data. Moreover, we are going to demonstrate a business use case, a suggested architecture, and some basic big data architecture rules of thumb.

For more meetups:
https://www.meetup.com/Big-Data-Demystified/

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS, AWS Aurora

AWS Aurora | MySQL Cheat Sheet

show databases;
show tables;
connect MyDatabaseName;

Check table sizes

SELECT table_schema AS `Database`,
       table_name AS `Table`,
       ROUND(((data_length + index_length) / 1024 / 1024), 2) AS `Size in MB`
FROM information_schema.TABLES
ORDER BY (data_length + index_length) DESC;
SHOW FULL PROCESSLIST;

SHOW VARIABLES LIKE "general_log%";

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' AND TIME >= 5;

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' AND INFO LIKE '%UPDATE %';

MySQL Table with Identity (auto increment)

CREATE TABLE study
( study_id INT(11) NOT NULL AUTO_INCREMENT,
  study_col VARCHAR(255) NOT NULL,
  CONSTRAINT study_pk PRIMARY KEY (study_id)
);

Insert into table with identity

INSERT INTO study(study_col)  VALUES ( 'new_study' );

MySQL client install and running from the CLI (user, password, database)

sudo apt install -y default-mysql-client

mysql -u user -h 1.2.3.4 -pPassword someDB -e "select * from t"

Suppressing the column headers and table formatting in the MySQL CLI result set (notice the -s, -N, and -sN arguments):

mysql -N -u airflow -h 1.2.3.4
mysql -s -u airflow -h 1.2.3.4 -pairflow airflow  -e "select study_id from study where study_col='str4';"
mysql -sN -u airflow -h 1.2.3.4 -pairflow airflow  -e "select study_id from study where study_col='str4';"

Change the encoding of a MySQL table

ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;

Check the encoding of a MySQL table

SHOW CREATE TABLE myTable;

Convert the encoding of a MySQL column

SELECT column1, CONVERT(column2 USING utf8)
FROM my_table 
WHERE my_condition;

MySQL query log

https://tableplus.io/blog/2018/10/how-to-show-queries-log-in-mysql.html
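
A minimal sketch for reading the general query log as a table, assuming general_log is enabled and log_output is set to TABLE via the parameter group (host and credentials are hypothetical):

# show the 20 most recent statements captured in the general log table
mysql -u admin -h 1.2.3.4 -p -e "SELECT event_time, user_host, argument FROM mysql.general_log ORDER BY event_time DESC LIMIT 20;"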

Audit Aurora logs in Athena

https://aws.amazon.com/blogs/database/audit-amazon-aurora-database-logs-for-connections-query-patterns-and-more-using-amazon-athena-and-amazon-quicksight/

https://console.aws.amazon.com/rds/home?region=us-east-1#performance-insights:

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS Aurora

Load CSV from AWS S3 into AWS RDS Aurora

Suppose you would like to connect to your Aurora instance from your EC2 instance.

In order to achieve this, first make sure that a MySQL service/client is available. You can check whether a local MySQL service is running using the following command:

sudo service mysqld status

Alternatively, install the MySQL client directly from the MySQL repositories (https://dev.mysql.com/downloads/). Please note that these are MySQL community repositories and are not maintained/supported by AWS.
https://dev.mysql.com/doc/refman/5.7/en/linux-installation-yum-repo.html

Once the service is running you can use the following connection string:

mysql -h <cluster-endpoint> -P 3306 -u <your-user> -p

Please follow this link for further information on connecting to an Amazon Aurora MySQL DB cluster:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Connecting.html

Shortcut to install the MySQL CLI (on Amazon Linux):

sudo yum install mysql

Connect to RDS, after confirming your security group allows access from your EC2 instance to your RDS instance, for example:

mysql -u MyUser -pMyPassword -h MyClusterEndpoint

Create a role to allow Aurora access to S3:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.IAM.CreateRole.html

Set your cluster to use the above role (with access to S3).
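
This can also be done from the AWS CLI; a minimal sketch with a hypothetical cluster identifier and role ARN:

# associate the IAM role that has S3 access with the Aurora cluster
aws rds add-role-to-db-cluster --db-cluster-identifier my-aurora-cluster --role-arn arn:aws:iam::123456789012:role/my-aurora-s3-role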

Assuming your DB is on a private subnet, set up a VPC endpoint to S3 (it is always a good practice to set up an endpoint anyway):

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.Network.html
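
A minimal CLI sketch for the S3 gateway endpoint (the VPC ID, route table ID, and region are hypothetical):

# create an S3 gateway endpoint so private subnets can reach S3
aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 --service-name com.amazonaws.us-east-1.s3 --route-table-ids rtb-0123456789abcdef0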

Add the ARN of the role with S3 access to the Aurora cluster parameter group parameter aurora_load_from_s3_role, and reboot the Aurora DB.
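
A minimal CLI sketch for setting the parameter and rebooting the writer instance (the parameter group name, role ARN, and instance identifier are hypothetical):

# point aurora_load_from_s3_role at the role ARN, then reboot so it takes effect
aws rds modify-db-cluster-parameter-group --db-cluster-parameter-group-name my-aurora-cluster-params --parameters "ParameterName=aurora_load_from_s3_role,ParameterValue=arn:aws:iam::123456789012:role/my-aurora-s3-role,ApplyMethod=pending-reboot"
aws rds reboot-db-instance --db-instance-identifier my-aurora-instance-1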

Grant your database user S3 load access via:

GRANT LOAD FROM S3 ON *.* TO 'user'@'domain-or-ip-address';

Create a table in MySQL, for example:

use myDatabase;

create table engine_triggers(
today_date date,
event_prob double
);

LOAD DATA examples (ignoring the header, or using quoted fields):

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"';

Ignoring the header and using quoted fields:

LOAD DATA FROM S3 's3://bucket/sample_triggers.csv'
INTO TABLE engine_triggers
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

You can automate it via hourly crontab runs as follows (notice the extra escape character before \n and before the quote inside ENCLOSED BY):

0 * * * * mysql -u User -pPassword -hClusterDomainName -e "use myDatabase; truncate myDatabase.engine_triggers; LOAD DATA FROM S3 's3://bucket/file.csv' INTO TABLE engine_triggers FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n' IGNORE 1 LINES"

Detailed manual:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html

And some documentation about MySQL data types:

https://www.w3schools.com/sql/sql_datatypes.asp

——————————————————————————————————————————

I put a lot of thought into these blogs so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/