
AWS EMR and Hadoop Demystified – A comprehensive training program suggestion for Data Engineers at 200 KM/h

This blog assumes prior knowledge. It is meant to help the reader design a training program for newcomers to Hadoop on AWS EMR. Naturally, my own big data perspective is applied here. This blog is FAR FROM BEING PERFECT.

Learn the following topics in increasing order of importance (in my humble opinion).

Quick introduction to big data in 200 KM/h

Beyond the basics….

Hive vs Presto Demystified

Hive Demystified

EMR Zeppelin & Zeppelin

EMR Yarn Demystified

EMR Spark Demystified

EMR Livy demystified

EMR Spark and Zeppelin demystified

Rstudio and SparkR demystified

EMR spark Application logging

EMR Monitoring Demystified | EMR Ganglia

EMR spark tuning demystified

EMR Oozie demystified (not commonly used; consider Airflow instead)


I put a lot of thought into these blogs so that I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me.

