This post assumes prior knowledge. It is meant to help the reader design a training program for newcomers to Hadoop on AWS EMR. Naturally, it reflects my own big data perspective. This post is FAR FROM PERFECT.
Learn the following in rising order of importance (in my humble opinion):
Quick introduction to big data at 200 km/h
Beyond the basics...
Hive vs Presto Demystified
Hive Demystified
EMR Zeppelin & Zeppelin
EMR YARN Demystified
EMR Spark Demystified
EMR Livy Demystified
EMR Spark and Zeppelin Demystified
RStudio and SparkR Demystified
EMR Spark Application Logging
EMR Monitoring Demystified | EMR Ganglia
EMR Spark Tuning Demystified
EMR Oozie Demystified (not common; use Airflow instead)
——————————————————————————————————————————
I put a lot of thought into these posts so that I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me: