AWS EMR
What is EMR?
A fully managed Hadoop cluster platform that simplifies running big data frameworks.
EMR advantages:
EMR use cases:
Transformation of data and modeling.
EMR antipattern use cases:
- BI
- Operational database
Our EMR Blogs
Core Hadoop technologies
- Jupyter Demystified
- Installing an AWS EMR cluster tutorial
- Cherry pick source files in Hive external table example
- Registering EMR master node as target to ALB via CloudFormation or CLI
- Bootstrapping EMR from 0900 to 1700 on each work day with AWS CloudFormation and AWS Data Pipeline
- EMR and watch dog: service-nanny?
- How to increase disk space on master node root partition in EMR
- How to restart AWS EMR Hive Metastore
- AWS S3 caching while working with Hive spark SQL and External table | LLAP
- Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)
- How to connect via JDBC to Spark SQL EMR on AWS
- How to export data from Google BigQuery into AWS S3 + EMR Hive or AWS Athena
- Bootstrapping Oozie
Optional Web Client
EMR Related Architecture Blogs
- AWS EMR Presto Demystified | Everything you wanted to know about Presto
- 39 Tips to reduce costs on AWS EMR
- AWS Big Data Demystified #2 | AWS Athena, Spectrum, EMR, Hive
- When should we use EMR and When should we use Redshift? EMR VS Redshift
- AWS EMR and Hadoop Demystified – Comprehensive training program suggestion for Data Engineers in 200KM/h
- How to work with maximize resource allocation and Spark dynamic allocation [ AWS EMR Spark ]
- Spark performance tuning demystified