When should we use EMR and when should we use Redshift? EMR vs. Redshift

Use Redshift when

  1. You need a traditional data warehouse
  2. You need the data relatively hot for analytics such as BI
  3. There is no data engineering team
  4. You require joins
  5. You need a cluster running 24x7
  6. Your data types are simple
  7. There is no nested JSON
  8. You need a petabyte-scale database
  9. You want to analyze massive amounts of data (Spectrum)
  10. You need update/delete
  11. You require an ACID DBMS
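
Points 6 and 7 above mean that nested records usually have to be flattened before they land in plain Redshift columns. Here is a minimal sketch of that step in Python. The `flatten` helper, the `_` separator and the sample event are my own illustration, not part of any AWS tooling, and storing arrays as JSON strings is just one possible choice:

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into a single level of column-like keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Nested object: recurse and prefix the child keys with the parent name.
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Arrays have no flat-column equivalent; keep them as a JSON string here.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Made-up sample event, for illustration only.
event = {"user": {"id": 7, "geo": {"country": "IL"}}, "tags": ["a", "b"], "amount": 12.5}
row = flatten(event)
print(row)
```

If most of your data looks like `event` rather than like `row`, that is a hint to look at the EMR list below.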

Use EMR (SparkSQL, Presto, Hive) when

  1. You need a transient cluster
  2. Elasticity is important (auto scaling on task nodes)
  3. Cost is important (Spot instances)
  4. Your data is up to a few hundred TBs
  5. You want to separate compute and storage (external tables + task nodes + auto scaling)
  6. You require more flexibility
    1. Complex partitions + dynamic partitioning + insert overwrite
    2. Complex data types
      1. Structs
      2. Arrays <-> nested JSON
    3. Orchestration built in
    4. Notebooks built in: mix code with SQL
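
To make points 1 to 3 concrete, here is a rough sketch of a request for a transient EMR cluster whose task nodes run on Spot. It only builds the parameter dictionary that boto3's `run_job_flow` accepts; the actual call is left as a comment because it needs boto3 and AWS credentials. The cluster name, release label, instance types, node counts and the S3 script path are all assumptions for illustration, and the role names are the EMR defaults, which may differ in your account:

```python
from typing import Any, Dict


def transient_cluster_request(name: str, task_nodes: int) -> Dict[str, Any]:
    """Build run_job_flow parameters for a cluster that terminates when its steps finish."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: pick the release you actually use
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes hold no HDFS data, so they are the usual place for Spot.
                {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_nodes},
            ],
            # Transient cluster: shut down once there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "nightly-spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical script location, replace with your own.
                "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request("nightly-etl", task_nodes=4)

# To actually launch the cluster (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping master and core nodes on-demand and putting only the task group on Spot is a common compromise: losing a Spot task node costs some compute, not data. Auto scaling rules are not shown here.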


Please check the following questions in the Redshift FAQ:

Q: When would I use Amazon Redshift vs. Amazon EMR?
Q: Can Redshift Spectrum replace Amazon EMR?
Q: Can I use Redshift Spectrum to query data that I process using Amazon EMR?

Reference: Redshift FAQ

Please check the following questions in the EMR FAQ:

Q: What can I do with Amazon EMR?
Q: Who can use Amazon EMR?
Q: What can I do with Amazon EMR that I could not do before?
Q: What is the data processing engine behind Amazon EMR?
Q: What is Apache Spark?
Q: What is Presto?

Reference: EMR FAQ

** Point 2. Here are other resources that can help you understand Redshift and EMR use cases better.

Reference: AWS Redshift case studies (look for the case study section)

Reference: AWS EMR case studies (look for the case study section)

** Point 3. Here are some AWS blog posts that show how EMR and Redshift can be used together in specific use cases.

— How I built a data warehouse using Amazon Redshift and AWS services in record time

— Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP

— Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning

Hope this information helps in understanding EMR and Redshift use cases better.

