Use Redshift when:
- You need a traditional data warehouse
- You need the data relatively hot for analytics, such as BI
- There is no data engineering team
- You require joins
- You need a cluster running 24x7
- Your data types are simple
- There is no nested JSON
- You need a petabyte-scale database
- You want to analyze massive amounts of data (Redshift Spectrum)
- You need UPDATE/DELETE
- You require an ACID-compliant DBMS
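Because the checklist above favors simple data types and no nested JSON, one common approach is to flatten nested JSON into flat rows before loading it into Redshift. The following is a minimal Python sketch of that idea, not an AWS tool: the `flatten` function, the underscore-joined column names and the choice to explode each array into one row per element are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten(record: Dict[str, Any], prefix: str = "") -> List[Row]:
    """Flatten one nested JSON object into flat rows (hypothetical helper).

    - Nested objects become underscore-joined columns: {"customer": {"id": 7}} -> customer_id.
    - Arrays are exploded: one row per element (a cross product if there are several arrays).
    - Empty arrays keep the record, with None in that column.
    """
    rows: List[Row] = [{}]
    for key, value in record.items():
        column = prefix + key
        if isinstance(value, dict):
            sub_rows = flatten(value, column + "_")
        elif isinstance(value, list):
            # Flatten each element under the same key, then collect all resulting rows.
            sub_rows = [r for item in value for r in flatten({key: item}, prefix)]
            sub_rows = sub_rows or [{column: None}]
        else:
            sub_rows = [{column: value}]
        # Combine every row built so far with every sub-row.
        rows = [{**row, **sub} for row in rows for sub in sub_rows]
    return rows


if __name__ == "__main__":
    order = json.loads(
        '{"order_id": 1, "customer": {"id": 7, "country": "IL"},'
        ' "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}'
    )
    for row in flatten(order):
        print(row)
```

Running it prints two rows, one per item, each repeating `order_id` and the customer columns. Exploding several arrays in one record multiplies the row count, which is one reason array-heavy, nested data tends to fit EMR better: Spark SQL, Hive and Presto can keep arrays as native column types.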
Use EMR (Spark SQL, Presto, Hive) when:
- You need a transient cluster
- Elasticity is important (auto scaling on task nodes)
- Cost is important: Spot Instances
- Your data is up to a few hundred TB
- You want to separate compute and storage (external tables + task nodes + auto scaling)
- You require more flexibility
- You need complex partitioning: dynamic partitioning + INSERT OVERWRITE
- You have complex data types
- You need to convert between arrays and nested JSON
- You want built-in orchestration
- You want built-in notebooks to mix code with SQL
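The "dynamic partitioning + INSERT OVERWRITE" item refers to writing a batch into Hive-style partition folders (for example `dt=2020-01-01/`) where the partition values come from the data itself, and replacing only the partitions the batch touches. The sketch below simulates those semantics in plain Python, with an in-memory dict standing in for S3 prefixes; the function names and data layout are hypothetical, and on a real EMR cluster Hive or Spark SQL would do this for you.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
# Hypothetical stand-in for a partitioned table on S3:
# Hive-style partition path (e.g. "dt=2020-01-01") -> rows stored under that prefix.
Table = Dict[str, List[Row]]


def partition_path(row: Row, partition_cols: List[str]) -> str:
    """Build a Hive-style partition path such as 'dt=2020-01-01/country=IL'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def insert_overwrite_dynamic(table: Table, batch: List[Row], partition_cols: List[str]) -> Table:
    """Simulate INSERT OVERWRITE with dynamic partitioning.

    Partition values come from each row (dynamic partitioning). Every
    partition that appears in the batch is replaced wholesale; partitions
    absent from the batch are left untouched.
    """
    incoming: Dict[str, List[Row]] = defaultdict(list)
    for row in batch:
        # Partition columns are encoded in the path, not stored in the data rows.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        incoming[partition_path(row, partition_cols)].append(data)
    result = dict(table)      # shallow copy: the caller's table is not modified
    result.update(incoming)   # overwrite only the partitions this batch touches
    return result


if __name__ == "__main__":
    table = {
        "dt=2020-01-01": [{"user": "a", "clicks": 3}],
        "dt=2020-01-02": [{"user": "b", "clicks": 5}],
    }
    # Reprocess 2020-01-02 with corrected data and add a new day, 2020-01-03.
    batch = [
        {"dt": "2020-01-02", "user": "b", "clicks": 6},
        {"dt": "2020-01-03", "user": "c", "clicks": 1},
    ]
    for path, rows in sorted(insert_overwrite_dynamic(table, batch, ["dt"]).items()):
        print(path, rows)
```

In the example, 2020-01-01 keeps its old rows, 2020-01-02 is replaced and 2020-01-03 is created. In Hive this corresponds to `INSERT OVERWRITE TABLE ... PARTITION (dt) SELECT ...` with dynamic partitioning enabled; in Spark SQL it corresponds to setting `spark.sql.sources.partitionOverwriteMode` to `dynamic`, since the default `static` mode replaces every partition when no partition spec is given.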
Please check the Redshift-specific FAQ questions below:
Q: When would I use Amazon Redshift vs. Amazon EMR?
Q: Can Redshift Spectrum replace Amazon EMR?
Q: Can I use Redshift Spectrum to query data that I process using Amazon EMR?
- Reference: Redshift FAQ
Please check the EMR-specific FAQ questions below:
Q: What can I do with Amazon EMR?
Q: Who can use Amazon EMR?
Q: What can I do with Amazon EMR that I could not do before?
Q: What is the data processing engine behind Amazon EMR?
Q: What is Apache Spark?
Q: What is Presto?
- Reference: EMR FAQ
** Point 2. Here are other resources that can help you understand Redshift and EMR use cases better.
- Reference: AWS Redshift case studies (look for the case studies section)
- Reference: AWS EMR case studies (look for the case studies section)
** Point 3. I have also looked at some AWS blog posts that show how EMR and Redshift can be used together in specific use cases:
- How I built a data warehouse using Amazon Redshift and AWS services in record time
- Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP
- Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning
I hope this information helps you understand EMR and Redshift use cases better.
Need to learn more about AWS big data (demystified)?
- Contact me via LinkedIn: Omid Vahdaty
- Website: https://amazon-aws-big-data-demystified.ninja/
- Join our meetup, Facebook group and YouTube channel:
- Join our meetup: https://www.meetup.com/AWS-Big-Data-Demystified/
- Join our Facebook group: https://www.facebook.com/groups/amazon.aws.big.data.demystified/
- Subscribe to our YouTube channel: https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber