Logo Jutomate

AWS Hive

What is Hive? Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis. Hive offers a SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop. Hive advantages: Hive is very flexible and has many options to transform… Continue reading AWS Hive

AWS EMR, Hive

Cherry pick source files in Hive external table example

A cool way to filter the files in your bucket for an external table in Hive! I put a lot of thought into these blog posts so that I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact… Continue reading Cherry pick source files in Hive external table example


AWS EMR

What is EMR? A fully managed Hadoop cluster platform that simplifies running big data frameworks. EMR advantages: EMR is very fast to deploy; it takes only a few minutes to configure and you are ready to go. It bundles a variety of open source projects such as Apache Hive, Apache Spark, and Presto. EMR use cases: Transformation… Continue reading AWS EMR

apache, architecture, AWS EMR, EMR, Hive, presto, Spark

AWS EMR and Hadoop Demystified – Comprehensive training program suggestion for Data Engineers in 200KM/h

This blog assumes prior knowledge; it is meant to help the reader design a training program for newcomers to AWS EMR and Hadoop. Naturally, my big data perspective is applied here. This blog is FAR FROM BEING PERFECT. Learn the following in rising order of importance (in my humble opinion). Quick introduction to big data in 200 KM/h Beyond… Continue reading AWS EMR and Hadoop Demystified – Comprehensive training program suggestion for Data Engineers in 200KM/h