Logo Jutomate

AWS Zeppelin

AWS Zeppelin What is Zeppelin? An open source, web-based notebook that enables running data pipeline orchestration with a combination of technologies such as Bash, SQL and PySpark. Zeppelin advantages: Simple. Basic visualization capabilities. Zeppelin use cases: Ad hoc data queries. Research. Basic orchestration. Data exploration. Zeppelin antipattern use cases: Using Zeppelin mainly as a… Continue reading AWS Zeppelin

apache, architecture, AWS EMR, EMR, Hive, presto, Spark

AWS EMR and Hadoop Demystified – Comprehensive training program suggestion for Data Engineers in 200KM/h

This blog assumes prior knowledge; it is meant to help the reader design a training program for newbies on AWS EMR and Hadoop. Naturally, my big data perspective is applied here. This blog is FAR FROM BEING PERFECT. Learn the following in rising order of importance (in my humble opinion). Quick introduction to big data in 200 KM/h Beyond… Continue reading AWS EMR and Hadoop Demystified – Comprehensive training program suggestion for Data Engineers in 200KM/h

AWS EMR

AWS EMR What is EMR? A fully managed Hadoop cluster platform that simplifies running big data frameworks. EMR advantages: EMR is very fast to deploy; it takes just a few minutes to configure and you are ready to go. It contains a variety of Apache open source projects such as Hive, Spark and Presto. EMR use cases: Transformation… Continue reading AWS EMR

AWS

AWS - Amazon Web Services What are AWS’s advantages in the world of Big Data? The AWS Big Data ecosystem was designed to give us maximum flexibility and freedom to develop Big Data applications. The flexibility comes into play when using Apache open source projects as a managed service in AWS EMR. Furthermore, each Big… Continue reading AWS

Uncategorised

setting the default interpreter of zeppelin for bootstrapping

When you bootstrap Zeppelin on a new EMR cluster and open the notebook for the first time, you are asked to save the default interpreter. On a transient cluster you may want to set the default interpreter automatically. To do so, check /etc/zeppelin/conf/interpreter.json and look for something like: … { "name": "spark", "class": "org.apache.zeppelin.spark.SparkInterpreter", "defaultInterpreter": true, "editor": { "language": "scala",… Continue reading setting the default interpreter of zeppelin for bootstrapping
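As a rough illustration of automating that change, here is a minimal Python sketch that flips the "defaultInterpreter" flag on an in-memory copy of the config. Only the entry fields ("name", "class", "defaultInterpreter") come from the snippet above. The surrounding nesting ("interpreterSettings", "interpreterGroup"), the "pyspark" entry and the function name set_default_interpreter are assumptions made for the example, so check them against your own interpreter.json before relying on this.

```python
import copy
import json


def set_default_interpreter(config, group, interpreter_name):
    """Return a copy of config in which only interpreter_name is the default in group.

    Assumption: config["interpreterSettings"][group]["interpreterGroup"] is a list
    of entries shaped like the snippet above. This layout is hypothetical here.
    """
    updated = copy.deepcopy(config)
    entries = updated["interpreterSettings"][group]["interpreterGroup"]
    if not any(entry.get("name") == interpreter_name for entry in entries):
        raise ValueError(f"no interpreter named {interpreter_name!r} in group {group!r}")
    for entry in entries:
        # Exactly one entry in the group ends up flagged as the default.
        entry["defaultInterpreter"] = entry.get("name") == interpreter_name
    return updated


# Hypothetical sample data; a real interpreter.json holds many more fields.
sample = {
    "interpreterSettings": {
        "spark": {
            "interpreterGroup": [
                {
                    "name": "spark",
                    "class": "org.apache.zeppelin.spark.SparkInterpreter",
                    "defaultInterpreter": True,
                },
                {
                    "name": "pyspark",
                    "class": "org.apache.zeppelin.spark.PySparkInterpreter",
                    "defaultInterpreter": False,
                },
            ]
        }
    }
}

result = set_default_interpreter(sample, "spark", "pyspark")
print(json.dumps(result, indent=2))
```

In a bootstrap or step script you would load the real file with json.load, apply the function and write the result back with json.dump. Zeppelin would most likely need a restart to pick up the change, which is also an assumption worth verifying on your EMR release.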