Summary of Big Data and Hadoop options on the Microsoft Azure cloud

Azure HDInsight features and advantages:

  1. Hive with LLAP
  2. Spark, Spark SQL, ML, Streaming
  3. Pig
  4. HBase
  5. Storm
  6. U-SQL – C# and SQL
  7. Federated query across several data sources
  8. Kafka! (with rack awareness for Azure), replication with MirrorMaker
  9. Microsoft R Server!
  10. Zeppelin and Jupyter integration.
  11. Apache Ambari View.
  12. Sqoop
  13. Oozie
  14. ZooKeeper, for leader election of head nodes (master nodes)
  15. Mahout, discontinued in v4.0
  16. Phoenix (SQL over HBase)
  17. Mono – open source C# .NET implementation.
  18. Apache Slider – runs long-lived applications on YARN (https://www.slideshare.net/duttashivaji/apache-slider). Discontinued in v4.0.
  19. Apache Livy
  20. Security – Kerberos, Active Directory, and Apache Ranger
  21. External Hive metastore
  22. Very rich documentation: https://docs.microsoft.com/en-us/azure/hdinsight/
  23. Rich developer plugins
    1. Zeppelin
    2. IntelliJ
    3. Eclipse
    4. R
    5. Visual Studio
    6. Jupyter
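
As an illustration of the Apache Livy item above: Livy exposes a REST API for submitting Spark jobs, and a batch job is created with a POST to its /batches endpoint. The sketch below only builds such a request and does not send it. The cluster name, credentials, jar path and class name are hypothetical placeholders, and the cluster URL pattern and the X-Requested-By header are assumptions about how the cluster gateway is set up, so check them against your own cluster.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster, user, password, jar_path, class_name, args=()):
    """Build (but do not send) a POST request for Livy's /batches endpoint."""
    # Assumed URL pattern for a cluster reachable through its HTTPS gateway.
    url = f"https://{cluster}.azurehdinsight.net/livy/batches"
    # Body of a Livy batch submission: the jar, its main class, and its arguments.
    payload = {"file": jar_path, "className": class_name, "args": list(args)}
    # HTTP basic authentication with the cluster login (hypothetical credentials).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
            "X-Requested-By": user,  # assumption: required by the gateway
        },
        method="POST",
    )


# All values below are made up for illustration.
request = build_livy_batch_request(
    cluster="mycluster",
    user="admin",
    password="example-password",
    jar_path="wasbs:///example/jars/app.jar",
    class_name="com.example.App",
    args=["10"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually submit the job you would pass the request to urllib.request.urlopen and read the JSON response, which contains the batch id and its state.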

Ecosystem

  1. Data Lake Analytics
  2. Machine Learning
  3. Power BI!!
  4. Azure Cosmos DB – an extension of Azure DocumentDB, basically NoSQL
  5. Azure Data Factory – orchestration
  6. Azure Event Hubs
  7. ISV data science
    1. H2O
    2. Dataiku

More advantages

  1. Each worker can be configured with a different size.
  2. Hive ODBC
  3. Hive add-on for Excel
  4. Autoscaling.

Architecture

  1. Gateway nodes – management and security.
  2. Head nodes – like the NameNode, in high availability.
  3. Edge nodes – not for data processing; used by developers and data scientists for job testing.
  4. Worker nodes – like DataNodes.
  5. ZooKeeper nodes – for leader election of head nodes.
  6. Nimbus nodes – with Storm.
  7. Hive metastore – Azure SQL
  8. Azure Data Lake Store and Azure Blob storage
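
To make the "leader election of head nodes" idea concrete, here is a minimal simulation of the usual ZooKeeper election recipe: every candidate registers and receives an increasing sequence number, and the candidate with the lowest number is the leader. When the leader disappears, the next lowest takes over. This is plain Python with no ZooKeeper client, the node names are hypothetical, and whether HDInsight uses exactly this recipe internally is an assumption.

```python
def elect_leader(registrations):
    """Return the candidate with the lowest sequence number, or None if empty.

    `registrations` maps a node name to the sequence number it received
    when it registered (a stand-in for ephemeral sequential znodes).
    """
    if not registrations:
        return None
    return min(registrations, key=registrations.get)


# Two head nodes register; the sequence numbers are made up.
registrations = {"headnode0": 3, "headnode1": 7}

leader = elect_leader(registrations)
print("active head node:", leader)

# Simulate a failure of the active head node: its registration vanishes,
# and the remaining candidate with the lowest number becomes the leader.
del registrations[leader]
new_leader = elect_leader(registrations)
print("after failover:", new_leader)
```

In a real ZooKeeper ensemble the registration is an ephemeral node, so it is removed automatically when the session of the failed head node expires.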

Deployment

  1. Azure CLI to create clusters
  2. Airflow – open source.
  3. TBD.
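
For the Azure CLI option above, a small sketch of how a cluster creation command could be assembled from a script. It only builds and prints the command and does not run it. The resource names and password are hypothetical, and the flag names are written from memory of `az hdinsight create`, so verify them with `az hdinsight create --help` before relying on them.

```python
import shlex


def build_create_cluster_command(name, resource_group, cluster_type,
                                 storage_account, http_password, worker_count=3):
    """Return the argument list for an `az hdinsight create` call (not executed)."""
    return [
        "az", "hdinsight", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--type", cluster_type,                    # e.g. spark, hadoop, kafka
        "--storage-account", storage_account,
        "--http-password", http_password,
        "--workernode-count", str(worker_count),
    ]


# All values below are placeholders for illustration.
command = build_create_cluster_command(
    name="my-spark-cluster",
    resource_group="my-resource-group",
    cluster_type="spark",
    storage_account="mystorageaccount",
    http_password="example-password",
    worker_count=4,
)
print(shlex.join(command))
```

With the Azure CLI installed and logged in, the same list can be handed to subprocess.run(command, check=True). An Airflow task could wrap that call to make cluster creation part of a scheduled pipeline.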

 

 

——————————————————————————————————————————

I put a lot of thought into these blogs so that I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/
