AWS Hive

What is Hive?

Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis.
Hive offers an SQL-like interface (HiveQL) for querying data stored in the various databases and file systems that integrate with Hadoop, such as HDFS and Amazon S3.

Hive advantages:

Hive is very flexible and offers many options for transforming complex data formats such as JSON, Avro and Parquet.
Hive supports both external tables (for example, over data in Amazon S3) and local (managed) tables.
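To make the external vs. managed distinction and the format options concrete, here is a minimal sketch that renders HiveQL CREATE TABLE statements. The helper name create_table_ddl, the table names and the bucket s3://example-bucket/ are hypothetical placeholders. It assumes JSON is read through the HCatalog JsonSerDe (which may require hive-hcatalog-core on the classpath) and that STORED AS AVRO / STORED AS PARQUET are available, as in recent Hive versions.

```python
# Hypothetical helper for illustration: renders HiveQL DDL, it does not talk to Hive.
from typing import Optional, Sequence, Tuple

# Storage clauses for the formats mentioned above (assumed Hive syntax).
# JSON relies on the HCatalog JsonSerDe; Avro and Parquet use STORED AS shorthands.
_STORAGE_CLAUSES = {
    "json": "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\nSTORED AS TEXTFILE",
    "avro": "STORED AS AVRO",
    "parquet": "STORED AS PARQUET",
}


def create_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    file_format: str,
    s3_location: Optional[str] = None,
) -> str:
    """Return a CREATE TABLE statement.

    With s3_location the table is EXTERNAL: Hive keeps only the metadata, and by
    default dropping the table leaves the S3 objects in place. Without it the
    table is managed, and Hive owns the data in its warehouse directory.
    """
    fmt = file_format.lower()
    if fmt not in _STORAGE_CLAUSES:
        raise ValueError(f"unsupported format: {file_format}")
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    kind = "EXTERNAL TABLE" if s3_location else "TABLE"
    parts = [f"CREATE {kind} IF NOT EXISTS {table} (\n{cols}\n)", _STORAGE_CLAUSES[fmt]]
    if s3_location:
        parts.append(f"LOCATION '{s3_location}'")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    cols = [("event_id", "STRING"), ("event_time", "TIMESTAMP"), ("payload", "STRING")]
    # External table over raw JSON files in S3 (placeholder bucket).
    print(create_table_ddl("raw_events", cols, "json", "s3://example-bucket/raw/events/"))
    # Managed table stored as Parquet in the Hive warehouse.
    print(create_table_ddl("clean_events", cols, "parquet"))
```

Keeping raw data in an external table means the same S3 files stay usable by other engines (for example, Athena) even if the Hive table is dropped.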

Hive use cases:

Transformation and cleansing of data.
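A typical cleansing job reads a raw table and overwrites a clean one. The sketch below only builds such a HiveQL statement as a string; the function name cleansing_query and all table and column names are hypothetical. It assumes every selected column is a STRING (so TRIM applies) and that the target table's columns are in the same order as the select list, since INSERT OVERWRITE TABLE maps columns by position.

```python
# Hypothetical helper for illustration: builds a HiveQL cleansing statement.
from typing import Sequence


def cleansing_query(source: str, target: str, columns: Sequence[str], key: str) -> str:
    """Return an INSERT OVERWRITE statement that trims whitespace from every
    (assumed STRING) column, drops rows whose key is NULL or blank, and
    removes exact duplicate rows with SELECT DISTINCT."""
    select_list = ",\n  ".join(f"TRIM(`{c}`) AS `{c}`" for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {target}\n"
        f"SELECT DISTINCT\n  {select_list}\n"
        f"FROM {source}\n"
        f"WHERE `{key}` IS NOT NULL AND TRIM(`{key}`) <> '';"
    )


if __name__ == "__main__":
    print(cleansing_query("raw_events", "clean_events", ["event_id", "payload"], "event_id"))
```

Because the statement overwrites the target, rerunning the job is idempotent for the same input, which is convenient for batch ETL.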

Hive antipattern use cases:

Real-time workloads: Hive's processing is disk-based and batch-oriented, so its query latency is too high for real-time use.

Our Hive Blogs

Error: HIVE_PATH_ALREADY_EXISTS: Target directory for table | error while CTAS in AWS Athena
How to restart AWS EMR Hive Metastore
Cherry pick source files in Hive external table example
Working with Avro on Hadoop Hive SQL
AWS EMR Hive Create External table with Dynamic partitioning transformation job example in SQL
AWS S3 caching while working with Hive spark SQL and External table | LLAP
How to export data from Google Big Query into AWS S3 + EMR hive or AWS Athena
Converting TPCH data from row based to columnar Via Hive or SparkSQL and run ad hoc queries via Athena on columnar data
Getting a sql schema from JSON

Architectures and meetups that include Hive

English

AWS Big Data in 200KM/h
AWS Big Data Demystified – Part 1 [English]

Hebrew

200KM/h overview on Big Data in AWS | Part 1
200KM/h overview on Big Data in AWS | Part 2