What is Hive?
Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
- Hive is flexible and offers many options for transforming complex data formats such as JSON, Avro and Parquet.
- Hive supports both external tables (for example, over S3) and local tables.
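As a rough illustration of the two table types, here is a minimal HiveQL sketch. The table names, columns, delimiter and the S3 path are all hypothetical placeholders, not taken from any real setup, and the exact S3 URI scheme depends on your environment.

```sql
-- Hypothetical external table: Hive only tracks the metadata,
-- while the files stay in the (made-up) S3 location below.
CREATE EXTERNAL TABLE IF NOT EXISTS events_raw (
  event_id   STRING,
  user_id    STRING,
  event_time STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/events/raw/';

-- Hypothetical local table: Hive stores the data in its own
-- warehouse directory, here in columnar Parquet format.
CREATE TABLE IF NOT EXISTS events_clean (
  event_id   STRING,
  user_id    STRING,
  event_time TIMESTAMP
)
STORED AS PARQUET;
```

Dropping an external table removes only its metadata and leaves the underlying files in place, which is usually what you want for data shared over S3.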
Hive use cases:
- Transformation and cleansing of data.
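A transformation and cleansing job might look roughly like the sketch below. It assumes two hypothetical tables that already exist: `events_raw` (all STRING columns) as the source and `events_clean` (with a TIMESTAMP `event_time` column) as the target. The cleansing rules are illustrative only.

```sql
-- Rewrite the (hypothetical) clean table from the raw one.
INSERT OVERWRITE TABLE events_clean
SELECT
  event_id,
  lower(trim(user_id))          AS user_id,    -- normalize case and whitespace
  CAST(event_time AS TIMESTAMP) AS event_time  -- NULL if the string does not parse
FROM events_raw
WHERE event_id IS NOT NULL
  AND trim(event_id) <> '';                    -- drop rows without a usable key
```

`INSERT OVERWRITE` replaces the existing contents of the target table, so a batch job like this can be re-run safely.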
Hive antipattern use cases:
- Real-time workloads, as Hive’s performance is bound by disk I/O.
Our Hive Blogs
- Error: HIVE_PATH_ALREADY_EXISTS: Target directory for table | error while CTAS in AWS Athena
- How to restart AWS EMR Hive Metastore
- Cherry pick source files in Hive external table example
- Working with Avro on Hadoop Hive SQL
- AWS EMR Hive Create External table with Dynamic partitioning transformation job example in SQL
- AWS S3 caching while working with Hive spark SQL and External table | LLAP
- How to export data from Google Big Query into AWS S3 + EMR hive or AWS Athena
- Converting TPCH data from row based to columnar Via Hive or SparkSQL and run ad hoc queries via Athena on columnar data
- Getting a sql schema from JSON