What is Athena?
AWS Athena is a serverless SQL database offered as a service. It stands out for two reasons: it is a pay-as-you-go service, and its compute is decoupled from its storage (the data stays in AWS S3).
- Easy to get started.
- Rich, Presto-based SQL syntax that supports complex data structures and highly nested JSON files.
- Pay-as-you-go managed service.
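To make the point about complex data structures and nested JSON concrete, here is a minimal sketch in Python that only builds the SQL text you would submit to Athena. The table name `app_logs`, its columns, and the bucket `s3://example-bucket/logs/` are hypothetical placeholders, not anything from a real setup; the OpenX JSON SerDe and `CROSS JOIN UNNEST` are the pieces doing the work.

```python
def build_nested_json_ddl(table: str, location: str) -> str:
    """Return a CREATE EXTERNAL TABLE statement over nested JSON files.

    The column layout is a made-up example: one struct and one array.
    """
    return f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {table} (
  request_id string,
  customer   struct<id:bigint, name:string>,
  tags       array<string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '{location}'
""".strip()


def build_nested_json_query(table: str, customer_id: int) -> str:
    """Return a query that reads a struct field with dot notation
    and flattens the array column into one row per tag."""
    return f"""
SELECT request_id,
       customer.name AS customer_name,
       tag
FROM {table}
CROSS JOIN UNNEST(tags) AS t(tag)
WHERE customer.id = {int(customer_id)}
""".strip()


# Hypothetical names, for illustration only.
ddl = build_nested_json_ddl("app_logs", "s3://example-bucket/logs/")
query = build_nested_json_query("app_logs", 42)
print(ddl)
print(query)
```

The statements can be pasted into the Athena console as-is once the placeholder names are replaced with your own table and bucket.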
Athena use cases:
- When you start learning about the Hadoop ecosystem, Athena is a good technology for learning how to create complex tables with complex data types, and for experimenting with different Hadoop storage formats and compression types.
- The classic use case, IMHO, is the presentation layer, where the BI solution uses AWS Athena as its backend. To learn more about how to incorporate Athena in your AWS big data architecture, read this blog.
- Another classic use case is ad hoc queries. Imagine, for example, that you want to query 100 TB of logs that are already on AWS S3: you can use Athena to query these logs quickly.
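Because Athena bills by the amount of data scanned, it helps to estimate what an ad hoc query over those 100 TB of logs might cost. The sketch below is simple arithmetic under stated assumptions: a price of 5 USD per TB scanned and a 10 MB minimum per query (both should be checked against current pricing for your region), and a binary TB. The function and constant names are illustrative, not part of any AWS SDK.

```python
# Assumptions, not authoritative pricing: verify for your region.
PRICE_PER_TB_USD = 5.0
MIN_BYTES_PER_QUERY = 10 * 1024**2   # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Rough cost in USD of one query, given the bytes it scans."""
    billed_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billed_bytes / BYTES_PER_TB * price_per_tb


# Scanning all 100 TB of raw logs in a single query.
full_scan_usd = estimate_query_cost(100 * BYTES_PER_TB)

# The same question answered from 2 TB, e.g. after partition pruning
# or after converting the logs to a columnar format.
pruned_usd = estimate_query_cost(2 * BYTES_PER_TB)

print(f"full scan: {full_scan_usd:.2f} USD, pruned: {pruned_usd:.2f} USD")
```

Under these assumptions a full scan costs 500 USD while the pruned query costs 10 USD, which is why reducing the bytes scanned matters so much for ad hoc work.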
Athena antipattern use cases:
- Compute-intensive queries are to be avoided. This is generally true for Big Data ETL pipelines and use cases, but it matters even more here because AWS Athena's compute layer cannot be accessed, scaled out, or in any way managed by the end user.
- Parallelism: use cases that require more than 5 concurrent queries.
- It is better to avoid large-scale joins and window functions.
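If you do need to submit many queries, one way to stay under a concurrency limit is to throttle on the client side. The following is a minimal sketch, assuming the limit of 5 concurrent queries mentioned above (your account's actual quota may differ). `run_query` is a hypothetical callable that submits one query and waits for it to finish; here it is replaced by a fake that only sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_CONCURRENT_QUERIES = 5  # limit quoted in the text; treat as an assumption


def run_throttled(queries: Iterable[str],
                  run_query: Callable[[str], str],
                  max_concurrent: int = MAX_CONCURRENT_QUERIES) -> List[str]:
    """Run all queries, never more than max_concurrent at a time.

    Results come back in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_query, queries))


# Fake query runner for the demo: tracks how many calls overlap.
_lock = threading.Lock()
_active = 0
peak_concurrency = 0


def fake_run_query(sql: str) -> str:
    global _active, peak_concurrency
    with _lock:
        _active += 1
        peak_concurrency = max(peak_concurrency, _active)
    time.sleep(0.01)  # stand-in for waiting on the real query
    with _lock:
        _active -= 1
    return f"done: {sql}"


results = run_throttled([f"SELECT {i}" for i in range(12)], fake_run_query)
print(len(results), "queries finished, peak concurrency:", peak_concurrency)
```

A thread pool is enough here because the work is waiting on a remote service rather than computing locally; in real use `fake_run_query` would be swapped for a function that calls your Athena client of choice.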
Our Athena Blogs
- Converting TPCH data from row-based to columnar via Hive or SparkSQL and running ad hoc queries via Athena on columnar data
- How to ignore quoted fields inside a CSV via AWS Athena?
- AWS Athena Error: Query exhausted resources at this scale factor
- Error: HIVE_PATH_ALREADY_EXISTS: Target directory for table | error while CTAS in AWS Athena
- Gibberish in AWS Athena instead of Hebrew?
- AWS Athena & Presto Cheat sheet
- AWS Athena Cheat sheet
- AWS Athena how to work with JSON
- Ignoring quotes in CSV while working in Athena, Hive, Spark SQL
- 16 Tips to reduce costs on AWS SQL Athena
- How to export data from Google BigQuery into AWS S3 + EMR Hive or AWS Athena