AWS Athena

What is Athena?

Athena is a serverless SQL database as a service. What makes it unique is that it is both a “pay as you go” service and has decoupled compute and storage.

Athena advantages:

    1. Easy to get started.
    2. Rich syntax: Presto-based SQL that supports complex data structures and highly nested JSON files.
    3. Pay as you go managed service.
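To make the point about complex data structures concrete, here is a minimal sketch that renders a nested schema into an Athena-style `CREATE EXTERNAL TABLE` statement and pairs it with a query that uses dot notation and `UNNEST`. The table name `events`, the column names, and the bucket `s3://example-bucket/events/` are hypothetical, and the helper functions are illustrative only; they are not part of any AWS SDK. The OpenX JSON SerDe is assumed as the row format.

```python
def render_type(spec):
    """Render a nested Python description as a Hive-style DDL type string."""
    if isinstance(spec, str):    # primitive type, e.g. "string" or "bigint"
        return spec
    if isinstance(spec, list):   # a one-element list means array<element>
        return f"array<{render_type(spec[0])}>"
    if isinstance(spec, dict):   # a dict means struct<field:type,...>
        fields = ",".join(f"{name}:{render_type(t)}" for name, t in spec.items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported type spec: {spec!r}")


def create_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for JSON files stored on S3."""
    cols = ",\n  ".join(f"`{name}` {render_type(t)}" for name, t in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema: one nested struct and one array column.
EVENT_COLUMNS = {
    "event_id": "string",
    "customer": {"id": "bigint", "geo": {"country": "string"}},
    "tags": ["string"],
}

DDL = create_table_ddl("events", EVENT_COLUMNS, "s3://example-bucket/events/")

# Dot notation reaches into the struct; UNNEST turns the array into rows.
QUERY = """
SELECT customer.geo.country AS country, tag, COUNT(*) AS events
FROM events
CROSS JOIN UNNEST(tags) AS t(tag)
GROUP BY 1, 2
"""

if __name__ == "__main__":
    print(DDL)
    print(QUERY)
```

Both strings can be pasted into the Athena query editor once the table name and location are adjusted to real data.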

Athena use cases:

    1. When you start learning about the Hadoop ecosystem, this is a good technology for learning how to create complex tables with complex data types, and for experimenting with different Hadoop storage formats and compression types.
    2. The classic use case, IMHO, is the presentation layer, where the BI solution uses AWS Athena as its backend. To learn more about how to incorporate Athena in your AWS big data architecture, read this blog.
    3. Another classic use case is ad hoc queries. Imagine a scenario where you want to query 100 TB of logs that are already on AWS S3: you can use Athena to query these logs quickly.
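Because Athena is billed per query rather than per cluster, it helps to estimate what an ad hoc query over those logs might cost. The sketch below assumes pricing is proportional to the data scanned and uses 5 USD per TB as a placeholder rate; check the current price for your region. The table `logs` and its partition column `dt` are hypothetical.

```python
TB = 1024 ** 4  # bytes in one terabyte (binary)


def estimate_scan_cost(bytes_scanned, usd_per_tb=5.0):
    """Estimate query cost from bytes scanned; usd_per_tb is an assumed rate."""
    return bytes_scanned / TB * usd_per_tb


def daily_logs_query(day):
    """Ad hoc query limited to one partition (hypothetical table and column)."""
    return (
        "SELECT status, COUNT(*) AS hits "
        "FROM logs "
        f"WHERE dt = '{day}' "
        "GROUP BY status"
    )


if __name__ == "__main__":
    full_scan = estimate_scan_cost(100 * TB)        # scan all 100 TB
    one_day = estimate_scan_cost(100 * TB / 365)    # scan one day of a year of logs
    print(f"full scan: ${full_scan:.2f}, one day: ${one_day:.2f}")
    print(daily_logs_query("2020-01-01"))
```

The comparison shows why partitioning the logs (for example by date) and filtering on the partition column matters: the less data a query scans, the less it costs under this pricing assumption.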

Athena antipattern use cases:

    1. Compute-intensive queries are to be avoided. This is generally true for Big Data ETL pipelines and use cases. However, it becomes more extreme with AWS Athena, as its compute layer cannot be accessed, scaled out, or in any way managed by the end user.
    2. Parallelism: use cases that require more than 5 concurrent queries.
    3. It is better to avoid large-scale joins and window functions.
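If a workload occasionally needs to submit many queries, one way to stay under a concurrency limit is to throttle on the client side. The following is a minimal sketch using a bounded thread pool. The limit of 5 is taken from the list above and may differ from your account's actual quota. `run_query` is a hypothetical callable that submits one query and waits for its result; `ConcurrencyProbe` is a stand-in used here instead of a real Athena call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_QUERIES = 5  # assumed limit; verify against your account quota


def run_all(queries, run_query, max_concurrent=MAX_CONCURRENT_QUERIES):
    """Run every query through run_query, at most max_concurrent at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(run_query, queries))


class ConcurrencyProbe:
    """Stand-in for a real query runner that records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, sql):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate waiting for the query to finish
        with self._lock:
            self.active -= 1
        return f"done: {sql}"


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    results = run_all([f"SELECT {i}" for i in range(20)], probe)
    print(len(results), "queries finished, peak concurrency:", probe.peak)
```

Client-side throttling only smooths out bursts. If the use case needs sustained high parallelism, the antipattern above still applies.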
