AWS Athena how to work with JSON

Author: Ariel Yosef

How to query a nested json in AWS Athena

Json can contain nested values. those values can be represented as “key=value” or “array of values” or “array of key=value”

How to query “key=value”

name age cars
John 30 {car1=Ford, car2=BMW, car3=Fiat}
maria 21 {car1=Ford, car2=BMW, car3=Fiat}
sonia 25 {car1=Ford, car2=BMW, car3=Fiat}

We can see the column “cars” has a column with “keys and values”

In order to query that, we will simply query the keys and values column (cars) and add “.key”.

Let’s say we want to extract the car2 from column cars 

SELECT name , age ,cars.car2 as car2, FROM our_table

note: “as” statement just gives the column name.

Note: the car2 is the key we want to get from the original table.

name age car2
John 30 BMW
maria 21 BMW
sonia 25 BMW

How to Query an array:

department users
engineering [Sharon, John, Bob, Sally]

We need to unnest the array column and to connect it with the original table.

SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)

department users
engineering Sharon
engineering John
engineering Bob
engineering Sally

Here we extract the values from the array and attach them to the old table.

How to query Nested JSON with array of key values:

city worker
NYC [{name=Sally, department=engineering, age=31}, {name=John, department=finance, age=27}, {name=Amy, department=devops, age=41}]

To query that kind of data, we need first to unnest the array values and then select the column we want.

First we will add the unnested column by cross join and unnest:

select city,worker_unnested from dataset
cross join unnest(dataset.worker) as t(worker2)

city worker_unnested
NYC {name=Sally, department=engineering, age=31}
NYC {name=John, department=finance, age=27}
NYC {name=Amy, department=devops, age=41}

Now we can add to our query the specific column we want to query by using the “.key”:

select city,worker_unnested.name,worker_unnested.department,worker_unnested.age from dataset
cross join unnest(dataset.worker) as t(worker_unnested)

city worker_unnested.name worker_unnested.department worker_unnested.age
NYC Sally engineering 31
NYC John finance 27
NYC Amy devops 41

Watch another relevant post: AWS Athena Cheat Sheet

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with,

feel free to contact me via LinkedIn: