AWS Athena how to work with JSON
Author: Ariel Yosef
How to query a nested json in AWS Athena
Json can contain nested values. those values can be represented as “key=value” or “array of values” or “array of key=value”
How to query “key=value”
name | age | cars |
John | 30 | {car1=Ford, car2=BMW, car3=Fiat} |
maria | 21 | {car1=Ford, car2=BMW, car3=Fiat} |
sonia | 25 | {car1=Ford, car2=BMW, car3=Fiat} |
We can see the column “cars” has a column with “keys and values”
In order to query that, we will simply query the keys and values column (cars) and add “.key”.
Let’s say we want to extract the car2 from column cars
SELECT name , age ,cars.car2 as car2, FROM our_table
note: “as” statement just gives the column name.
Note: the car2 is the key we want to get from the original table.
name | age | car2 |
John | 30 | BMW |
maria | 21 | BMW |
sonia | 25 | BMW |
How to Query an array:
department | users |
engineering | [Sharon, John, Bob, Sally] |
We need to unnest the array column and to connect it with the original table.
SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)
department | users |
engineering | Sharon |
engineering | John |
engineering | Bob |
engineering | Sally |
Here we extract the values from the array and attach them to the old table.
How to query Nested JSON with array of key values:
city | worker |
NYC | [{name=Sally, department=engineering, age=31}, {name=John, department=finance, age=27}, {name=Amy, department=devops, age=41}] |
To query that kind of data, we need first to unnest the array values and then select the column we want.
First we will add the unnested column by cross join and unnest:
select city,worker_unnested from dataset
cross join unnest(dataset.worker) as t(worker2)
city | worker_unnested |
NYC | {name=Sally, department=engineering, age=31} |
NYC | {name=John, department=finance, age=27} |
NYC | {name=Amy, department=devops, age=41} |
Now we can add to our query the specific column we want to query by using the “.key”:
select city,worker_unnested.name,worker_unnested.department,worker_unnested.age from dataset
cross join unnest(dataset.worker) as t(worker_unnested)
city | worker_unnested.name | worker_unnested.department | worker_unnested.age |
NYC | Sally | engineering | 31 |
NYC | John | finance | 27 |
NYC | Amy | devops | 41 |
Watch another relevant post: AWS Athena Cheat Sheet
——————————————————————————————————————————
I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with,
feel free to contact me via LinkedIn: