architecture, AWS athena, AWS EMR, Cloud, Data Engineering, Spark

AWS Big Data in 200KM/h

4th August 2019 | 12th May 2022 | Omid

Lecturer: Omid Vahdaty, 10.5.2022

AWS Big Data ecosystem and architecture best practices. We will provide a quick overview of all the different big data services in AWS.

Video

Slides

Lecturer: Omid Vahdaty, 4.8.2019

Hebrew meetup

How to transform data (TXT, CSV, TSV, JSON) into Parquet? Which technology should we use to model the data: EMR, Athena, Redshift, Spectrum, Glue, Spark, or SparkSQL? How to handle streaming? How to manage costs? Performance tips, security tips, and cloud best-practice tips.

Hebrew Video

Slides

——————————————————————————————————————————
I put a lot of thought into these blogs so that I could share the information in a clear and useful way.
If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me via LinkedIn (Omid Vahdaty): https://www.linkedin.com/in/omid-vahdaty/

AWS athena

How to ignore quoted fields inside a CSV via AWS Athena?

3rd July 2019 | 12th November 2019 | Omid

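The idea is to tell Athena, in the CREATE TABLE statement, to read the file with the OpenCSVSerde, so that the quotes around a field are treated as delimiters and stripped rather than stored as part of the value.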

CREATE EXTERNAL TABLE myTable (
  id bigint,
  guid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/';

Also committed to our Big Data Demystified GitHub repository.
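To see what the separator and quote characters change, here is a minimal local sketch in Python. It parses one line with the standard csv module using the same two characters as the table definition. It only approximates the SerDe's behaviour and is not what Athena runs; the sample row and the helper name parse_quoted_line are made up for illustration.

```python
import csv
import io


def parse_quoted_line(line, separator=",", quote='"'):
    """Split one CSV line, treating text wrapped in `quote` as a single field."""
    reader = csv.reader(io.StringIO(line), delimiter=separator, quotechar=quote)
    return next(reader)


# Hypothetical sample row: the guid is quoted and contains a comma.
sample = '42,"abc,def-123"'

# A quote-unaware split cuts the guid in two and keeps the quote characters.
naive = sample.split(",")

# A quote-aware parse keeps the guid whole and drops the surrounding quotes.
parsed = parse_quoted_line(sample)

print(naive)
print(parsed)
```

The second result is the shape you would hope to see in the id and guid columns once the table is declared with a quote-aware SerDe.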

——————————————————————————————————————————

https://www.linkedin.com/in/omid-vahdaty/

architecture, AWS athena, AWS Big Data Demystified, AWS EMR, AWS Redshift, Data Engineering, EMR, Spark

AWS Big Data Demystified – Part 1 [English]

2nd April 2019 | 9th August 2020 | Omid

A while ago I entered the challenging world of Big Data. As an engineer, at first, I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article: it covers only a small fraction of the technologies in the Big Data industry…

Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS & GCP and Data Center infrastructure to answer the basic questions of anyone starting their way in the big data world.

How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC, or Avro? Which technology should we use to model the data: EMR, Athena, Redshift, Spectrum, Glue, Spark, SparkSQL, GCS, BigQuery, Dataflow, Datalab, or TensorFlow? How to handle streaming? How to manage costs? Performance tips, security tips, and cloud best-practice tips.

In this meetup we host lecturers who work with several cloud vendors, with various big data platforms such as Hadoop and data warehouses, and at startups building big data products. Basically, if it is related to big data, this is THE meetup.

Some of our online materials (mixed content from several cloud vendors):

Website:

https://big-data-demystified.ninja (under construction)

Meetups:

Big Data Demystified

Tel Aviv-Yafo, IL
494 Members

Next Meetup

Big Data Demystified | From Redshift to SnowFlake

Sunday, May 12, 2019, 6:00 PM
23 Attending

YouTube channels:

https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber

https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

Audience:

Data Engineers
Data Scientists
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D

——————————————————————————————————————————

https://www.linkedin.com/in/omid-vahdaty/

architecture, AWS athena, meetup

Serverless Data Pipelines

11th March 2019 | 9th August 2020 | Omid

We had the pleasure of hosting Michael Haberman, founder at Topsight:

Serverless is the new kid in town, but let's not forget data, which is also critical for your organisation. In this talk we will look at the benefits of going serverless with your data pipeline, and also at the challenges it raises. This talk will be heavily loaded with demos, so watch out!

AWS Big Data Demystified | Serverless data pipeline

Sunday, Mar 3, 2019, 6:00 PM

Investing.com
Ha-Shlosha St 2 Tel Aviv-Yafo, IL

56 Members Went

Agenda:
18:00 Networking and gathering
18:30 "A Polylog about Redis", Itamar Haber
19:15 "Serverless data pipeline", Michael Haberman

Lecturer: Itamar Haber, Technology Evangelist
Bio: a self-proclaimed "Redis Geek", Itamar is the Technology Evangelist at Redis Labs, the home of op…

——————————————————————————————————————————

https://www.linkedin.com/in/omid-vahdaty/

AWS athena

AWS Athena Error: Query exhausted resources at this scale factor

12th February 2019 | 6th August 2020 | Omid

Author: Omid Vahdaty 12.2.2019

Athena is a serverless technology, i.e. it runs on resources that AWS shares between its users. When a large number of queries are submitted concurrently by users around the world, resource exhaustion sometimes takes place.
The Athena service team has identified this as a known issue.

However, this error is transient in nature: if you submit the query again, it might succeed.
If you get the same error consistently, you might need to partition your data and optimize the query further, as described in Performance Tuning Best Practices for Athena. Another option is to follow the blog post Tips to reduce costs on AWS SQL Athena, which might reduce resource consumption.

AWS support team suggestions:

Avoid submitting queries at the beginning or end of an hour. If a query fails, back off exponentially by some minutes and try to submit the query again. [Weird, but that's an official answer…]
It is highly recommended to adopt the Amazon Athena best practices to optimize your query and your data.
Use columnar data formats, which can drastically reduce resource consumption.
