architecture, AWS, AWS athena, AWS EMR, Cloud, Data Engineering, Spark

Big Data in 200KM/h | Big Data Demystified

What we’re about

A while ago I entered the challenging world of Big Data. As an engineer, I was not so impressed with this field at first. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article: it covers only a small fraction of the technologies in the Big Data industry…

Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS infrastructure to answer the basic questions of anyone starting their way in the big data world.

How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC?
Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL?
How to handle streaming?
How to manage costs?
Performance tips?
Security tips?
Cloud best practices tips?

Some of our online materials:

Website:

https://big-data-demystified.ninja/

YouTube channels:

https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber

Meetup:

https://www.meetup.com/AWS-Big-Data-Demystified/

https://www.meetup.com/Big-Data-Demystified

Facebook group:

https://www.facebook.com/groups/amazon.aws.big.data.demystified/

Facebook page (https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/)

Audience:

Data Engineers
Data Scientists
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D

AWS Big Data Demystified

Tel Aviv-Yafo, IL
729 Members

Big Data Demystified

Tel Aviv-Yafo, IL
873 Members

Next Meetup

Machine Learning Essentials | Big Data Demystified

Wednesday, Sep 4, 2019, 6:00 PM
77 Attending

——————————————————————————————————————————

I put a lot of thought into these blog posts so that I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

AWS athena

How to ignore quoted fields inside a CSV via AWS Athena?

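The idea is to tell Athena, in the CREATE TABLE statement, to use the OpenCSVSerde so that the quote characters around fields are stripped instead of being kept as part of the value.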

CREATE EXTERNAL TABLE myTable (
  id bigint,
  guid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"'
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/';
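
To see what the separator and quote settings mean for the data, here is a minimal Python sketch that applies the same two conventions with the standard csv module. It only illustrates the parsing rule and is not how Athena works internally; the sample rows and the parse_rows helper are made up for this example.

```python
import csv
import io

# Hypothetical sample data: the guid column is wrapped in double quotes,
# and the second guid contains a comma inside the quotes.
SAMPLE = '1,"abc-123"\n2,"def,456"\n'


def parse_rows(text, separator=",", quote='"'):
    """Parse CSV text, removing the quote character that wraps a field."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    return [row for row in reader]


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

Each row comes back with two fields, the quotes are gone, and the comma inside the quoted guid is not treated as a column separator.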

Also committed to our Big Data Demystified GitHub repository.

——————————————————————————————————————————

architecture, AWS, AWS athena, AWS Big Data Demystified, AWS EMR, AWS Redshift, Data Engineering, EMR, Spark

AWS Big Data Demystified #1.2 | Big Data architecture lessons learned

A while ago I entered the challenging world of Big Data. As an engineer, I was not so impressed with this field at first. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article: it covers only a small fraction of the technologies in the Big Data industry…

Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS & GCP and Data Center infrastructure to answer the basic questions of anyone starting their way in the big data world.

How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC, AVRO?
Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL? GCS? BigQuery? Dataflow? Datalab? TensorFlow?
How to handle streaming?
How to manage costs?
Performance tips?
Security tips?
Cloud best practices tips?

In this meetup we present lecturers working with several cloud vendors, various big data platforms such as Hadoop and data warehouses, and startups working on big data products. Basically, if it is related to big data, this is THE meetup.

Some of our online materials (mixed content from several cloud vendors):

Website:

https://big-data-demystified.ninja (under construction)

Meetups:

Big Data Demystified

Tel Aviv-Yafo, IL
494 Members

Next Meetup

Big Data Demystified | From Redshift to SnowFlake

Sunday, May 12, 2019, 6:00 PM
23 Attending

AWS Big Data Demystified

Tel Aviv-Yafo, IL
635 Members

YouTube channels:

https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber

https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

Audience:

Data Engineers
Data Scientists
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D

——————————————————————————————————————————

architecture, AWS, AWS athena, meetup

Serverless Data Pipelines | Big Data Demystified

We had the pleasure of hosting Michael Haberman, founder of Topsight:

Serverless is the new kid in town, but let's not forget data, which is also critical for your organisation. In this talk we will look at the benefits of going serverless with your data pipeline, and also at the challenges it raises. This talk will be heavily loaded with demos, so watch out!

AWS Big Data Demystified | Serverless data pipeline

Sunday, Mar 3, 2019, 6:00 PM

Investing.com
Ha-Shlosha St 2 Tel Aviv-Yafo, IL

56 Members Went

Agenda:
18:00 Networking and gathering
18:30 “A Polylog about Redis”, Itamar Haber
19:15 “Serverless data pipeline”, Michael Haberman

Lecturer: Itamar Haber, Technology Evangelist
Bio: a self-proclaimed “Redis Geek”, Itamar is the Technology Evangelist at Redis Labs, the home of op…

——————————————————————————————————————————

AWS athena

AWS Athena Error: Query exhausted resources at this scale factor

Athena is a serverless technology, i.e. it makes use of shared resources available within AWS. Hence, when a large number of queries are submitted concurrently by users around the world, resource exhaustion sometimes takes place. The Athena service team has identified this as a known issue.

However, this error is transient in nature: if you submit the query again, it might succeed. If you get the same error consistently, you might need to partition your data and optimize the query further, as described in Performance Tuning Best Practices for Athena [1] and in another article on this blog about cost reduction, which in turn might reduce resource consumption [2].

Below are the suggestions from the AWS support team:

1) Avoid submitting queries at the beginning or end of an hour. If a query fails, back off exponentially by some minutes and try to submit it again. (Weird, but that's the official answer…)
2) It is highly recommended to adopt the Amazon Athena best practices [1] to optimize your query and your data.
3) Use columnar data formats, which can drastically reduce resource consumption.
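
The back-off advice in suggestion 1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not an official AWS recipe: QueryExhaustedError, run_with_backoff and flaky_query are hypothetical names, the submit_query callable stands in for whatever code actually submits your Athena query, and the 60-second base delay is only a guess at what "some minutes" means.

```python
import random
import time


class QueryExhaustedError(Exception):
    """Hypothetical stand-in for a 'Query exhausted resources' failure."""


def run_with_backoff(submit_query, max_attempts=5, base_delay=60.0,
                     sleep=time.sleep):
    """Call submit_query(), retrying with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return submit_query()
        except QueryExhaustedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on every retry: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a fake query that fails twice and then succeeds.
# The waits are recorded in a list instead of actually sleeping.
attempts = {"count": 0}


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise QueryExhaustedError("Query exhausted resources at this scale factor")
    return "SUCCEEDED"


recorded_waits = []
result = run_with_backoff(flaky_query, sleep=recorded_waits.append)
print(result, len(recorded_waits))
```

Passing the sleep function as a parameter keeps the retry logic easy to test; in real use you would leave the default time.sleep in place and raise QueryExhaustedError from your own submission code when Athena reports this failure.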

[1] Top 10 Performance Tuning Best Practices for Athena — https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/

[2] AWS Athena cost reduction (might also reduce resource consumption)
