Blog

architecture, GCP Big Data Demystified, superQuery

80% Cost Reduction in Google Cloud BigQuery

BigQuery Cost Reduction Demystified

Lecturer: Omid Vahdaty 15.6.2022

In this lecture I will share how I reduced the monthly BigQuery bill of investing.com by 80%.
How can you reduce costs when using GCP BigQuery? What should you pay attention to?
We are going to cover all of Google's best practices for working with BigQuery.

Video

Slides

27.10.2019

Video


——————————————————————————————————————————
I put a lot of thought into these blogs so that I could share the information in a clear and useful way.
If you have any comments, thoughts, questions, or you need someone to consult with,

feel free to contact me via LinkedIn – Omid Vahdaty:

architecture

The lack of communication between data consumers


Lecturer: Noy Twerski, 15.11.2022

At this meetup, we’ll explore how companies with poor internal communication waste time and make mistakes in their business decisions.
We will also share best practices from companies that have already changed their collaboration culture, along with tips on how to avoid these mistakes through better collaboration methods.

Video

Slides



AWS EMR

AWS EMR Demystified

AWS EMR Demystified - Parts 1-3

Lecturer: Omid Vahdaty, November 2022

I will teach AWS EMR / Hadoop inside out and answer any questions you may have.

1. Introduction to AWS EMR (Hadoop managed service)
2. Introduction to AWS Networking, S3, the different types of Hadoop storage, and any AWS jargon needed to follow this meeting.
3. Introduction to AWS Glue, Athena, and how it all connects.
4. Running your first PySpark job.
5. Challenges with transformation of data using AWS EMR.
6. Pros and cons of AWS EMR architecture in your data lake.

Part 1

Video

Slides

Part 2

Video

Slides

Part 3

Video

Slides



architecture

Airflow Distributed Workloads vs AWS Lambda vs Multi-Threaded Python Script


Lecturer: Omid Vahdaty 18.11.2022

1. Pros and cons of using Airflow, a Python script, and AWS Lambda.

2. Architecture considerations when writing a large-scale ETL against a third-party API.

Video

Slides



Data Science

Boost Your AI With Quality Data


Lecturer: Magdalena Konkiewicz 20.9.2022

All AI projects start with data — no matter how simple your idea is, you cannot develop machine learning algorithms without examples to train them on. And after the first prototype, when the chase for metrics improvement begins, you find out that the amount and quality of your data matters. That is when a good data labeling pipeline will probably help you a lot.
In this talk we give an introduction to building data labeling pipelines and present real-life use cases from different areas such as search relevance, content moderation, voice assistants, and self-driving cars. We will explain how to fight concept drift in machine learning, how to build complex products using a human-in-the-loop model, and how to remove people management from the data labeling process.

Lecturer: Magdalena Konkiewicz, a Data Evangelist at Toloka, a global data labeling company serving the needs of approximately 2,000 large and small businesses worldwide.
Toloka helps its customers generate machine learning data at scale by harnessing the wisdom of the crowd from around the world.
Toloka is used by organizations in e-commerce, R&D, banking, autonomous vehicles, web services, and more.
Toloka relies on a geographically diverse crowd of several million registered users, 200,000 of whom are active monthly on average. The company is incorporated in Switzerland and has its global headquarters in the USA. Before joining Toloka, Magdalena worked in many different sectors in technical roles such as NLP Engineer, Developer, and Data Scientist. She has also been involved in teaching and mentoring Data Scientists. Additionally, she contributes to Towards Data Science, one of the biggest Medium publications, writing about machine learning tools and best practices.

Video

Slides



airflow


Airflow - distinguish between environments

Authors: Omid Vahdaty and Amir Miller 7.9.2022

Our use case (in 1 min):

We have Airflow (> 2.0) in two environments, PROD and DEVELOPMENT. Let’s distinguish between the two:
In airflow.cfg, under [webserver], add a new parameter:
instance_name = <choose the env name> (e.g. instance_name = DevEnv)

Want to make it better? Change the navbar_color parameter (e.g. navbar_color = #50C878).
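
Put together, the two settings might look like the following in the DEVELOPMENT environment's airflow.cfg. This is a minimal sketch: DevEnv and #50C878 are just the example values from the text, and the rest of the [webserver] section is assumed to stay as it is.

```ini
[webserver]
# Name shown in the web UI for this environment (example value)
instance_name = DevEnv
# Navigation bar color for this environment (example value)
navbar_color = #50C878
```

Airflow also reads options from environment variables named AIRFLOW__{SECTION}__{KEY}, so the same values could be supplied as AIRFLOW__WEBSERVER__INSTANCE_NAME and AIRFLOW__WEBSERVER__NAVBAR_COLOR instead of editing the file. Either way, the webserver most likely needs a restart before the change shows up.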

