Blog

architecture, GCP Big Data Demystified, superQuery

80% Cost Reduction in Google Cloud BigQuery

BigQuery Cost Reduction Demystified

Lecturer: Omid Vahdaty, 15.6.2022

In this lecture I will share how I cut investing.com’s monthly BigQuery bill by 80%.
How do you reduce costs in Google Cloud BigQuery, and what should you pay attention to?
We are going to cover all of Google’s best practices for working with BigQuery.
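As a small taste of the levers covered in the talk, here is a minimal Python sketch, using the google-cloud-bigquery client, of two habits that directly reduce the bytes you pay for: dry-running a query to see its scan size before paying for it, and putting a hard cap on the bytes a single query may bill. The project, table, and column names are placeholders.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Select only the columns you need; BigQuery bills by columns scanned.
sql = "SELECT user_id, event_date FROM `my-project.analytics.events`"

# Dry run: BigQuery validates the query and reports the bytes it would scan, free of charge.
dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_job = client.query(sql, job_config=dry_cfg)
print(f"Query would scan {dry_job.total_bytes_processed / 1e9:.2f} GB")

# Hard cap: the query fails instead of billing more than ~1 GB.
capped_cfg = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
rows = client.query(sql, job_config=capped_cfg).result()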

Video

Slides

Earlier session, 27.10.2019

Video


——————————————————————————————————————————
I put a lot of thought into these blogs so that I can share the information in a clear and useful way.
If you have any comments, thoughts, or questions, or you need someone to consult with,

feel free to contact me via LinkedIn – Omid Vahdaty.

BI

How to connect BigQuery to Looker

Looker 101 Chapter 1

How to connect BigQuery to Looker

In this chapter, we are going to learn how to connect Google Cloud Platform (GCP) to our Looker account.
Let’s start!
1. Go to the Google Cloud Platform home page. From APIs & Services, select Credentials.

2. From CREATE CREDENTIALS, select Service account.

3. Fill in the service account details:
– Select a name for the service account (the service account ID is generated from it).
– Add a description (optional).
– Press CREATE AND CONTINUE.

4. Under Grant this service account access to project, open Select a role and choose BigQuery Admin. Select CONTINUE.
5. Grant users access to this service account is optional; you can add users later if needed.
Select DONE.

6. Go to the Credentials tab, and under Service Accounts, click the account you’ve just created (in this example, “Evya-1”).

7. In the Keys tab, from ADD KEY, select Create new key.

8. Leave the key type as JSON and select CREATE. (A JSON file that contains the key will be downloaded to your PC.)
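Optionally, before wiring the key into Looker, you can sanity-check that it reaches BigQuery. A minimal Python sketch using the google-cloud-bigquery client; the key file name matches the example account above and is just a placeholder:

from google.cloud import bigquery

# Authenticate with the downloaded key instead of default credentials.
client = bigquery.Client.from_service_account_json("evya-1-key.json")

# Any trivial query proves the account can connect and run jobs.
rows = client.query("SELECT 1 AS ok").result()
print([dict(r) for r in rows])  # expected: [{'ok': 1}]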

9. Go to the Looker homepage and choose Develop.

10. Select Projects.

11. Choose Admin -> Connections -> Add Connection.

12. In the Connection Settings:
– Choose a name.
– In the Dialect setting, choose Google BigQuery Standard SQL.

In Project ID:
Get the name of the BigQuery project that contains the dataset you want to work with:
– Go to the Google Cloud Platform home page.
– From BigQuery, select SQL workspace.
– Copy the name of the project you want.
– Return to Looker and enter it in the Project ID field.

In Dataset:
– Go to the BigQuery homepage and copy the name of the dataset you want to use.
– Enter this name in the Dataset pane on the Looker page.

In Service Account Email:
You will need the email address of the service account you created earlier on the Google Cloud Platform.
– Go to the Google Cloud Platform homepage.
– Choose Menu -> IAM & Admin -> Service Accounts.
– Find the account name and copy its email address.
– Enter the email in the Service Account Email pane.

In Service Account JSON/P12 File, upload the JSON file you downloaded earlier (Choose File -> select the file).
Click Add Connection.

13. Let’s test it:
Find your connection in the list and press the Test button.

Every test should turn green: connect, kill, and query.

You are all set! 

Hope to see you in the next chapter: How to start a new project in Looker.

architecture

Alluxio – the good, the bad and the ugly

Alluxio - The good, the bad and the ugly

Lecturer: Alexander Leibzon, 17.5.2022

In this lecture we learn what exactly “Alluxio, the data orchestration layer” is, and go over the use cases we have been running in production for the past two years:
the good (the actual pain points it solves), the bad (and how we overcame them), and the ugly (the actual tips and tricks people come to meetups for).
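For readers who have not met Alluxio yet: it sits as a distributed caching and namespace layer between compute engines and slower or remote storage, so jobs read through Alluxio paths instead of hitting the underlying store directly. A minimal PySpark sketch, assuming an Alluxio master at alluxio-master:19998 and the Alluxio client jar on Spark’s classpath; the host and dataset path are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-demo").getOrCreate()

# Reading through alluxio:// instead of s3:// or hdfs:// means repeated
# reads are served from Alluxio's cache close to the compute cluster.
df = spark.read.parquet("alluxio://alluxio-master:19998/datasets/events")
df.groupBy("event_type").count().show()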

Video

Slides



ETL tools

The First No-Code ETL Python Integration is Here

The First No-Code ETL Python Integration is Here

Lecturers: Ariel Yosef & Ophir Prusak, 26.4.2022

A live demonstration of Rivery’s industry-first Python integration.
The Python integration allows you to:

  • Run custom Python code directly within a No-Code ETL Platform.
  • Easily get your data into (or out of) Python without the need to write any connectivity code in Python.
  • Transform your data on the fly (see the sketch below).
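To make the last bullet concrete, here is a hypothetical sketch of the kind of on-the-fly transform such a Python step runs between extract and load. The function name and row format are illustrative assumptions, not Rivery’s actual API:

from typing import Dict, Iterable, List

def transform(rows: Iterable[Dict]) -> List[Dict]:
    # Illustrative contract (not Rivery's API): rows in, cleaned rows out.
    out = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if email:  # drop rows with no usable email
            out.append({**row, "email": email})
    return out

print(transform([{"email": " User@Example.COM "}, {"email": None}]))
# expected: [{'email': 'user@example.com'}]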

Lecturers:
Ophir Prusak, Product Marketing at Rivery
Ariel Yosef, Data Engineer at Jutomate

Video

Slides



architecture

My First Architecture

My First Architecture

Part 1

Lecturer: Omid Vahdaty, 1.3.2022

“I have a MySQL DB with ±50GB and ±100 tables; which DB and orchestration tools should I use?”
This question started a nice discussion in one of the forums I had the honor of participating in.
In this session I am going to share my personal critical-thinking process, which generates one output: a big data architecture.
The idea of this event: let’s build the architecture together, review some tools, analyze pros and cons, and ask hard questions and get answers.

More questions I’m going to answer:
Why Airflow? How is it different from Python? And what are the alternatives?

Video

Slides

Part 2

Lecturer: Omid Vahdaty, 15.3.2022

In this part we are going to cover questions and answers regarding orchestration (Airflow, Python, and SaaS tools).
We will build the architecture together, review some tools, analyze pros and cons, and ask hard questions and get answers.
More questions I’m going to answer:
Why Airflow? How is it different from Python? And what are the alternatives?
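To ground the Airflow-versus-plain-Python question before the session: a Python script simply runs top to bottom, while an Airflow DAG declares the schedule, retries, and task dependencies, and the scheduler enforces them (plus backfills, alerting, and a UI). A minimal sketch for Airflow 2.x; the DAG id and task commands are placeholders:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="mysql_to_warehouse",
    start_date=datetime(2022, 3, 1),
    schedule_interval="@daily",  # the scheduler triggers runs; no cron glue needed
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract_mysql.py")
    load = BashOperator(task_id="load", bash_command="python load_warehouse.py")
    extract >> load  # dependency: load runs only after extract succeeds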

Video

Slides

Part 3

Lecturer: Omid Vahdaty, 30.3.2022

In this session I am going to cover the technological aspects of choosing a BI tool.
BI architecture: what is the consideration set? What should I ask the vendor and pay attention to? What is the relevance to infrastructure?

Video

Slides



architecture, Snowflake

How Skai leverages Snowflake to move faster

How Skai leverages Snowflake to move faster

Lecturer: Pablo Roth, 22.2.2022

Skai’s challenges in developing and operating a data platform at scale.
Skai’s data platform ingests data daily from over 200,000 tables distributed across 650 servers, microservices, and SaaS applications, managing more than 2PB of data.
We will talk about the bottlenecks of managing this kind of data platform on Hadoop and the reasons we looked for a better technology.
We will deep-dive into how to manage a migration project of this magnitude on a live and continuously growing platform without lowering SLAs.
We will wrap up with what we have gained so far and how Snowflake helps us move faster.

Video

Slides

