architecture

Alluxio – the good, the bad and the ugly

Lecturer: Alexander Leibzon 17.5.2022

In this lecture we learn what exactly “Alluxio, the data orchestration layer” is, and go over the use cases we have been running in production for the past 2 years.
We cover the good things (the actual pain points it solves), the bad things (and how we overcome them), and the ugly things (the actual tips and tricks people come to meetups for).

Video

Slides


——————————————————————————————————————————
I put a lot of thought into these blogs, so I could share the information in a clear and useful way.
If you have any comments, thoughts, questions, or you need someone to consult with,

feel free to contact me via LinkedIn – Omid Vahdaty:

ETL tools

The First No-Code ETL Python Integration is Here

Lecturers: Ariel Yosef & Ophir Prusak, 26.4.2022

A live demonstration of Rivery’s industry-first Python integration.
The Python integration allows you to:

  • Run custom Python code directly within a No-Code ETL Platform.
  • Easily get your data into (or out of) Python without the need to write any connectivity code in Python.
  • Transform your data on-the-fly.

Lecturers:
Ophir Prusak, Product Marketing at Rivery
Ariel Yosef, Data Engineer at Jutomate

Video

Slides



architecture

My First Architecture

Part 1

Lecturer: Omid Vahdaty, 1.3.2022

“I have a MySQL DB with ±50GB and ±100 tables. Which DB and orchestration tools should I use?”
This question started a nice discussion in one of the forums I had the honor of participating in.
In this session I am going to share my personal critical thinking process, which generates one output: a Big Data architecture.
The idea of this event: let’s build the architecture together, review some tools, analyze pros and cons, ask hard questions and get answers.

More questions I’m going to answer:
Why Airflow? How is it different from Python? And what are the alternatives?

Video

Slides

Part 2

Lecturer: Omid Vahdaty, 15.3.2022

In this part we are going to cover questions and answers regarding orchestration (Airflow, Python and SaaS tools).
We will build the architecture together, review some tools, analyze pros and cons, ask hard questions and get answers.
More questions I’m going to answer:
Why Airflow? How is it different from Python? And what are the alternatives?

Video

Slides

Part 3

Lecturer: Omid Vahdaty, 30.3.2022

In this session I am going to cover the technological aspects of choosing a BI tool:
BI architecture. What is the consideration set? What should I ask the vendor and pay attention to? How is it relevant to the infrastructure?

Video

Slides



Big Query

BigQuery bq load error- "cannot determine table described"

Author: Omid Vahdaty 21.4.2021

If you are getting this error, it is an authentication and authorization issue: simply log out and log in again. For example, if you are using Cloud Shell, close it and reopen it.

Commands like the ones below may still describe your project and dataset, even though the load command is still not sent to the BigQuery API:

bq show mydataset.my_test
bq show mydataset 

You can also try adding an explicit project ID to the table reference, as follows:

projectid:myset.my_test
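
As a minimal sketch of the explicit project ID form above, the Python helper below assembles a project-qualified table reference and the matching bq load command line. The function names, the CSV source format and the gs://my-bucket/data.csv path are illustrative assumptions and do not come from the original post; the helper only builds and prints the command, it does not run it.

```python
import shlex


def qualified_table(project_id: str, dataset: str, table: str) -> str:
    """Build a project-qualified table reference: project:dataset.table."""
    return f"{project_id}:{dataset}.{table}"


def bq_load_command(project_id, dataset, table, source_uri, source_format="CSV"):
    """Return the argument list for a bq load call with an explicit project ID."""
    return [
        "bq",
        "load",
        f"--source_format={source_format}",
        qualified_table(project_id, dataset, table),
        source_uri,
    ]


if __name__ == "__main__":
    # Placeholder values; the bucket path is hypothetical.
    cmd = bq_load_command("projectid", "myset", "my_test", "gs://my-bucket/data.csv")
    # Print the command so it can be reviewed and pasted into a shell.
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the list form can be passed to subprocess.run once you are logged in again.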



airflow

Airflow Exception: “raise InvalidToken cryptography.fernet.InvalidToken”

Author: Omid Vahdaty 5.4.2021

If you get this InvalidToken error, it is because Airflow uses Fernet to encrypt all the passwords for its connections in the backend database.

Somehow the Airflow backend holds values encrypted with a previous Fernet key, while you have since generated a new key and used it when creating a new connection, so the older values can no longer be decrypted.

My recommendation is to first run the commands below. They delete all the existing records in your backend DB. NOTICE: this will delete all the Airflow connections and variables you entered manually:

airflow resetdb
airflow initdb

This initializes the backend DB like a fresh install. (On Airflow 2.x the equivalent commands are airflow db reset and airflow db init.) Airflow may complain about missing variables.
Start Airflow and enter the missing variables one by one.

Then start the Airflow web server and scheduler.
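
To reduce the chance of ending up with mismatched keys again, one option is to generate a single Fernet key and configure it for every Airflow component, either as fernet_key under [core] in airflow.cfg or through the AIRFLOW__CORE__FERNET_KEY environment variable. The sketch below is an assumption-laden illustration, not part of the original post: it uses only the standard library and mirrors the key format Fernet expects (32 random bytes, URL-safe base64 encoded). The function name generate_fernet_key is hypothetical.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a Fernet-compatible key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    key = generate_fernet_key()
    # Set the same value for the web server and the scheduler, either in
    # airflow.cfg ([core] fernet_key) or as AIRFLOW__CORE__FERNET_KEY.
    print(f"AIRFLOW__CORE__FERNET_KEY={key}")
```

Keep the generated key somewhere safe: connections saved after this point are encrypted with it, and replacing it later without re-encrypting them leads back to the same InvalidToken error.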

