BUSINESS ANALYTICS AT SELINA, by Assaf Levinson, Head of Analytics at Selina

What You Will Learn:

Data is a strategic business function, but due to its heavy technical components, the strategic element is often overlooked by the very people who work with it. However, to achieve success in a data career, it is crucial to zero in on the business value of the data provided and to align all data projects and output with the organization’s business goals.

This event aims to provide you, as a professional working in data, with a broader understanding of how to relate the day-to-day technical aspects of data operations to the business goals, and how you can leverage this business understanding (along with your technical skillset) to drive your career forward.


-Hear how an emerging global hospitality disruptor uses a structured data analytics strategy to collect valuable data that fuels their worldwide expansion and high valuation.
-Learn how to talk about dev and BI to business people
-Get tips to strengthen the connection between all elements of data collection (dev, BI, and analytics) to increase the value and efficiency of the data collected


Assaf Levinson – Assaf is the head of business analytics at Selina, the rising ‘AirBNB’ for digital nomads across the world. With operations in 13 countries, an $850M valuation and a recent funding round in April of $100M, Selina is poised to disrupt the global hospitality industry.

Assaf comes with years of business and data experience, having previously served as a product analytics director at Taboola, director of data science at ConvertMedia, and as a business analyst at Rafael.

BI And Analytics: The Business Connection

Sunday, Jun 2, 2019, 6:00 PM

Google Campus
Ha-Umanim St 12 Tel Aviv-Yafo, IL

114 Members Went

Agenda: 18:00 PM – Networking 18:30 PM – BUSINESS ANALYTICS AT SELINA, by Assaf Levinson, Head of Analytics at Selina 19:15 – Break 19:30 PM – ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS’ UNSPOKEN NEEDS, by Eyal Steiner, Senior BI Engineer, Alexa Shopping at Amazon 20:15 pm – Wrap-up *Lectures will be in English *(Kosher) snacks will be served …

Check out this Meetup →

From Redshift to SnowFlake | Big Data Demystified

“From Redshift to SnowFlake”, Yaron Tomer is VP R&D of XMPie:
In the presentation I will briefly take you through our journey of selecting of the best data warehouse for our product, which ended up being Snowflake. We will then deep dive into Snowflake. You will learn about the amazing features and architecture of Snowflake that overwhelmed us and that caused us to understand it is the best solution for us on so many levels. It is rare to see a product so polished. I think the word must be spread, because although Snowflake is quite popular in the valley, only a dozen companies adopted it in Israel.

About the lecturer:
Yaron Tomer is VP R&D of XMPie, providing a cloud based marketing automation product to marketing agencies and fortune 500 enterprises. Before becoming a VP, Yaron managed the development of this SaaS product, from an idea to a mature product. Yaron is passionate or products and technologies.

From Redshift to SnowFlake | Big Data Demystified

Sunday, May 12, 2019, 6:00 PM

hashelosha st 2, Tel Aviv, Tel Aviv-Yafo Tel Aviv-Yafo, al

37 Members Went

Agenda: 18:00 gathering and networking 18:30 “Introduction to Azure Machine Learning” , Guy Glantser, CEO Madeira Data Solutions 19:15 “From Redshift to SnowFlake”, Yaron Tomer is VP R&D of XMPie, “Introduction to Azure Machine Learning” abstract: Machine Learning is currently one of the hottest buzzwords in the high-tech industry. It’s the science…

Check out this Meetup →

Introduction to Azure Machine Learning

“Introduction to Azure Machine Learning” abstract:

Machine Learning is currently one of the hottest buzzwords in the high-tech industry. It’s the science of getting computers to act without being explicitly programmed. The idea behind it is to train the system by learning from historical data, and produce a program that can predict future behavior.

You probably use Machine Learning dozens of times a day without even knowing about it, like when you search the web, buy something in Amazon, or even when you go through your feed in Facebook.

Azure offers a fully managed Machine Learning cloud service that enables you to easily build, deploy, and share predictive analytics solutions. In this session we will learn what Machine Learning is and why we should use it, how it can be used to analyze historical data and predict future behavior, what some of the business use cases for Machine Learning are, and how it all works in Azure. The session includes some cool examples that will demonstrate the power of Machine Learning.

Guy Glantser, bio:

Guy Glantser, Data Platform MVP, is the leader of the Israeli PASS local group and also the CEO and founder of Madeira Data Solutions. His career has been focused on the Microsoft Data Platform for the past 20 years, performing various database roles as either an on-site DBA, an external consultant or a speaker. Guy is involved in many activities in the Microsoft Data Platform community. He occasionally speaks at community events, such as PASS Summit, SQLBits, SQL Saturdays and user groups around the world. He also co-hosts the SQL Server Radio podcast.

Introduction to Azure Machine Learning | Big Data Demystified

Sunday, May 12, 2019, 6:00 PM
Ha-Shlosha St 2 Tel Aviv-Yafo, IL

74 Members Went

Agenda: 18:00 gathering and networking 18:30 “Introduction to Azure Machine Learning” , Guy Glantser, CEO Madeira Data Solutions 19:15 “From Redshift to SnowFlake”, Yaron Tomer is VP R&D of XMPie, “Introduction to Azure Machine Learning” abstract: Machine Learning is currently one of the hottest buzzwords in the high-tech industry. It’s the science…

Check out this Meetup →

what is the cheapest ways to parse GA_sessions at big query? What is the fastest way to parse GA_sessions at big query? What is the simplest way to parse GA_sessions at big query?

Before we begin , be sure to be familiar with the Google analytics schema. Notice the different Record types, especially the hits record.

Option one: sub query

The simplest solution would be using sub query for each custom dimension, processing 3.18 GB. The problem is that if you you a view on top of it, the cost does not behave as you would expect. so be careful, and test everything.

fullVisitorId ,
(select value from h.customDimensions where index = 1) as UID,
(select distinct value from h.customDimensions where index = 36) as App_Edition
1234.ga_sessions_20180501 t, t.hits h

For example

Option 2: User Defined Function:

using used defined function with SQL. it involves baseis coding. This would be the slowest solution, but unlikely to hit a resources problem.

The query below processed 3.18 GB

CREATE TEMPORARY FUNCTION getCustomDimension(cd ARRAY>, index INT64)
for(var i = 0; i < cd.length; i++) {
var item = cd[i];
if(item.index == index) {
return item.value
return ”;

getCustomDimension(hits.customDimensions, 1) AS customDimension1_uid,
getCustomDimension(hits.customDimensions, 36) AS customDimension36_edition
FROM 1234.ga_sessions_20180501 gs_sessions,
UNNEST(gs_sessions.hits) hits

Option 3: With table as select

this option is the cheapest… 570 mb was processed, However. with a large number of custom dimensions , you might hit an error of “Error: Query exceeded resource limits. 86676.81298774751 CPU seconds were used,this query must use less than 55000.0 CPU seconds.” and there is nothing you can do about it . furthermore – the solution is not elegant in terms of maintenance, and it will slow down the more custom dimensions you have.

FROM 1234.ga_sessions_20180501 with_ga_sessions,
UNNEST(with_ga_sessions.hits) with_hits,
UNNEST(with_ga_sessions.customDimensions) with_cd
WHERE with_cd.index IN (1, 36)
CD_1.value AS customDimension1_uid,
CD_36.value AS customDimension36_edition
FROM 1234.ga_sessions_20180501 ga_sessions,
UNNEST(ga_sessions.hits) hits
LEFT JOIN CD CD_1 ON ga_sessions.fullVisitorId = CD_1.fullVisitorId
AND ga_sessions.visitId = CD_1.visitId
AND hits.hitNumber = CD_1.hitNumber
AND CD_1.index = 1
LEFT JOIN CD CD_36 ON ga_sessions.fullVisitorId = CD_36.fullVisitorId
AND ga_sessions.visitId = CD_36.visitId
AND hits.hitNumber = CD_36.hitNumber
AND CD_36.index =36

Option4: Flatten everything using AirFlow or any other orchestration tool.

This is the most expensive way, as each day we overwrite the entire history, as the custom dimension may change in some use case. if they don’t, then this approach may be be simple, cheap and fast. The idea is simple – convert each record to a bunch of columns and utilize the columnar power of bigquery. make sure the target flat table is partitioned by date. Notice that currently each table in BigQuery has a limitation of 2000 partition which means about 5.5 years. so the below is not good enough , consider sharded tables based on years. The storage here will be flattened, which means, anot more storage will be saved in BQ compared to GA_sessions storage only (about doble), but the benefit will be in compute time as the the table is columnar, and most queries require a short amount of columns.

CREATE  table `gap---all-sites-1245.myDataSet.GA_FLAT` 
  ( date date,
 hits_type string,
 hits_page_pagePath string,
 hits_page_pagePathLevel1	string,
 hits_page_pagePathLevel2	string,
 hits_page_pagePathLevel3		string,
 hits_page_pagePathLevel4		string,
 hits_page_hostname string,
 hits_page_pageTitle string,
 hits_appInfo_appId string,
 hits_appInfo_appName string,
 hits_appInfo_screenName string,
 hits_eventInfo_eventCategory string,
 hits_eventInfo_eventAction string,
 hits_eventInfo_eventLabel string,
 hits_eventInfo_eventValue string,
    uid string,
    app_edition string,

    clientId string,
    fullVisitorId string,
    visitorId string,
    userId string,
    visitNumber string,
    visitId string,
    visitStartTime string,
    totals_bounces string,
    totals_hits string,
    totals_newVisits string,
    totals_pageviews string,
    totals_screenviews string,
    totals_sessionQualityDim string,
    totals_timeOnScreen string,
    totals_timeOnSite string,
    totals_totalTransactionRevenue string,
    totals_transactionRevenue string,
    totals_transactions string,
    totals_UniqueScreenViews string,
    totals_visits string,
    trafficSource_adContent string,
    rafficSource_adwordsClickInfo_adGroupId string,
    trafficSource_adwordsClickInfo_adNetworkType string,
    trafficSource_adwordsClickInfo_campaignId string,
    trafficSource_adwordsClickInfo_creativeId string,
    trafficSource_adwordsClickInfo_criteriaId string,
    trafficSource_adwordsClickInfo_criteriaParameters string,
    trafficSource_adwordsClickInfo_customerId string,
    trafficSource_adwordsClickInfo_gclId string,
    trafficSource_adwordsClickInfo_isVideoAd string,
    trafficSource_adwordsClickInfo_page string,
    trafficSource_adwordsClickInfo_slot string,
    trafficSource_adwordsClickInfo_targetingCriteria_boomUserlistId string,
    trafficSource_campaign string,
    trafficSource_campaignCode string,
    trafficSource_isTrueDirect string,
    trafficSource_keyword string,
    trafficSource_medium string,
    trafficSource_referralPath string,
    trafficSource_source string,
    socialEngagementType string,
    channelGrouping string,
    device_browser string,
    device_browserSize string,
    device_browserVersion string,
    device_deviceCategory string,
    device_mobileDeviceInfo string,
    device_mobileDeviceModel string,
    device_mobileInputSelector string,
    device_mobileDeviceMarketingName string,
    device_operatingSystem string,
    device_operatingSystemVersion string,
    device_isMobile string,
    device_mobileDeviceBranding string,
    device_flashVersion string,
    device_javaEnabled string,
    device_language string,
    device_screenColors string,
    device_screenResolution string,
    geoNetwork_continent string,
    geoNetwork_subContinent string,
    geoNetwork_country string,
    geoNetwork_region string,
    geoNetwork_metro string,
    geoNetwork_city string,
    geoNetwork_cityId string,
    geoNetwork_latitude string,
    geoNetwork_networkDomain string
 partition by date

PARSE_DATE('%Y%m%d', REPLACE(_TABLE_SUFFIX, "intraday_", "")) as date, 

type as hit_type, 	  as hits_page_pagePath, as hits_page_pagePathLevel1, as hits_page_pagePathLevel2, as hits_page_pagePathLevel3, as hits_page_pagePathLevel4, as hit_page_hostname, as hits_page_pageTitle,
h.appInfo.appId as hit_appInfo_appId,
h.appInfo.appName as hit_appInfo_appName,
h.appInfo.screenName as hit_appInfo_screenName,

h.eventInfo.eventCategory as hits_eventInfo_eventCategory ,
h.eventInfo.eventAction   as hits_eventInfo_eventAction ,
h.eventInfo.eventLabel    as hits_eventInfo_eventLabel ,
h.eventInfo.eventValue    as hits_eventInfo_eventValue ,

(select value from h.customDimensions where index = 1) as CD_1,
(select distinct value from h.customDimensions where index = 35) as CD_35,

clientId	,
fullVisitorId	,
visitorId	,
userId	,
visitNumber	,
visitId	,
visitStartTime	,
totals.bounces as totals_bounces,
totals.hits	as totals_hits,
totals.newVisits as totals_newVisits	,
totals.pageviews	as totals_pageviews,
totals.screenviews	as totals_screenviews,
totals.sessionQualityDim	as totals_sessionQualityDim,
totals.timeOnScreen	as totals_timeOnScreen,
totals.timeOnSite	as totals_timeOnSite,
totals.totalTransactionRevenue	as totals_totalTransactionRevenue,
totals.transactionRevenue	as totals_transactionRevenue,
totals.transactions	 as totals_transactions,
totals.UniqueScreenViews	as totals_UniqueScreenViews,
totals.visits  as totals_visits,
trafficSource.adContent as trafficSource_adContent	,
trafficSource. adwordsClickInfo.adGroupId as trafficSource_adwordsClickInfo_adGroupId	,
trafficSource. adwordsClickInfo.adNetworkType as trafficSource_adwordsClickInfo_adNetworkType	,
trafficSource. adwordsClickInfo.campaignId as trafficSource_adwordsClickInfo_campaignId	,
trafficSource. adwordsClickInfo.creativeId as trafficSource_adwordsClickInfo_creativeId	,
trafficSource. adwordsClickInfo.criteriaId as trafficSource_adwordsClickInfo_criteriaId	,
trafficSource. adwordsClickInfo.criteriaParameters as trafficSource_adwordsClickInfo_criteriaParameters,
trafficSource. adwordsClickInfo.customerId	as trafficSource_adwordsClickInfo_customerId,
trafficSource. adwordsClickInfo.gclId	as trafficSource_adwordsClickInfo_gclId,
trafficSource. adwordsClickInfo.isVideoAd	as trafficSource_adwordsClickInfo_isVideoAd,
trafficSource. as trafficSource_adwordsClickInfo_page	,
trafficSource. adwordsClickInfo.slot as trafficSource_adwordsClickInfo_slot	,
trafficSource. adwordsClickInfo.targetingCriteria.boomUserlistId	as trafficSource_adwordsClickInfo_targetingCriteria_boomUserlistId,
trafficSource.campaign	as trafficSource_campaign,
trafficSource.campaignCode as trafficSource_campaignCode	,
trafficSource.isTrueDirect as trafficSource_isTrueDirect	,
trafficSource.keyword	as trafficSource_keyword,
trafficSource.medium	as trafficSource_medium,
trafficSource.referralPath as trafficSource_referralPath	,
trafficSource.source	as trafficSource_source,
socialEngagementType	  ,
channelGrouping	,
device.browser as device_browser	,
device.browserSize	 as device_browserSize	,
device.browserVersion	 as device_browserVersion,
device.deviceCategory	 as device_deviceCategory,
device.mobileDeviceInfo	as device_mobileDeviceInfo,
device.mobileDeviceModel	as device_mobileDeviceModel,
device.mobileInputSelector	as device_mobileInputSelector,
device.mobileDeviceMarketingName	as device_mobileDeviceMarketingName,
device.operatingSystem	as device_operatingSystem,
device.operatingSystemVersion	as device_operatingSystemVersion,
device.isMobile	as device_isMobile,
device.mobileDeviceBranding	as device_mobileDeviceBranding,
device.flashVersion as device_flashVersion	,
device.javaEnabled	as device_javaEnabled,
device.language	 as device_language,
device.screenColors	as device_screenColors	,
device.screenResolution	 as device_screenResolution	, 
geoNetwork.continent	as geoNetwork_continent	,
geoNetwork.subContinent	as geoNetwork_subContinent	,	as geoNetwork_country	,
geoNetwork.region	as geoNetwork_region	,
geoNetwork.metro	as geoNetwork_metro	,	as geoNetwork_city	,
geoNetwork.cityId	as geoNetwork_cityId	,
geoNetwork.latitude	as geoNetwork_latitude	,
geoNetwork.longitude as geoNetwork_longitude	,
geoNetwork.networkDomain as geoNetwork_networkDomain,
geoNetwork.networkLocation as geoNetwork_networkLocation

  `gap---all-sites-1245.123492829.ga_sessions_*` t, t.hits h

Parsing the last 4 days of google analytics table GA_sessions in google BigQuery

The problems with querying google BigQuery:

  1. The table is sharded not partitioned – this means each table has it owen table name e.g ga_sessions_20190202
  2. The GA sharded table also has some temporary data on a table called ga_sessions_intraday_20190505
  3. so when you select * from ga_sessions_* you will also get some unexpected ga_sessions_intraday_* shards

Below is example of querying the last 4 days in any GA_sessions table

select * from myProject.MyDataset.ga_sessions_*