Uncategorised

How to connect to Google Ad Manager via Python API | GAM demystified

  1. To simplify authentication, you need a G Suite Gmail user with admin permission to the GAM console.
  2. Make sure API access is enabled in Google Ad Manager.
  3. You need a network code from Google Ad Manager.
  4. Install a GCE machine with Debian. Note that you need Python 3.6 or higher for any API version above v201911:
sudo apt update
sudo apt -y install python-pip python3-venv python3-pip
sudo apt-get -y install git
git clone https://github.com/googleads/googleads-python-lib.git
pip3 install googleads
pip install google-auth-oauthlib

and run an example:

cd ~/googleads-python-lib/examples/ad_manager/v201911/report_service
python3 run_reach_report.py

This will fail with an authentication error:

  File "/home/analyticsynet/.local/lib/python3.5/site-packages/googleads/common.py", line 248, in LoadFromStorage
    'Given yaml file, %s, could not be opened.' % path)
googleads.errors.GoogleAdsValueError: Given yaml file, /home/omid/googleads.yaml, could not be opened.

You need to create an OAuth client and get the client ID and client secret; follow these instructions:

https://github.com/googleads/googleads-python-lib#getting-started

Note that the OAuth consent fields can be left blank and the application type should be “Other”, as stated here

and test authentication via:

cd ~/googleads-python-lib/examples/ad_manager/authentication

python generate_refresh_token.py --client_id INSERT_CLIENT_ID --client_secret INSERT_CLIENT_SECRET

If you get an error like the one below, it means you have not chosen application type “Other” in OAuth:

Issue:
The redirect URI in the request, urn:ietf:wg:oauth:2.0:oob, can only be used by a Client ID for native application. It is not allowed for the WEB client type. You can create a Client ID for native application at

Once you do that, you can authenticate via the googleads.yaml file:

  1. Copy the googleads.yaml file to your home directory.
  2. This file is used to store credentials and other settings that can be loaded to initialize a client.
  3. Update the client ID and client secret inside the YAML (see the sketch after this list).
  4. You also need to set your network code from GAM.
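For reference, the OAuth2 section of googleads.yaml looks roughly like the sketch below. The placeholder values are assumptions; replace them with your own application name, network code, client ID, client secret, and the refresh token produced by generate_refresh_token.py:

ad_manager:
  application_name: INSERT_APPLICATION_NAME_HERE
  network_code: INSERT_NETWORK_CODE_HERE
  client_id: INSERT_CLIENT_ID_HERE
  client_secret: INSERT_CLIENT_SECRET_HERE
  refresh_token: INSERT_REFRESH_TOKEN_HERE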

If this does not work, try the service account option. Use this manual to get the high level process:

  1. Create a service account in https://console.developers.google.com/. Don’t forget to choose the JSON key type and download the JSON private key; this can only be done once. This will create an email like: omid-test1@myproject.iam.gserviceaccount.com
  2. Add the service account in the Google Ad Manager console, using the email from the previous step.
  3. Confirm the user is in Active status before continuing.
  4. Copy the JSON private key to the machine’s home folder. I called it my_gcp_private_key_service_account.json.
  5. Set up the googleads.yaml file:
ad_manager:
  application_name: INSERT_APPLICATION_NAME_HERE
  network_code: INSERT_NETWORK_CODE_HERE
  path_to_private_key_file: INSERT_PATH_TO_FILE_HERE
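Once the YAML is in place, a minimal sketch of initializing a client from it might look like this (it assumes the file sits in your home directory, which is where the library looks by default):

from googleads import ad_manager

# Loads ~/googleads.yaml by default; pass an explicit path if you keep it elsewhere.
client = ad_manager.AdManagerClient.LoadFromStorage()

# Quick sanity check: fetch the current network.
network = client.GetService('NetworkService', version='v201911').getCurrentNetwork()
print('Connected to network "%s" (%s).'
      % (network['displayName'], network['networkCode']))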

You can test quickly by pointing the example script at the JSON private key of the above service account. Update KEY_FILE (your JSON key path) and APPLICATION_NAME (your application name) in:

~/googleads-python-lib/examples/ad_manager/authentication
nano create_ad_manager_client_with_service_account.py 

run the script:

python3 create_ad_manager_client_with_service_account.py 

Expected output

This library is being run by an unsupported Python version (3.5.3). In order to benefit from important security improvements and ensure compatibility with this library, upgrade to Python 3.6 or higher.
Network with network code "1234" and display name "myRealAppName" was found.

A sample report example with this service account connection, from our Git:

#!/usr/bin/env python
#
# Copyright 2014 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Initializes a AdManagerClient using a Service Account."""


from googleads import ad_manager
from googleads import oauth2
from googleads import errors
import tempfile


# OAuth2 credential information. In a real application, you'd probably be
# pulling these values from a credential storage.
KEY_FILE = '/home/omid/111bd7ea6ae8534.json'

# Ad Manager API information.
APPLICATION_NAME = 'myApp'
NETWORK_CODE='6690'

def main(key_file, application_name):
  oauth2_client = oauth2.GoogleServiceAccountClient(
      key_file, oauth2.GetAPIScope('ad_manager'))

  client = ad_manager.AdManagerClient(
      oauth2_client, application_name, NETWORK_CODE)

  #networks = ad_manager_client.GetService('NetworkService').getAllNetworks()
  #for network in networks:
  #  print('Network with network code "%s" and display name "%s" was found.'
  #        % (network['networkCode'], network['displayName']))

  # Initialize a DataDownloader.
  report_downloader = client.GetDataDownloader(version='v201911')

  # Create report job.
  report_job = {
      'reportQuery': {
          'dimensions': ['DATE', 'AD_UNIT_NAME'],
          'adUnitView': 'HIERARCHICAL',
          'columns': ['AD_SERVER_IMPRESSIONS', 'AD_SERVER_CLICKS',
                      'ADSENSE_LINE_ITEM_LEVEL_IMPRESSIONS',
                      'ADSENSE_LINE_ITEM_LEVEL_CLICKS',
                      'TOTAL_LINE_ITEM_LEVEL_IMPRESSIONS',
                      'TOTAL_LINE_ITEM_LEVEL_CPM_AND_CPC_REVENUE'],
          'dateRangeType': 'LAST_WEEK'
      }
  }


  try:
    # Run the report and wait for it to finish.
    report_job_id = report_downloader.WaitForReport(report_job)
  except errors.AdManagerReportError as e:
    print('Failed to generate report. Error was: %s' % e)
    return

  # Change to your preferred export format.
  export_format = 'CSV_DUMP'

  report_file = tempfile.NamedTemporaryFile(suffix='.csv.gz', delete=False)

  # Download report data.
  report_downloader.DownloadReportToFile(
      report_job_id, export_format, report_file)

  report_file.close()
  
  # Display results.
  print('Report job with id "%s" downloaded to:\n%s' % (
      report_job_id, report_file.name))


if __name__ == '__main__':
  main(KEY_FILE, APPLICATION_NAME)

You could use the date range enums in ReportService.DateRangeType to change the date range, but that would not fit into your Airflow plan. Think dynamic operators, like we did in the SimilarWeb Airflow blog (a minimal Airflow sketch is shown after the full date-range example below); you may also need to overwrite date partitions.

Let’s say you now wish to add start date and end date command line arguments to the Python script. A full example of Google Ad Manager reporting in Python with a date range is committed to our Git and is also shown below:

import sys, getopt

from datetime import datetime
from datetime import timedelta

from googleads import ad_manager
from googleads import oauth2
from googleads import errors
import tempfile


# OAuth2 credential information. In a real application, you'd probably be
# pulling these values from a credential storage.
KEY_FILE = '/home/omid/my_gcp_private_key_service_account.json'

# Ad Manager API information.
APPLICATION_NAME = 'jutomate'
NETWORK_CODE='6690'

def report(key_file, application_name,startDate,endDate):
  oauth2_client = oauth2.GoogleServiceAccountClient(
      key_file, oauth2.GetAPIScope('ad_manager'))

  client = ad_manager.AdManagerClient(
      oauth2_client, application_name, NETWORK_CODE)

  #networks = ad_manager_client.GetService('NetworkService').getAllNetworks()
  #for network in networks:
  #  print('Network with network code "%s" and display name "%s" was found.'
  #        % (network['networkCode'], network['displayName']))

  # Initialize a DataDownloader.
  report_downloader = client.GetDataDownloader(version='v201911')
  # Parse the start and end dates of the report from the command line arguments.
  start_date = datetime.strptime(startDate, "%Y-%m-%d").date()
  end_date = datetime.strptime(endDate, "%Y-%m-%d").date()

  print('start_date: ', start_date)
  print('end_date: ', end_date)
  
  report_filename_prefix='report_example_using_service_account_with_date_range'
  
  # Create report job.
  report_job = {
      'reportQuery': {
          'dimensions': ['DATE', 'AD_UNIT_NAME'],
          'adUnitView': 'HIERARCHICAL',
          'columns': ['AD_SERVER_IMPRESSIONS', 'AD_SERVER_CLICKS',
                      'ADSENSE_LINE_ITEM_LEVEL_IMPRESSIONS',
                      'ADSENSE_LINE_ITEM_LEVEL_CLICKS',
                      'TOTAL_LINE_ITEM_LEVEL_IMPRESSIONS',
                      'TOTAL_LINE_ITEM_LEVEL_CPM_AND_CPC_REVENUE'],
          'dateRangeType': 'CUSTOM_DATE',
          'startDate': start_date,
          'endDate': end_date
      }
  }


  try:
    # Run the report and wait for it to finish.
    report_job_id = report_downloader.WaitForReport(report_job)
  except errors.AdManagerReportError as e:
    print('Failed to generate report. Error was: %s' % e)
    return

  # Change to your preferred export format.
  export_format = 'CSV_DUMP'

  report_file = tempfile.NamedTemporaryFile(suffix='_'+report_filename_prefix+'_'+startDate+'__'+endDate+'.csv.gz', delete=False)

  # Download report data.
  report_downloader.DownloadReportToFile(
      report_job_id, export_format, report_file)

  report_file.close()
  
  # Display results.
  print('Report job with id "%s" downloaded to:\n%s' % (
      report_job_id, report_file.name))

def main(argv):
   startDate = ''
   endDate = ''
   try:
      opts, args = getopt.getopt(argv, "hs:e:", ["start=", "end="])
   except getopt.GetoptError:
      print ('example_python_command_line_arguments.py -s <startDate> -e <endDate>')
      sys.exit(2)
   for opt, arg in opts:
      if opt == '-h':
         print ('example_python_command_line_arguments.py -s <startDate> -e <endDate>')
         sys.exit()
      elif opt in ("-s", "--start"):
         startDate = arg
      elif opt in ("-e", "--end"):
         endDate = arg
   print ('start date is ', startDate)
   print ('end   date is ', endDate)
   report(KEY_FILE, APPLICATION_NAME,startDate,endDate)
   
if __name__ == '__main__':
  main(sys.argv[1:])
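To schedule this from Airflow, a minimal sketch might look like the following. This is only a sketch under assumptions: the DAG id, schedule, and script path are made up for illustration, and it simply shells out to the date-range script above with the execution date:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Assumed defaults; adjust owner and start_date to your environment.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 1),
}

with DAG('gam_daily_report',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=True) as dag:

    # Assumed path to the date-range script from this post; adjust to where you keep it.
    # {{ ds }} is rendered by Airflow to the execution date (YYYY-MM-DD), so each DAG run
    # pulls exactly one day and can safely overwrite that date partition downstream.
    report_cmd = (
        'python3 /home/omid/gam_report_with_date_range.py '
        '-s {{ ds }} -e {{ ds }}'
    )

    run_gam_report = BashOperator(
        task_id='run_gam_report',
        bash_command=report_cmd,
    )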

AWS, AWS athena, AWS Aurora, AWS Big Data Demystified, AWS EMR, AWS Lambda, AWS Redshift, Hive, meetup, Uncategorised

200KM/h overview on Big Data in AWS | Part 2

In this lecture we are going to cover AWS Big Data PaaS technologies used to model and visualize data, using a suggested architecture and some basic big data architecture rules of thumb.

For more meetups:
https://www.meetup.com/Big-Data-Demystified/

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

open source, Uncategorised

Apache Drill Demystified

Some good reads about Apache Drill. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.

Drill’s datastore-aware optimizer automatically restructures a query plan to leverage the datastore’s internal processing capabilities. In addition, Drill supports data locality, so it’s a good idea to co-locate Drill and the datastore on the same nodes.

Traditional query engines demand significant IT intervention before data can be queried. Drill gets rid of all that overhead so that users can just query the raw data in-situ. There’s no need to load the data, create and maintain schemas, or transform the data before it can be processed. Instead, simply include the path to a Hadoop directory, MongoDB collection or S3 bucket in the SQL query.
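As a quick illustration of querying raw files in place, here is a hedged sketch that submits SQL to Drill’s REST API from Python. It assumes a Drillbit running locally on the default port 8047 and a dfs storage plugin that can see the file path used in the query:

import requests

# Assumes a local Drillbit with the default REST port; adjust host/port as needed.
DRILL_URL = 'http://localhost:8047/query.json'

# Query a raw JSON file in place; no load step or schema definition required.
sql = "SELECT * FROM dfs.`/data/events/2020-01-01/events.json` LIMIT 10"

response = requests.post(DRILL_URL, json={'queryType': 'SQL', 'query': sql})
response.raise_for_status()

for row in response.json().get('rows', []):
    print(row)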

Drill leverages advanced query compilation and re-compilation techniques to maximize performance without requiring up-front schema knowledge.

Drill features a JSON data model that enables queries on complex/nested data as well as rapidly evolving structures commonly seen in modern applications and non-relational datastores. Drill also provides intuitive extensions to SQL so that you can easily query complex data.

Drill is the only columnar query engine that supports complex data. It features an in-memory shredded columnar representation for complex data which allows Drill to achieve columnar speed with the flexibility of an internal JSON document model.

https://drill.apache.org/docs/architecture-introduction/

SQL parser: Drill uses Calcite, the open source SQL parser framework, to parse incoming queries. The output of the parser component is a language agnostic, computer-friendly logical plan that represents the query.

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

analytics, meetup, Uncategorised

BI STRATEGY FROM A BIRD’S EYE VIEW | BI and Analytics Demystified |Omri Halak, Director of Business Operations at Logz.io

In the talk we will discuss how to break down the company’s overall goals all the way to your BI team’s daily activities in 3 simple stages:

1. Understanding the path to success – Creating a revenue model
2. Gathering support and strategizing – Structuring a team
3. Executing – Tracking KPIs

Bios:

Omri Halak -Omri is the director of business operations at Logz.io, an intelligent and scalable machine data analytics platform built on ELK & Grafana that empowers engineers to monitor, troubleshoot, and secure mission-critical applications more effectively. In this position, Omri combines actionable business insights from the BI side with fast and effective delivery on the Operations side. Omri has ample experience connecting data with business, with previous positions at SimilarWeb as a business analyst, at Woobi as finance director, and as Head of State Guarantees at Israel Ministry of Finance.

Structuring a Strategy: Creating a BI & Analytics Business Plan

Sunday, Jul 21, 2019, 6:00 PM

Google for Startups Campus
Ha-Umanim St 12 Tel Aviv-Yafo, IL

154 Members Went

Learn how to make a structured business plan for your analytics and BI operations that will add value to your bottom line. Agenda: 18:00 PM – Networking 18:30 PM – MAKING YOUR ANALYTICS TALK BUSINESS, by Eliza Savov, Team Lead, Customer Experience and Analytics at Clicktale 19:15 – Break 19:30 PM – BI STRATEGY FROM A BIRD’S EYE VIEW, by Omri Halak,…

Check out this Meetup →

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

Uncategorised

Big Data Demystified |Using AI to optimize BigQuery

“Using AI to optimize BigQuery”
As we kick off 2019, one thing is certain: the “Serverless Revolution” has hit the mainstream. However, whereas a majority of serverless discussions tend to focus on its utility in software development, there is a parallel paradigm shift occurring in big data analytics.

With fully-managed data warehouses like Google BigQuery and Amazon Athena, users can start querying petabytes of data at unprecedented speed, paying only for the compute and storage used. Small startups to companies as large as Spotify and Home Depot are migrating to this new way of doing analytics every day.

However, its on-demand pricing can be a double-edged sword for data analysts and teams. Scan more data than necessary, and your company will wind up with an unexpectedly large bill at the end of the month. Worst of all, with no real-time price transparency while querying, teams don’t learn about this until it’s too late. As a result, companies set usage limits on their analysts.

This is not an efficient approach for data analytics services as powerful as BigQuery or Athena, as it sets a ceiling to an analyst’s potential contribution. The solution? SQL optimization.
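To make the cost problem concrete, here is a small sketch of estimating what a query would scan before running it, using BigQuery’s dry-run mode; the project, dataset, and query are assumptions for illustration:

from google.cloud import bigquery

# Assumed project and query; replace with your own.
client = bigquery.Client(project='my-project')

sql = """
    SELECT event_date, COUNT(*) AS events
    FROM `my-project.analytics.events`
    GROUP BY event_date
"""

# Dry run: BigQuery validates the query and reports the bytes it would scan, without executing it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)

tb_scanned = job.total_bytes_processed / 1024 ** 4
print('This query would scan %.3f TB (~$%.2f at $5/TB on-demand).'
      % (tb_scanned, tb_scanned * 5))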

Avi Zloof and Eben du Toit, CEO and Chief Data Scientist at superQuery, respectively, will discuss building ML models to optimize SQL queries and how that removes the need for any restrictions on data analysts.

https://www.youtube.com/watch?v=cXq2tffYQ-A
about the lecturers:
—————————-

Avi Zloof:
Avi has spent the past 20 years leading R&D and innovation teams, managing big data and full stack development initiatives.
Prior to founding superQuery he worked for six years at TradAir leading cloud and big data projects. Big data tools he’s developed are being used by Tier-1 banks daily to trade and analyze billions in currencies.

Ido Vollf:

Ido is a serial entrepreneur who previously founded Sleeve (Edtech), EmbraceMe (IoT), and ChaChange (FinTech).
Ido worked as a Technical and Business Evangelist at Microsoft and was in charge of communicating the value of Microsoft Azure and Windows 10 to Israel’s startup ecosystem.
He’s an active mentor in the Israeli startup ecosystem, helping early stage startups grow fast.

Eben du Toit: Storyteller, data scientist, tech geek and computer engineer.

I’m a storyteller, data scientist, tech geek, computer engineer and control systems expert. My data skills were honed conceptualising, implementing and leading the efforts in building data science stacks. During my engineering career I’ve designed, coded, installed and tested full-stack large-scale IT engineering infrastructure projects at several power plants across South Africa and developed software for both mobile back-end platforms and power industry applications. I have over 14 years experience in engineering and grew up as a child of the internet. Currently chief data scientist at superQuery.

 

Contact me for more details or questions (I would love to help).

Want more quality big data content? Join our meetup and subscribe to our YouTube channels.

For more information about Superquery:
https://web.superquery.io/?camp=BDD

——————————————————————————————————————————

I put a lot of thoughts into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/