open source, Uncategorised

Apache Drill Demystified

Some good reads about Apache Drill. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
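As a concrete sketch of such a cross-datastore join, run from Drill's sqlline shell (the install path, the mongo/dfs storage plugin names, the database, collection, log path, and column names below are all hypothetical):

/opt/drill/bin/sqlline -u jdbc:drill:zk=local <<'EOF'
SELECT u.name, e.event_type
FROM mongo.app.users u
JOIN dfs.`/data/event_logs/` e ON u.user_id = e.user_id;
EOF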

Drill’s datastore-aware optimizer automatically restructures a query plan to leverage the datastore’s internal processing capabilities. In addition, Drill supports data locality, so it’s a good idea to co-locate Drill and the datastore on the same nodes.

Traditional query engines demand significant IT intervention before data can be queried. Drill gets rid of all that overhead so that users can just query the raw data in-situ. There’s no need to load the data, create and maintain schemas, or transform the data before it can be processed. Instead, simply include the path to a Hadoop directory, MongoDB collection or S3 bucket in the SQL query.
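For example, a directory of raw JSON logs can be queried in place with no schema defined anywhere. A minimal sketch, assuming Drill in embedded mode at a hypothetical /opt/drill install and a made-up path:

/opt/drill/bin/sqlline -u jdbc:drill:zk=local <<'EOF'
SELECT * FROM dfs.`/data/logs/2019/` LIMIT 10;
EOF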

Drill leverages advanced query compilation and re-compilation techniques to maximize performance without requiring up-front schema knowledge.

Drill features a JSON data model that enables queries on complex/nested data as well as rapidly evolving structures commonly seen in modern applications and non-relational datastores. Drill also provides intuitive extensions to SQL so that you can easily query complex data.
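As a sketch of those extensions (the file and field names are made up): nested fields are reached with dot notation on a table alias, and Drill's FLATTEN function un-nests an array into one row per element:

/opt/drill/bin/sqlline -u jdbc:drill:zk=local <<'EOF'
SELECT t.profile.city AS city, FLATTEN(t.tags) AS tag
FROM dfs.`/data/users.json` t;
EOF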

Drill is the only columnar query engine that supports complex data. It features an in-memory shredded columnar representation for complex data which allows Drill to achieve columnar speed with the flexibility of an internal JSON document model.

https://drill.apache.org/docs/architecture-introduction/

SQL parser: Drill uses Calcite, the open source SQL parser framework, to parse incoming queries. The output of the parser component is a language-agnostic, computer-friendly logical plan that represents the query.
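You can inspect the plan Drill produces for any query with EXPLAIN; the WITHOUT IMPLEMENTATION form shows the logical rather than the physical plan. A minimal sketch, reusing the hypothetical path from the earlier example:

/opt/drill/bin/sqlline -u jdbc:drill:zk=local <<'EOF'
EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR
SELECT * FROM dfs.`/data/logs/2019/` LIMIT 10;
EOF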

——————————————————————————————————————————

I put a lot of thought into these blog posts so that I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/

Data Engineering, open source, streaming

Introduction to Streaming and Messaging: Flume, Kafka, SQS, Kinesis Streams, Kinesis Analytics, and Kinesis Firehose

 

Attached is my presentation, "Introduction to Streaming and Messaging: Flume, Kafka, SQS, Kinesis Streams, Kinesis Analytics, and Kinesis Firehose".

 

Be sure to subscribe to our YouTube channel:

https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

and join our meetup: AWS Big Data Demystified (Tel Aviv-Yafo).






apache, architecture, open source, streaming

Flume vs. Kafka: a basic comparison

I created a basic comparison of Flume vs. Kafka. Have fun!






AWS EMR, open source

Bootstrapping Oozie

Steps to bootstrap Oozie (Hue workflows) on a new EMR cluster; a consolidated sketch of the import side follows the list.

  1. Define an admin user:

    cd /usr/lib/hue
    sudo build/env/bin/hue createsuperuser
    sudo build/env/bin/hue changepassword userName

    The createsuperuser command prompts interactively:

    Username (leave blank to use 'root'): <enter the super user name>
    Email address: <your email id>
    Password: <password with one upper case, number, and special character>
    Password (again): 
    Superuser created successfully.
  2. Copy the Oozie workspace files from the source EMR cluster via S3:

    hadoop fs -get /user/hue/oozie/ /mnt/
    aws s3 mv deployments s3://walla-oozie/ --recursive
    aws s3 cp s3://walla-oozie/ oozie/ --recursive
    hadoop fs -put oozie/workspaces/ /user/hue/oozie

  3. Import/export the workflow definitions as JSON (see http://gethue.com/export-and-import-your-oozie-workflows/):

    1. On the source EMR cluster:
      1. cd /usr/lib/hue
      2. sudo ./build/env/bin/hue dumpdata desktop.Document2 --indent 2 --natural > /home/hadoop/data.json
      3. aws s3 cp /home/hadoop/data.json s3://walla-oozie/

    2. On the destination EMR cluster:
      1. aws s3 cp s3://walla-oozie/data.json /home/hadoop/
      2. cd /usr/lib/hue
      3. ./build/env/bin/hue loaddata /home/hadoop/data.json
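As mentioned above, here is a minimal consolidated sketch of the import side as a single script. It reuses the same bucket (s3://walla-oozie) and paths as the steps above and assumes it runs on the destination cluster's master node:

#!/bin/bash
# Sketch: restore Hue/Oozie workflows onto a destination EMR master node.
# Assumes the export steps above already copied the workspace files and
# data.json into s3://walla-oozie.
set -e

# pull everything from S3 and push the workspaces into HDFS
aws s3 cp s3://walla-oozie/ /mnt/oozie/ --recursive
hadoop fs -put /mnt/oozie/workspaces/ /user/hue/oozie

# reload the workflow definitions into Hue's database
cd /usr/lib/hue
./build/env/bin/hue loaddata /mnt/oozie/data.json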






AWS EMR, open source

Adding HTTPS (SSL) to Hue on AWS EMR

Please find below the steps for setting up SSL on the Hue interface:

1) Create a self-signed SSL certificate:

openssl genrsa 4096 > server.key

openssl req -new -x509 -nodes -sha1 -key server.key > server.cert

Please enter the public DNS name of the master node when asked for hostname.

2) Edit the hue.ini file and change the values of the following properties:

[desktop]

# Filename of SSL Certificate

ssl_certificate=/path/to/server.cert

# Filename of SSL RSA Private Key

ssl_private_key=/path/to/server.key

[[session]]

secure=true

[[ssl]]

enabled=true

validate=false

3) Restart the Hue service:

sudo stop hue

sudo start hue

4) You should now be able to access Hue over HTTPS on port 8888 of the master node.
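A quick way to verify from the master node (the hostname is a placeholder, and -k is needed because curl cannot validate a self-signed certificate):

curl -vk https://<master-public-dns>:8888/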


