How to connect via JDBC to Spark SQL on EMR (AWS)

You should be able to connect to the Spark Thrift Server with SQL JDBC clients other than beeline on EMR clusters running a 5.x.x AMI. To do this, you need to copy the JARs from /usr/lib/spark/jars on the EMR master node to your local machine. I recommend copying all of them to avoid errors later, but the one we are really looking for is the JDBC driver, hive-jdbc-1.2.1-spark2-amzn-0.jar.
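As a quick sanity check after copying, a short script can confirm that a Hive JDBC driver JAR actually landed in the local folder. This is only a minimal sketch in Python and not part of the original instructions: the function name find_hive_jdbc_jars and the folder name spark-jars are made up for illustration, and the file name pattern assumes the driver JAR starts with "hive-jdbc-" as in the example above.

```python
from pathlib import Path


def find_hive_jdbc_jars(folder):
    """Return the names of Hive JDBC driver JARs found directly in `folder`."""
    # Assumption: the driver file name starts with "hive-jdbc-" and ends in ".jar".
    return sorted(p.name for p in Path(folder).glob("hive-jdbc-*.jar"))


if __name__ == "__main__":
    # "spark-jars" is a hypothetical local folder holding the copied JARs.
    jars = find_hive_jdbc_jars("spark-jars")
    if jars:
        print("Found driver JAR(s):", ", ".join(jars))
    else:
        print("No hive-jdbc JAR found; copy /usr/lib/spark/jars again")
```

If the list comes back empty, the SQL client will not be able to load org.apache.hive.jdbc.HiveDriver from that folder.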

For example, here are sample instructions for connecting with the SQuirreL SQL client:

You will need to have SQuirreL SQL installed on your local machine. You can download it here: http://www.squirrelsql.org/#installation

1.) On the master node, start the Spark Thrift Server by running: sudo /usr/lib/spark/sbin/start-thriftserver.sh

2.) Copy all the required libraries from the master node of your cluster to a folder on your local machine. The libraries are in the /usr/lib/spark/jars folder. You will need to copy all the .jar files.

3.) Open the SQuirreL SQL client and create a new driver. Enter “Spark JDBC Driver” as the Name and “jdbc:hive2://localhost:10001” as the Example URL. The user name is hive and the password is empty.

4.) Now click on the “Extra Class Path” tab and click on the “Add” button.

5.) In the dialog box, navigate to the folder where you copied the .jar files in Step 2 and select all the files.

6.) Finally, in the Class Name field, enter org.apache.hive.jdbc.HiveDriver and click OK.

7.) Start an SSH tunnel using local port forwarding by running this command from your local machine: ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10001:master-public-dns-name:10001 hadoop@master-public-dns-name

8.) Alternatively, you can connect without the tunnel: open port 10001 in the cluster's security group (SG), leaving it open only for now, and change the server address in the URL from jdbc:hive2://localhost:10001 to jdbc:hive2://master-public-dns-name:10001
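The tunnel command from step 7 and the two URL variants (through the tunnel, or direct as in step 8) differ only in the host name, key file and port, so they are easy to generate. Below is a minimal sketch in Python that builds them; the helper names jdbc_url and tunnel_command are my own for illustration, and the values passed in are the same placeholders used in the steps above, so substitute your own.

```python
import shlex

THRIFT_PORT = 10001  # port used throughout the steps above


def jdbc_url(host="localhost", port=THRIFT_PORT):
    """Return the jdbc:hive2 URL for the Spark Thrift Server."""
    return f"jdbc:hive2://{host}:{port}"


def tunnel_command(key_file, master_dns, port=THRIFT_PORT, user="hadoop"):
    """Return the ssh argument list for the local port forward in step 7."""
    return [
        "ssh",
        "-o", "ServerAliveInterval=10",  # keep the idle tunnel alive
        "-i", key_file,
        "-N",                            # no remote command, forwarding only
        "-L", f"{port}:{master_dns}:{port}",
        f"{user}@{master_dns}",
    ]


if __name__ == "__main__":
    # Placeholder values, as in the steps above.
    cmd = tunnel_command("path-to-key-file", "master-public-dns-name")
    print(shlex.join(cmd))                     # command to start the tunnel
    print(jdbc_url())                          # URL when using the tunnel
    print(jdbc_url("master-public-dns-name"))  # URL when connecting directly
```

The argument list can be passed to subprocess.Popen if you want a script to start the tunnel itself; printing it with shlex.join just gives a line you can paste into a terminal.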

Need to learn more about aws big data (demystified)?

Contact me via LinkedIn: Omid Vahdaty
Website: https://amazon-aws-big-data-demystified.ninja/
Join our meetup, Facebook group and YouTube channel:
- Join our meetup: https://www.meetup.com/AWS-Big-Data-Demystified/
- Join our Facebook group: https://www.facebook.com/groups/amazon.aws.big.data.demystified/
- Subscribe to our YouTube channel: https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

——————————————————————————————————————————

I put a lot of thought into these blogs so that I can share the information in a clear and useful way. If you have any comments, thoughts or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/
