AWS EMR, Spark

Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)

To secure the Thrift connection, enable SSL encryption and restart the hive-server2 and Thrift services on the EMR master instance.

The steps are as follows:
1. Create the self-signed certificate and add it to a keystore file using:
$ keytool -genkey -alias public-dnshostname -keyalg RSA -keystore keystore.jks -keysize 2048

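Make sure the name used in the self-signed certificate matches the hostname where the Thrift server will run. Use the public DNS name, since you are connecting from outside the VPC. When keytool prompts for your first and last name, enter this hostname; that value becomes the certificate's CN.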

2. List the keystore entries to verify that the certificate was added. Note that a keystore can contain multiple such certificates:

$ keytool -list -keystore keystore.jks

3. Export this certificate from keystore.jks to a certificate file:
$ keytool -export -alias public-dnshostname -file -keystore keystore.jks

4. Add this certificate to the client’s truststore to establish trust on the machine you want to connect from. Since you are connecting from a local machine, copy the exported certificate file from the EMR master node to that machine, and then import it:

$ keytool -import -trustcacerts -alias public-dnshostname -file -keystore truststore.jks

5. Verify that the certificate exists in truststore.jks:
$ keytool -list -keystore truststore.jks
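
The five steps above can also be run without keytool's interactive prompts. The following is only a sketch under stated assumptions: the certificate file name `thrift-server.crt`, the `OU`/`O`/`C` values, and the passwords are hypothetical placeholders that do not come from this post, and `-genkeypair`, `-exportcert` and `-importcert` are the current keytool names for `-genkey`, `-export` and `-import`.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-5 without interactive prompts.
# CERT_FILE, the OU/O/C values and all passwords are hypothetical placeholders.
CERT_FILE="${CERT_FILE:-thrift-server.crt}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1 (EMR master): create the key pair; the CN must equal the public DNS name.
generate_keystore() {
  local host="$1" store_pass="$2"
  run keytool -genkeypair -alias "$host" -keyalg RSA -keysize 2048 \
    -dname "CN=$host, OU=Example, O=Example, C=US" \
    -keystore keystore.jks -storepass "$store_pass" -keypass "$store_pass"
}

# Step 3 (EMR master): export the certificate from the keystore to CERT_FILE.
export_cert() {
  local alias="$1" store_pass="$2"
  run keytool -exportcert -alias "$alias" -file "$CERT_FILE" \
    -keystore keystore.jks -storepass "$store_pass"
}

# Steps 4-5 (client): import the certificate into the truststore, then list it.
import_cert() {
  local alias="$1" trust_pass="$2"
  run keytool -importcert -trustcacerts -noprompt -alias "$alias" \
    -file "$CERT_FILE" -keystore truststore.jks -storepass "$trust_pass"
  run keytool -list -keystore truststore.jks -storepass "$trust_pass"
}
```

Run `generate_keystore` and `export_cert` on the master node, copy the certificate file to the client, then run `import_cert` there. Setting `DRY_RUN=1` prints each keytool command instead of executing it, which is a cheap way to review the arguments first. Passing passwords on the command line exposes them to other users on the host, so this form suits test clusters better than production.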

Once the certificate is imported, make the following changes in /etc/hive/conf/hive-site.xml:
hive.server2.transport.mode : http
hive.server2.use.SSL : true
hive.server2.keystore.path : path/to/your/keystore/jks
hive.server2.keystore.password : “keystorepassword”
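
In hive-site.xml each of these settings is written as a `<property>` element with a `<name>` and a `<value>`. As a rough sketch, the helper below prints the four elements so they can be pasted inside the file's `<configuration>` element. The function names are made up for this example, the keystore path and password are placeholders, and values are not XML-escaped.

```shell
#!/usr/bin/env bash
# Sketch: render the four SSL settings as hive-site.xml <property> elements.
# The keystore path and password passed below are placeholders.

# Print one <property> element. Values are NOT XML-escaped.
hive_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print all four settings listed above.
render_ssl_properties() {
  local keystore_path="$1" keystore_password="$2"
  hive_property hive.server2.transport.mode http
  hive_property hive.server2.use.SSL true
  hive_property hive.server2.keystore.path "$keystore_path"
  hive_property hive.server2.keystore.password "$keystore_password"
}

render_ssl_properties /path/to/keystore.jks keystorepassword
```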

Restart hive-server2 and the Thrift server:
$ sudo stop hive-server2 && sudo start hive-server2
$ sudo -u spark /usr/lib/spark/sbin/ && sudo -u spark /usr/lib/spark/sbin/

Check that the services started successfully, and verify that the master instance is listening on port 10001:
$ sudo netstat -tulpan |grep 10001
tcp        0      0 :::10001                    :::*                        LISTEN      12494/java
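
If you want to script this check, for example to wait for the Thrift server after a restart, something like the following might do. It is a minimal sketch: `has_listener` is a hypothetical helper name, and it assumes the Linux netstat column layout shown above (local address in column 4, state in column 6).

```shell
#!/usr/bin/env bash
# Sketch: succeed if netstat-style output on stdin shows a TCP socket
# in LISTEN state on the given port.
has_listener() {
  local port="$1"
  awk -v port="$port" '
    $1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "LISTEN" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the master node:
#   sudo netstat -tulpan | has_listener 10001 && echo "listening on 10001"
```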

Once the service has started, you can connect to it using a JDBC driver.
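
As an illustration, a connection URL for the Hive JDBC driver in HTTP mode with SSL might look like the one built below. This is a sketch, not the exact string used on the cluster described here: port 10001 comes from the check above, while the `default` database and the `cliservice` HTTP path are assumed defaults, and `build_jdbc_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: build a Hive JDBC URL for HTTP transport over SSL.
# Assumes database "default" and HTTP path "cliservice"; adjust if yours differ.
build_jdbc_url() {
  local host="$1" truststore="$2" trust_pass="$3"
  printf 'jdbc:hive2://%s:10001/default;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=%s;trustStorePassword=%s' \
    "$host" "$truststore" "$trust_pass"
}

# Usage from the client (all three arguments are placeholders):
#   beeline -u "$(build_jdbc_url public-dnshostname /path/to/truststore.jks truststorepassword)"
```

The host in the URL should be the same public DNS name that was used as the certificate's name, and `sslTrustStore` should point at the truststore.jks created in step 4.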


