Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)

To secure the Thrift connection, enable SSL encryption and then restart the hive-server2 and Thrift services on the EMR master instance.

The steps are as follows:
1. Create the self-signed certificate and add it to a keystore file using:
$ keytool -genkey -alias public-dnshostname -keyalg RSA -keystore keystore.jks -keysize 2048

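Make sure the name in the self-signed certificate matches the hostname where the Thrift server will run (use the public DNS name, since you are connecting from outside the VPC). When keytool asks "What is your first and last name?", enter that hostname; it becomes the certificate's common name (CN).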

2. List the keystore entries to verify that the certificate was added. Note that a keystore can contain multiple such certificates:

$ keytool -list -keystore keystore.jks

3. Export this certificate from keystore.jks to a certificate file:
$ keytool -export -alias public-dnshostname -file example.com.crt -keystore keystore.jks

4. Add this certificate to the truststore of the client you will connect from. Since you are connecting from your local machine, copy the certificate file "example.com.crt" from the EMR master node to your local machine, then import it:

$ keytool -import -trustcacerts -alias public-dnshostname -file example.com.crt -keystore truststore.jks

5. Verify that the certificate exists in truststore.jks:
$ keytool -list -keystore truststore.jks
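Steps 1 to 5 prompt interactively. The sketch below runs the same keytool operations non-interactively, assuming you are happy to pass passwords on the command line (they will be visible in shell history and the process list). The function names, the choice of the hostname as alias, `-validity 365`, and the PEM export via `-rfc` are illustrative assumptions, not part of the original procedure.

```shell
# Hypothetical helpers; host and password arguments are placeholders.

# Run on the EMR master node (steps 1-3).
create_server_keystore() {
  local host="$1" storepass="$2" keystore="${3:-keystore.jks}"
  # Step 1: RSA key pair plus self-signed certificate. -dname sets CN to the
  # public DNS name instead of prompting; -validity overrides the 90-day default.
  keytool -genkeypair -alias "$host" -keyalg RSA -keysize 2048 -validity 365 \
    -dname "CN=$host" -keystore "$keystore" \
    -storepass "$storepass" -keypass "$storepass" || return
  # Step 2: list the entries to confirm the alias was added.
  keytool -list -keystore "$keystore" -storepass "$storepass" || return
  # Step 3: export the certificate (PEM text via -rfc) for the client side.
  keytool -exportcert -rfc -alias "$host" -file "$host.crt" \
    -keystore "$keystore" -storepass "$storepass"
}

# Run on the client after copying <host>.crt from the master node (steps 4-5).
import_client_truststore() {
  local host="$1" storepass="$2" truststore="${3:-truststore.jks}"
  # Step 4: trust the certificate; -noprompt skips the "Trust this certificate?" question.
  keytool -importcert -noprompt -trustcacerts -alias "$host" -file "$host.crt" \
    -keystore "$truststore" -storepass "$storepass" || return
  # Step 5: confirm the entry is now in the truststore.
  keytool -list -alias "$host" -keystore "$truststore" -storepass "$storepass"
}
```

For example, run `create_server_keystore emr-dnsname keystorepassword` on the master node, copy `emr-dnsname.crt` to your machine (for example with `scp` as the `hadoop` user, EMR's default SSH login), and then run `import_client_truststore emr-dnsname truststorepassword` locally.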

Once the certificate is imported, set the following properties in /etc/hive/conf/hive-site.xml on the EMR master node:
+++
hive.server2.transport.mode : http
hive.server2.use.SSL : true
hive.server2.keystore.path : /path/to/your/keystore.jks
hive.server2.keystore.password : keystorepassword
+++
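In hive-site.xml itself, each setting in the list above is written as a `<property>` element with `<name>` and `<value>` children. The hypothetical helper below prints the four settings in that form so they can be pasted inside the file's existing `<configuration>` element. It is a sketch: values are not XML-escaped, so a password containing `&` or `<` would need escaping by hand.

```shell
# Hypothetical helper: prints the SSL-related HiveServer2 settings as
# hive-site.xml <property> elements (no XML escaping of values).
hive_ssl_properties() {
  local keystore_path="$1" keystore_password="$2"
  local name value
  # Each heredoc line is "<property name> <value>"; read puts the rest of the
  # line into value, so values may contain inner spaces.
  while read -r name value; do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "$name" "$value"
  done <<EOF
hive.server2.transport.mode http
hive.server2.use.SSL true
hive.server2.keystore.path $keystore_path
hive.server2.keystore.password $keystore_password
EOF
}
```

For example, `hive_ssl_properties /path/to/your/keystore.jks keystorepassword` prints four `<property>` blocks ready to paste.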

Restart hive-server2 and the Thrift server:
$ sudo stop hive-server2 && sudo start hive-server2
$ sudo -u spark /usr/lib/spark/sbin/stop-thriftserver.sh && sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh
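The `stop`/`start` commands above are upstart commands, as used on older EMR releases. Newer EMR releases manage services with systemd instead; treat the exact release where this changed as something to confirm in the EMR documentation. The sketch below assumes that the presence of `/run/systemd/system` indicates systemd is the init system and picks the matching command; the function name is hypothetical.

```shell
# Hypothetical helper: restarts hive-server2 with systemd or upstart, then
# restarts the Spark Thrift server with its own scripts, as in the steps above.
restart_thrift_services() {
  if [ -d /run/systemd/system ]; then
    # systemd is the init system (assumed to be the case on newer EMR releases).
    sudo systemctl restart hive-server2 || return
  else
    # Upstart, as on the releases the original commands target.
    sudo stop hive-server2 && sudo start hive-server2 || return
  fi
  sudo -u spark /usr/lib/spark/sbin/stop-thriftserver.sh
  sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh
}
```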

Check that the services started successfully and that the master instance is listening on port 10001:
+++
$ sudo netstat -tulpan | grep 10001
tcp 0 0 :::10001 :::* LISTEN 12494/java
+++
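A listening socket does not by itself prove that SSL is active. The sketch below adds two hypothetical helpers: one polls the port until a TCP connection succeeds (useful right after a restart, since the Thrift server can take a while to come up), and one uses `openssl s_client` to print the subject of the certificate the server presents, which should show `CN=` followed by your public DNS name. `wait_for_port` relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, which is an assumption about the host you run it on.

```shell
# Hypothetical helper: returns 0 once a TCP connection to host:port succeeds,
# or 1 if none succeeds within the timeout (seconds, default 60).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # /dev/tcp opens a TCP connection; `timeout 2` bounds each attempt
    # in case packets are silently dropped.
    if timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Hypothetical helper: prints the subject of the certificate served on host:port.
show_server_cert() {
  local host="$1" port="$2"
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject
}
```

For example, `wait_for_port emr-dnsname 10001 120 && show_server_cert emr-dnsname 10001` waits up to two minutes and then shows which certificate the server is using.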

Once the services are running, you can connect through the JDBC driver with a URL like the following:

jdbc:hive2://emr-dnsname:10001/default;hive.server2.transport.mode=http;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=password
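When the URL is passed on a command line, for example to `beeline -u`, it must be quoted: the semicolons would otherwise be treated by the shell as command separators. The hypothetical helper below builds the URL with the `transportMode=http;httpPath=cliservice` parameters that the Hive JDBC documentation describes for HTTP mode (`cliservice` is the default of `hive.server2.thrift.http.path`), rather than the `hive.server2.transport.mode=http` form shown above. Which form your driver accepts depends on its version, so treat this as an assumption to check.

```shell
# Hypothetical helper: builds a Hive JDBC URL for HTTP transport over SSL.
hive_jdbc_url() {
  local host="$1" port="$2" db="$3" truststore="$4" truststore_password="$5"
  printf 'jdbc:hive2://%s:%s/%s;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=%s;trustStorePassword=%s\n' \
    "$host" "$port" "$db" "$truststore" "$truststore_password"
}
```

Usage: `beeline -u "$(hive_jdbc_url emr-dnsname 10001 default /path/to/truststore.jks password)"`. The double quotes around the command substitution keep the whole URL as a single argument.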

Need to learn more about aws big data (demystified)?

Discover more from Big Data Demystified