AWS EMR, zeppelin

EMR Zeppelin Security

Zeppelin on EMR is installed with Apache Shiro and supports the security options below:

  • Basic authentication (via Apache Shiro): user management (user, password, groups), including LDAP

https://zeppelin.apache.org/docs/0.7.3/security/shiroauthentication.html

  • Notebook permissions management: read/write/share

https://zeppelin.apache.org/docs/0.7.3/security/notebook_authorization.html

  • Data source authorization (e.g. a 3rd party DB):

https://zeppelin.apache.org/docs/0.7.3/security/datasource_authorization.html

 

Adding HTTPS/SSL to the EMR Zeppelin GUI (3 options)

 

Walkthrough to add HTTPS to Zeppelin:

1) Generate a PKCS12 keystore file: log into the master instance of the EMR cluster and run the following commands:

openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem

openssl x509 -text -noout -in certificate.pem

openssl pkcs12 -inkey key.pem -in certificate.pem -export -out certificate.p12

openssl pkcs12 -in certificate.p12 -noout -info

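When openssl asks for the Common Name (hostname), enter the public DNS name of the master node. The export password you choose in the third command is the keystore password used in step 2. Run from the hadoop user's home directory, the above commands create a file named /home/hadoop/certificate.p12. This file is your keystore, holding the certificate and its private key.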
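The same keystore can also be generated without interactive prompts. The sketch below is only an illustration: the function name make_zeppelin_keystore is hypothetical, and it assumes you pass the master's public DNS name and a keystore password as arguments. It uses the standard openssl options -subj (to supply the Common Name) and -passout/-passin (to supply the export password).

```shell
# Hypothetical helper: builds key.pem, certificate.pem and certificate.p12
# in the current directory without prompting.
make_zeppelin_keystore() {
  # $1 = public DNS name of the master node (becomes the certificate CN)
  # $2 = keystore (export) password
  openssl req -newkey rsa:2048 -nodes -keyout key.pem \
    -x509 -days 365 -subj "/CN=$1" -out certificate.pem || return 1

  # Bundle the key and the self-signed certificate into a PKCS12 keystore.
  openssl pkcs12 -inkey key.pem -in certificate.pem -export \
    -passout "pass:$2" -out certificate.p12 || return 1

  # Sanity check: the keystore opens with the given password.
  openssl pkcs12 -in certificate.p12 -noout -info -passin "pass:$2"
}
```

For example, run it from /home/hadoop as make_zeppelin_keystore followed by the master's public DNS name and a password of your choice. Passing a password on the command line leaves it in the shell history, so the interactive commands above may be preferable on a shared machine.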

 

2) Set the properties below in the zeppelin-site.xml file located at /etc/zeppelin/conf.dist/zeppelin-site.xml (if the file is not present, copy /etc/zeppelin/conf.dist/zeppelin-site.xml.template and rename the copy to zeppelin-site.xml):

  • zeppelin.ssl = true (should SSL be used by the servers?)

  • zeppelin.ssl.keystore.path = /home/hadoop/certificate.p12 (path to keystore relative to Zeppelin configuration directory)

  • zeppelin.ssl.keystore.type = PKCS12 (the format of the given keystore, e.g. JKS or PKCS12)

  • zeppelin.ssl.keystore.password = password (keystore password; can be obfuscated by the Jetty Password tool)

  • zeppelin.server.ssl.port = 8445 (server SSL port, used when the ssl property is set to true)

3) Restart Zeppelin:

sudo stop zeppelin

sudo start zeppelin
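For reference, zeppelin-site.xml uses the Hadoop-style property/name/value layout, so the settings from step 2 end up as property elements inside the file's configuration element. The helper below is a minimal sketch that prints such a fragment for pasting. The function name print_ssl_properties is hypothetical, and it assumes the path and password contain no characters that need XML escaping (such as & or <).

```shell
# Hypothetical helper: prints the SSL properties from step 2 as XML.
print_ssl_properties() {
  # $1 = keystore path, $2 = keystore password, $3 = SSL port (default 8445)
  cat <<EOF
<property>
  <name>zeppelin.ssl</name>
  <value>true</value>
</property>
<property>
  <name>zeppelin.ssl.keystore.path</name>
  <value>$1</value>
</property>
<property>
  <name>zeppelin.ssl.keystore.type</name>
  <value>PKCS12</value>
</property>
<property>
  <name>zeppelin.ssl.keystore.password</name>
  <value>$2</value>
</property>
<property>
  <name>zeppelin.server.ssl.port</name>
  <value>${3:-8445}</value>
</property>
EOF
}
```

Calling print_ssl_properties /home/hadoop/certificate.p12 followed by your keystore password prints the five elements with the default port 8445. If a property already exists in the file, edit its value instead of adding a second copy.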

4) You should now be able to access Zeppelin over HTTPS on port 8445: https://<master-public-DNS>:8445/#/
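A quick way to confirm that Zeppelin answers over HTTPS is to request the page with curl. The sketch below is an illustration only: the function names zeppelin_url and check_zeppelin_https are hypothetical, and it assumes port 8445 is reachable from where you run it (for example, allowed in the master's security group). The -k flag is needed because the certificate from step 1 is self-signed.

```shell
# Hypothetical helper: builds the Zeppelin HTTPS URL.
zeppelin_url() {
  # $1 = public DNS name of the master node, $2 = SSL port (default 8445)
  printf 'https://%s:%s/#/\n' "$1" "${2:-8445}"
}

# Hypothetical helper: prints the HTTP status code Zeppelin returns.
check_zeppelin_https() {
  # -k skips certificate verification (self-signed certificate),
  # -s silences progress output, -o /dev/null discards the page body.
  curl -k -s -o /dev/null -w '%{http_code}\n' "$(zeppelin_url "$@")"
}
```

Running check_zeppelin_https with the master's public DNS name prints the status code of the response. A connection error instead of a status code usually points at the port, the security group, or a Zeppelin that did not restart cleanly.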

User management Via Shiro

To manage groups/roles, define them under the “[roles]” section of the “shiro.ini” file. For example, you could define a set of groups like:

 

    [roles]

    admin = *

    readonly = *

    poweruser = *

    scientist = *

    engineer = *

The “[users]” section could then look like this, with each line giving a user name, a password, and the user's groups:

    [users]

    admin = <password>, admin

    user1 = <password>, scientist, poweruser

    user2 = <password>, engineer, poweruser

    user3 = <password>, readonly

 

 

For example, the above means that:

       

    – user “admin” is in the “admin” group;

    – user “user1” is in the “poweruser” and “scientist” groups;

    – etc.
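To double-check which groups a user ended up in, the “[users]” section can be read back with a small script. This is a minimal sketch, not part of Zeppelin or Shiro: the function name roles_for_user is hypothetical, and it assumes each user sits on a single line in the form shown above (name, then "=", then password, then comma-separated groups).

```shell
# Hypothetical helper: prints one group per line for the given user.
roles_for_user() {
  # $1 = path to shiro.ini, $2 = user name
  awk -v user="$2" '
    # A section header switches the [users] flag on or off.
    /^[ \t]*\[/ { in_users = ($0 ~ /^[ \t]*\[users\]/); next }
    in_users {
      pos = index($0, "=")            # split on the first "=" only
      if (pos == 0) next
      name = substr($0, 1, pos - 1)
      gsub(/[ \t]/, "", name)
      if (name != user) next
      n = split(substr($0, pos + 1), parts, ",")
      for (i = 2; i <= n; i++) {      # parts[1] is the password
        gsub(/[ \t]/, "", parts[i])
        print parts[i]
      }
    }
  ' "$1"
}
```

With the example file above, roles_for_user called with the path to shiro.ini and user1 would list scientist and poweruser, one per line.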

 

With these groups, the permissions of a notebook could be set as:

    Owners  admin

    Writers scientist,engineer,poweruser

    Readers readonly

 

Once the groups/roles are created, the authorization settings are similar to what is described in https://zeppelin.apache.org/docs/0.7.3/security/notebook_authorization.html . For instance, on a notebook's permissions page you can enter a group name instead of individual users.

Good read: recommendations from Hortonworks:

https://community.hortonworks.com/articles/141589/zeppelin-best-practices.html

 


——————————————————————————————————————————

I put a lot of thought into these blogs so that I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/
