Apache Shiro is already part of the Zeppelin installation on EMR. The available security options are:
- Basic authentication (via Apache Shiro): user management (user, password, groups), including LDAP
https://zeppelin.apache.org/docs/0.7.3/security/shiroauthentication.html
- Notebook permissions management: read/write/share
https://zeppelin.apache.org/docs/0.7.3/security/notebook_authorization.html
- Data source authorization (e.g. a 3rd-party DB):
https://zeppelin.apache.org/docs/0.7.3/security/datasource_authorization.html
- Zeppelin with Kerberos:
https://zeppelin.apache.org/docs/latest/interpreter/spark.html#setting-up-zeppelin-with-kerberos
Adding HTTPS/SSL to the EMR Zeppelin GUI (3 options)
- Use an SSH tunnel, as for the other EMR web interfaces (secured by default)
- Authentication and SSL via nginx
- Add an ELB in front of EMR that listens on 443 (HTTPS) and forwards to the Zeppelin GUI on 8890
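The tunnel option can be sketched as below. This is a minimal example, assuming the default EMR login user `hadoop` and Zeppelin listening on port 8890 on the master node; the helper name `zeppelin_tunnel_cmd`, the key file and the host name are hypothetical. It only prints the `ssh` command, so you can review it before running it.

```shell
# Print an ssh local port-forwarding command for the Zeppelin GUI.
# $1 = path to the EC2 key pair file, $2 = public DNS name of the master node
# (both are placeholders you must supply).
zeppelin_tunnel_cmd() {
  key_file="$1"
  master_dns="$2"
  # -N: run no remote command; -L: forward local port 8890 to port 8890 on the master
  printf 'ssh -i %s -N -L 8890:localhost:8890 hadoop@%s\n' "$key_file" "$master_dns"
}
```

Once the printed command is running, Zeppelin should be reachable at http://localhost:8890 on your own machine, with the traffic carried inside the SSH session.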
Walkthrough to add HTTPS to Zeppelin:
1) Generate a PKCS12 keystore file: log into the master instance of the EMR cluster and run the following commands:
openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
openssl x509 -text -noout -in certificate.pem
openssl pkcs12 -inkey key.pem -in certificate.pem -export -out certificate.p12
openssl pkcs12 -in certificate.p12 -noout -info
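Enter the public DNS name of the master node when asked for the hostname (the Common Name prompt), and remember the export password you choose, because it is the keystore password used in step 2.
When run from /home/hadoop, the commands above create a file named /home/hadoop/certificate.p12. This file is your keystore; it holds both the certificate and the private key.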
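2) Change the properties below in zeppelin-site.xml, located at /etc/zeppelin/conf.dist/zeppelin-site.xml (if it is not present, copy /etc/zeppelin/conf.dist/zeppelin-site.xml.template and rename it):
<property>
  <name>zeppelin.ssl</name>
  <value>true</value>
  <description>Should SSL be used by the servers?</description>
</property>
<property>
  <name>zeppelin.ssl.keystore.path</name>
  <value>/home/hadoop/certificate.p12</value>
  <description>Path to keystore relative to Zeppelin configuration directory</description>
</property>
<property>
  <name>zeppelin.ssl.keystore.type</name>
  <value>PKCS12</value>
  <description>The format of the given keystore (e.g. JKS or PKCS12)</description>
</property>
<property>
  <name>zeppelin.ssl.keystore.password</name>
  <value>password</value>
  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description>
</property>
<property>
  <name>zeppelin.server.ssl.port</name>
  <value>8445</value>
  <description>Server ssl port. (used when ssl property is set to true)</description>
</property>
3) Restart Zeppelin:
sudo stop zeppelin
sudo start zeppelin
4) You should now be able to access Zeppelin over HTTPS on port 8445: https://<master-public-dns>:8445/#/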
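Steps 1 and 2 above can also be scripted. The sketch below is one possible non-interactive version, not an official EMR procedure: `make_keystore` and `ensure_site_file` are hypothetical helper names, and the `-subj` and `-passout` options of openssl stand in for the interactive prompts.

```shell
# Create a self-signed certificate and a PKCS12 keystore without prompts.
# $1 = output directory, $2 = public DNS name of the master node, $3 = keystore password
make_keystore() {
  out_dir="$1"; cn="$2"; password="$3"
  # Self-signed certificate valid for 365 days, with the host name as the CN
  openssl req -newkey rsa:2048 -nodes -keyout "$out_dir/key.pem" \
    -x509 -days 365 -subj "/CN=$cn" -out "$out_dir/certificate.pem" || return 1
  # Bundle key and certificate into a password-protected PKCS12 file
  openssl pkcs12 -export -inkey "$out_dir/key.pem" -in "$out_dir/certificate.pem" \
    -out "$out_dir/certificate.p12" -passout "pass:$password" || return 1
}

# Create zeppelin-site.xml from the template if it does not exist yet.
# $1 = Zeppelin configuration directory
ensure_site_file() {
  conf_dir="$1"
  if [ ! -f "$conf_dir/zeppelin-site.xml" ]; then
    cp "$conf_dir/zeppelin-site.xml.template" "$conf_dir/zeppelin-site.xml"
  fi
}
```

On the master node this might be called as `make_keystore /home/hadoop "<master-public-dns>" "<password>"`, followed by `ensure_site_file /etc/zeppelin/conf.dist` as a user that can write to that directory; both quoted values are placeholders. After the restart, `curl -k -I https://<master-public-dns>:8445/` is one way to check that the port answers over TLS (`-k` skips certificate verification, which a self-signed certificate requires).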
User management via Shiro
To manage groups/roles, create them under the “[roles]” section of the “shiro.ini” file. For example, a set of groups could look like this:
[roles]
admin = *
readonly = *
poweruser = *
scientist = *
engineer = *
The “[users]” section could then look like this:
[users]
admin = <password>, admin
user1 = <password>, scientist, poweruser
user2 = <password>, engineer, poweruser
user3 = <password>, readonly
For example, the above means that:
– user “admin” is in the “admin” group;
– user “user1” is in the “scientist” and “poweruser” groups;
– etc.
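To double-check which user ends up in which group, the “[users]” section can be read back with a few lines of shell. This is a sketch that assumes the simple `user = password, role1, role2` layout shown above, with no commas inside the password; `roles_of` is a hypothetical helper name, and the path to shiro.ini is passed in rather than assumed.

```shell
# Print the roles of one user from the [users] section of a shiro.ini file.
# $1 = path to shiro.ini, $2 = user name
roles_of() {
  awk -v user="$2" '
    /^\[/ { in_users = ($0 == "[users]"); next }   # track the current section
    in_users {
      pos = index($0, "=")                          # split at the first "=" only
      if (pos == 0) next
      name = substr($0, 1, pos - 1)
      gsub(/[ \t]/, "", name)
      if (name == user) {
        n = split(substr($0, pos + 1), parts, ",")  # parts[1] is the password
        for (i = 2; i <= n; i++) {
          gsub(/[ \t]/, "", parts[i])
          print parts[i]
        }
      }
    }
  ' "$1"
}
```

With the example file above, `roles_of shiro.ini user1` lists `scientist` and `poweruser`, one per line.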
Once the groups/roles are created, notebook authorization works as described in https://zeppelin.apache.org/docs/0.7.3/security/notebook_authorization.html . On a notebook's permission page, you can enter group names instead of individual users. For instance:
Owners: admin
Writers: scientist, engineer, poweruser
Readers: readonly
Good read: Zeppelin best-practice recommendations from Hortonworks:
https://community.hortonworks.com/articles/141589/zeppelin-best-practices.html
Need to learn more about AWS Big Data (Demystified)?
- Contact me via LinkedIn: Omid Vahdaty
- Website: https://amazon-aws-big-data-demystified.ninja/
- Join our meetup: https://www.meetup.com/AWS-Big-Data-Demystified/
- Join our Facebook group: https://www.facebook.com/groups/amazon.aws.big.data.demystified/
- Subscribe to our YouTube channel: https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber