AWS EMR, CloudFormation

registering the EMR master node as an ALB target via CloudFormation or the CLI

Suppose you would like to register your EMR cluster's master node as a target of an Application Load Balancer (ALB).

Unfortunately, this is not natively possible in CloudFormation (CFN).
Because the AWS::EMR::Cluster resource exposes only the DNS name of the master node as an attribute (MasterPublicDNS), there is no way to pass either the IP address or the instance ID of the master node to an AWS::ElasticLoadBalancingV2::TargetGroup resource using !Ref or !GetAtt, and a target group does not accept a DNS name as a target [1][2].

My guess is that many users have requested this feature, but it has not been released yet. New features are announced on the AWS What's New page [3]; once this one ships, you should be able to register the master node with the ALB without a workaround.

using Bash and the AWS CLI to register the EMR master node as an ALB target

Even outside of CFN, retrieving the instance ID of the master node is a somewhat convoluted process.
To retrieve the ID manually, you need to make an EMR DescribeCluster API call on the cluster, then take the MasterPublicDnsName and use it as a filter in an EC2 DescribeInstances API call [4][5].
Here is how I retrieved the instance ID of a master node:
# $clusterid holds the EMR cluster ID, e.g. j-XXXXXXXXXXXX
dns_name=$(aws emr describe-cluster --cluster-id "$clusterid" | jq -r '.Cluster.MasterPublicDnsName')
master=$(aws ec2 describe-instances --filters "Name=dns-name,Values=$dns_name" | jq -r '.Reservations[] | .Instances[] | .InstanceId')
echo "$master"
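The same two calls can be sketched in Python with boto3. One caveat, based on the EMR API reference: when the cluster runs in a private subnet, MasterPublicDnsName holds the private DNS name, while the EC2 dns-name filter matches public DNS names only. This sketch therefore falls back to the private-dns-name filter. The function names are hypothetical, and the clients are passed in as arguments so the lookup logic can be tried without AWS credentials.

```python
"""Find the EC2 instance ID of an EMR master node from its DNS name (sketch)."""


def instance_ids_from_reservations(response):
    """Flatten an EC2 DescribeInstances response into a list of instance IDs."""
    return [
        instance["InstanceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


def master_instance_id(emr, ec2, cluster_id):
    """Return the master node's instance ID, or raise LookupError if none matches."""
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    dns_name = cluster["MasterPublicDnsName"]
    # Public subnet: a public DNS name; private subnet: a private one.
    for filter_name in ("dns-name", "private-dns-name"):
        response = ec2.describe_instances(
            Filters=[{"Name": filter_name, "Values": [dns_name]}]
        )
        ids = instance_ids_from_reservations(response)
        if ids:
            return ids[0]
    raise LookupError(f"no EC2 instance found for {dns_name}")
```

With real credentials you would call it as master_instance_id(boto3.client('emr'), boto3.client('ec2'), 'j-XXXXXXXXXXXX'), which should print the same ID as the Bash version for a cluster in a public subnet.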

using CloudFormation and Lambda to register the EMR master node as an ALB target

The only way to achieve the same effect in CFN is to use a custom resource [6].
Custom resources let you create a Lambda-backed CFN resource that makes API calls on your behalf which are otherwise unavailable in CFN. This tutorial shows how they work and how to create them [7].
In our case, the Lambda function would need to make the calls described above: retrieve the master node's DNS name from the EMR cluster, then look up the master node's instance ID. It would then pass that instance ID as a target in a RegisterTargets API call to the target group created in the CFN template [8].
If you do decide to go the custom resource route, I recommend including a deletion step in the function that removes the targets from the target group when the resource is deleted [9]. This clean-up step helps avoid dependency errors when deleting the stack.
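A minimal sketch of such a custom resource handler, under these assumptions: the template declares a Custom:: resource whose ServiceToken is the function's ARN and whose properties are ClusterId (for example {"Ref": "EMRCluster"}, since Ref on AWS::EMR::Cluster returns the cluster ID) and TargetGroupArn (Ref on a target group returns its ARN). Both property names are hypothetical. The instance ID doubles as the physical resource ID, so the Delete request knows what to deregister. The response is sent with urllib from the standard library because the cfnresponse helper module is only available when the code is inlined in the template with ZipFile.

```python
"""Sketch of a Lambda-backed custom resource: register the EMR master node
with a target group on Create/Update, deregister it on Delete.

ClusterId and TargetGroupArn are hypothetical resource property names.
"""
import json
import urllib.request


def find_master_instance_id(emr, ec2, cluster_id):
    """DescribeCluster -> MasterPublicDnsName -> DescribeInstances."""
    dns_name = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["MasterPublicDnsName"]
    for filter_name in ("dns-name", "private-dns-name"):
        reservations = ec2.describe_instances(
            Filters=[{"Name": filter_name, "Values": [dns_name]}]
        )["Reservations"]
        for reservation in reservations:
            for instance in reservation["Instances"]:
                return instance["InstanceId"]
    raise LookupError(f"no EC2 instance found for {dns_name}")


def handle(event, emr, ec2, elbv2):
    """Apply the request; return (status, physical_resource_id, reason)."""
    physical_id = event.get("PhysicalResourceId", "none")
    try:
        props = event["ResourceProperties"]
        target_group_arn = props["TargetGroupArn"]
        if event["RequestType"] == "Delete":
            # The physical ID is the instance ID registered on Create/Update.
            if physical_id.startswith("i-"):
                elbv2.deregister_targets(
                    TargetGroupArn=target_group_arn, Targets=[{"Id": physical_id}]
                )
            return "SUCCESS", physical_id, ""
        # Create/Update: returning a new physical ID on Update makes
        # CloudFormation send a Delete for the old one during clean-up.
        instance_id = find_master_instance_id(emr, ec2, props["ClusterId"])
        elbv2.register_targets(
            TargetGroupArn=target_group_arn, Targets=[{"Id": instance_id}]
        )
        return "SUCCESS", instance_id, ""
    except Exception as exc:  # always answer CloudFormation, even on failure
        return "FAILED", physical_id, str(exc)


def send_response(event, status, physical_id, reason):
    """PUT the result to the pre-signed ResponseURL supplied by CloudFormation."""
    body = json.dumps({
        "Status": status,
        "Reason": reason or "See the CloudWatch Logs for details",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"], data=body, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def lambda_handler(event, context):
    import boto3  # imported here so handle() can be tested without it

    status, physical_id, reason = handle(
        event, boto3.client("emr"), boto3.client("ec2"), boto3.client("elbv2")
    )
    send_response(event, status, physical_id, reason)
```

Catching every exception in handle() is deliberate: if the function never answers, CloudFormation waits until the custom resource times out before the stack can move on.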

If you are using Python (boto3), you can:

get the EMR master node's DNS name via:

import boto3

def lambda_handler(event, context):
    # describe the cluster and return the DNS name of its master node
    client = boto3.client('emr')
    response = client.describe_cluster(
        ClusterId='j-112345678'  # replace with your cluster ID
    )
    return response['Cluster']['MasterPublicDnsName']

given the MasterPublicDnsName, get the corresponding instance ID via:

import boto3

def lambda_handler(event, context):
    client = boto3.client('emr')
    response = client.describe_cluster(
        ClusterId='j-YJ9Z2ZMU0DJM'
    )
    # this is the DNS name of the master node in EMR
    masterNodeDns = response['Cluster']['MasterPublicDnsName']

    # find the EC2 instance that has this DNS name
    client2 = boto3.client('ec2')
    response2 = client2.describe_instances(
        Filters=[
            {
                'Name': 'dns-name',
                'Values': [
                    masterNodeDns
                ]
            },
        ]
    )
    MasterNodeInstanceID = response2['Reservations'][0]['Instances'][0]['InstanceId']

    return MasterNodeInstanceID
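As an alternative to the DNS-name lookup (a different technique from the one above, not something I tested in this setup), EMR's ListInstances call can return the master node's EC2 instance ID directly when filtered to the MASTER instance group. A minimal sketch, assuming an instance-group cluster (instance-fleet clusters use InstanceFleetType instead) and the extra IAM permission elasticmapreduce:ListInstances; the function names and the ClusterId event key are hypothetical.

```python
"""Alternative sketch: ask EMR directly for the master node's EC2 instance ID
with ListInstances, skipping the DNS name and the EC2 lookup."""


def ec2_ids(list_instances_response):
    """Extract EC2 instance IDs from an EMR ListInstances response."""
    return [
        instance["Ec2InstanceId"]
        for instance in list_instances_response.get("Instances", [])
    ]


def master_instance_ids(emr, cluster_id, uses_instance_fleets=False):
    """Return the EC2 IDs of the running master node(s) of a cluster."""
    kwargs = {"ClusterId": cluster_id, "InstanceStates": ["RUNNING"]}
    if uses_instance_fleets:
        kwargs["InstanceFleetType"] = "MASTER"
    else:
        kwargs["InstanceGroupTypes"] = ["MASTER"]
    return ec2_ids(emr.list_instances(**kwargs))


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without it

    # "ClusterId" is a hypothetical event key; the ID could also be hard-coded.
    return master_instance_ids(boto3.client("emr"), event["ClusterId"])[0]
```

The function returns a list because a cluster with multiple master nodes reports more than one ID; for a single-master cluster the list has one entry.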

register targets via:

client3 = boto3.client('elbv2')
response3 = client3.register_targets(
    TargetGroupArn='arn:aws:elasticloadbalancing:eu-west-1:506754145427:targetgroup/zeppeling-stg-target/a4160357f4e8daff',
    Targets=[
        {
            'Id': MasterNodeInstanceID
        }
    ],
)
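Registering by instance ID only works when the target group's TargetType is instance; for a target group of type ip, the Id must be the node's private IP address instead [2]. A minimal sketch that checks the type first, assuming you already have both values (the private IP is the PrivateIpAddress field of the same DescribeInstances response); the helper names are hypothetical.

```python
"""Build the RegisterTargets entry for the EMR master node based on the
target group's TargetType (sketch)."""


def target_for(target_type, instance_id, private_ip, port=None):
    """Return a Targets entry: instance ID for 'instance', IP address for 'ip'."""
    if target_type == "instance":
        target = {"Id": instance_id}
    elif target_type == "ip":
        target = {"Id": private_ip}
    else:
        raise ValueError(f"unsupported target type for an EC2 node: {target_type}")
    if port is not None:
        # Without Port, the target group's own port is used.
        target["Port"] = port
    return target


def register_master(elbv2, target_group_arn, instance_id, private_ip, port=None):
    """Look up the target group's type, then register the master node with it."""
    group = elbv2.describe_target_groups(TargetGroupArns=[target_group_arn])["TargetGroups"][0]
    target = target_for(group["TargetType"], instance_id, private_ip, port)
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=[target])
    return target
```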

and deregister it:

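# deregister with the elbv2 client and the same target used for registration
response = client3.deregister_targets(
    TargetGroupArn='arn:aws:elasticloadbalancing:eu-west-1:506754145427:targetgroup/zeppeling-stg-target/a4160357f4e8daff',
    Targets=[
        {
            'Id': MasterNodeInstanceID
        },
    ]
)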
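When the stack is being torn down, the master node may already be gone, which makes its instance ID awkward to look up. A hedged alternative for the clean-up step is to ask the target group which targets it currently holds (DescribeTargetHealth) and deregister exactly those. This sketch assumes the target group is dedicated to the EMR master node, since it would remove any other targets too; the function names are hypothetical.

```python
"""Clean-up sketch: deregister whatever targets are currently in the group."""


def registered_targets(health_response):
    """Extract Target entries (Id, Port, ...) from a DescribeTargetHealth response."""
    return [
        description["Target"]
        for description in health_response.get("TargetHealthDescriptions", [])
    ]


def deregister_all(elbv2, target_group_arn):
    """Deregister every target currently registered in the target group.

    Assumes the group is dedicated to the EMR master node; any other
    targets in it would be removed as well.
    """
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    targets = registered_targets(health)
    if targets:
        # Passing the targets back as-is keeps the Port they were registered with.
        elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=targets)
    return targets
```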


So the full Lambda function looks like this (note the hard-coded stack name StgEMR):

import boto3

def lambda_handler(event, context):

    # get the cluster ID given the CloudFormation stack name
    client4 = boto3.client('cloudformation')
    response4 = client4.describe_stack_resource(
        StackName='StgEMR',
        LogicalResourceId='EMRCluster'
    )
    CloudFormationStackClusterID = response4['StackResourceDetail']['PhysicalResourceId']

    # get the EMR master node DNS name given the cluster ID
    client = boto3.client('emr')
    response = client.describe_cluster(
        ClusterId=CloudFormationStackClusterID
    )
    # this is the DNS name of the master node in EMR
    masterNodeDns = response['Cluster']['MasterPublicDnsName']

    # get the instance ID of the EMR master node given masterNodeDns
    client2 = boto3.client('ec2')
    response2 = client2.describe_instances(
        Filters=[
            {
                'Name': 'dns-name',
                'Values': [
                    masterNodeDns
                ]
            },
        ]
    )
    MasterNodeInstanceID = response2['Reservations'][0]['Instances'][0]['InstanceId']

    # add the instance as a target of the ALB given its instance ID
    client3 = boto3.client('elbv2')
    response3 = client3.register_targets(
        TargetGroupArn='arn:aws:elasticloadbalancing:eu-west-1:506754145427:targetgroup/zeppeling-stg-target/a4160357f4e8daff',
        Targets=[
            {
                'Id': MasterNodeInstanceID
            }
        ],
    )
    return "Done"
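To avoid hard-coding the stack name and the target group ARN, the same flow can read them from environment variables. A minimal sketch, assuming hypothetical variable names STACK_NAME, CLUSTER_LOGICAL_ID and TARGET_GROUP_ARN set on the function; for brevity it swaps the DNS-name lookup for EMR's ListInstances call filtered to the MASTER instance group, which returns the EC2 instance ID directly but needs the elasticmapreduce:ListInstances permission.

```python
"""Sketch: the full function with its settings read from environment variables."""
import os


def load_settings(environ):
    """Read settings; STACK_NAME and CLUSTER_LOGICAL_ID default to the values above."""
    return {
        "stack_name": environ.get("STACK_NAME", "StgEMR"),
        "logical_id": environ.get("CLUSTER_LOGICAL_ID", "EMRCluster"),
        "target_group_arn": environ["TARGET_GROUP_ARN"],  # required
    }


def register_master_from_stack(settings, cfn, emr, elbv2):
    """Stack resource -> cluster ID -> master instance ID -> RegisterTargets."""
    detail = cfn.describe_stack_resource(
        StackName=settings["stack_name"], LogicalResourceId=settings["logical_id"]
    )["StackResourceDetail"]
    cluster_id = detail["PhysicalResourceId"]
    instances = emr.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["MASTER"], InstanceStates=["RUNNING"]
    )["Instances"]
    if not instances:
        raise LookupError(f"no running master instance in {cluster_id}")
    instance_id = instances[0]["Ec2InstanceId"]
    elbv2.register_targets(
        TargetGroupArn=settings["target_group_arn"], Targets=[{"Id": instance_id}]
    )
    return instance_id


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without it

    settings = load_settings(os.environ)
    return register_master_from_stack(
        settings,
        boto3.client("cloudformation"),
        boto3.client("emr"),
        boto3.client("elbv2"),
    )
```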


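You could run the Lambda function from CloudFormation as follows (as recommended by the CloudFormation support team). Keep in mind that declaring the function in the template does not invoke it; CloudFormation calls it only through a custom resource whose ServiceToken points at the function [6]. However, I didn't test this, simply because it requires launching the entire stack again and again until you get it right. Instead, I scheduled the Lambda function to run daily, 15 minutes after the EMR cluster was launched. It is a hack, but it is easier to get started with.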
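With a scheduled run, the function may fire while the cluster is still starting, after it has terminated, or after the master node is already registered. A minimal guard sketch, assuming the cluster ID and target group ARN are passed in by your handler; it uses EMR's ListInstances (MASTER instance group) to get the instance ID, which needs the elasticmapreduce:ListInstances and elasticloadbalancing:DescribeTargetHealth permissions. The function names are hypothetical.

```python
"""Guard sketch for a scheduled registration run."""

READY_STATES = {"RUNNING", "WAITING"}  # EMR cluster states with a usable master


def is_registered(health_response, instance_id):
    """True if instance_id already appears in a DescribeTargetHealth response."""
    return any(
        description["Target"]["Id"] == instance_id
        for description in health_response.get("TargetHealthDescriptions", [])
    )


def register_if_ready(emr, elbv2, cluster_id, target_group_arn):
    """Register the master node only when the cluster is up and not yet registered.

    Returns a short status string, which is handy in the Lambda logs.
    """
    state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
    if state not in READY_STATES:
        return f"skipped: cluster is {state}"
    instances = emr.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["MASTER"], InstanceStates=["RUNNING"]
    )["Instances"]
    if not instances:
        return "skipped: no running master instance"
    instance_id = instances[0]["Ec2InstanceId"]
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    if is_registered(health, instance_id):
        return f"skipped: {instance_id} already registered"
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=[{"Id": instance_id}])
    return f"registered {instance_id}"
```

Because every branch either skips or registers once, the schedule can run more often than daily without piling up duplicate work.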

Inside the CloudFormation template, the Lambda function and its execution role should look like this:

"mylambda": {
    "Type": "AWS::Lambda::Function",
    "Properties": {
        "Handler": "index.lambda_handler",
        "Role": { "Fn::GetAtt": ["LambdaExecutionRole", "Arn"] },
        "Code": {
            "S3Bucket": "my-lambda-functions-bucket",
            "S3Key": "mylambda.zip"
        },
        "Runtime": "python3.12",
        "Timeout": 100,
        "Environment": {
            "Variables": {
                "givenDns": { "Fn::GetAtt": ["EMRCluster", "MasterPublicDNS"] }
            }
        }
    }
},
"LambdaExecutionRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": { "Service": ["lambda.amazonaws.com"] },
                "Action": ["sts:AssumeRole"]
            }]
        },
        "Path": "/",
        "Policies": [{
            "PolicyName": "root",
            "PolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                    "Resource": "arn:aws:logs:*:*:*"
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "ec2:DescribeInstances",
                        "elasticmapreduce:DescribeCluster",
                        "cloudformation:DescribeStackResource",
                        "elasticloadbalancing:RegisterTargets",
                        "elasticloadbalancing:DeregisterTargets"
                    ],
                    "Resource": "*"
                }]
            }
        }]
    }
}
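The template passes the master node's DNS name to the function as the givenDns environment variable, which the full function above does not use. A sketch of a handler that reads it instead, so the CloudFormation and EMR lookups can be skipped; TARGET_GROUP_ARN is a hypothetical extra variable you would add to the template's Variables block, and both EC2 DNS filters are tried because the value can be a private DNS name for a cluster in a private subnet.

```python
"""Sketch: register the master node using the givenDns environment variable."""
import os


def find_instance_id(ec2, dns_name):
    """Match the DNS name against public, then private, EC2 DNS names."""
    for filter_name in ("dns-name", "private-dns-name"):
        reservations = ec2.describe_instances(
            Filters=[{"Name": filter_name, "Values": [dns_name]}]
        )["Reservations"]
        for reservation in reservations:
            for instance in reservation["Instances"]:
                return instance["InstanceId"]
    raise LookupError(f"no EC2 instance found for {dns_name}")


def register_from_env(environ, ec2, elbv2):
    """Read givenDns and TARGET_GROUP_ARN, then register the matching instance."""
    instance_id = find_instance_id(ec2, environ["givenDns"])
    elbv2.register_targets(
        TargetGroupArn=environ["TARGET_GROUP_ARN"], Targets=[{"Id": instance_id}]
    )
    return instance_id


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without it

    return register_from_env(os.environ, boto3.client("ec2"), boto3.client("elbv2"))
```

Because the Fn::GetAtt on EMRCluster creates an implicit dependency, CloudFormation creates the function only after the cluster exists, so the variable is populated by the time the function can run.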


[1] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-emr-cluster.html#aws-resource-emr-cluster-returnvalues
[2] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticloadbalancingv2-targetgroup.html#cfn-elasticloadbalancingv2-targetgroup-targettype
[3] https://aws.amazon.com/new/
[4] https://docs.aws.amazon.com/cli/latest/reference/emr/describe-cluster.html
[5] https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html
[6] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html
[7] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/walkthrough-custom-resources-lambda-lookup-amiids.html
[8] https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_RegisterTargets.html
[9] https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_DeregisterTargets.html

——————————————————————————————————————————

I put a lot of thought into these blog posts so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/
