
CloudFormation Generic Custom Resources

TL;DR generic Lambda to create Client VPN and Cognito IdP demo stacks 🤓

If you have worked with AWS CloudFormation for any reasonable length of time, you will have discovered that it is a very powerful framework. You will also have quickly discovered that the native resources tend not to be on the bleeding edge of AWS development. Sometimes they lag so much that the lag is measured in years (e.g. 2012-2019). At the time of writing, AWS Cognito and Client VPN are either only partially implemented or not implemented at all. Granted, Client VPN and AWS Backup were only released in late 2018 and early 2019, but Cognito has been around since 2014!

And yes, we could use other frameworks instead of CloudFormation, such as Terraform, but this is a native CloudFormation solution, requiring no additional tools or frameworks. It can be deployed with awscli, without any further dependencies.

In any case, we are not going to start rolling our configurations manually, pressing buttons, etc. and will attempt to keep all of our AWS resource management within CFN. Why? Because DevOps.

Luckily for us, CFN has a concept of Custom Resources, which allows us to define our own resources and map them to Lambda functions. However, creating a separate function for every resource CFN doesn't currently handle (and perhaps never will) gets a bit tedious. For example, we have over 20 such functions already and I suspect others have a lot more.

Since the CFN to Lambda bridge follows a very tight interface spec, and AWS provides an SDK for Python (among others) which tracks new AWS product development a lot better than CFN, wouldn't it be nice to have a generic Lambda function which would take a bunch of parameters, invoke the appropriate SDK method to create, update or delete a resource, and wait for the operation to complete? Right?

I found a few projects[1][2][3], which aim to address the same issue, but in the end decided to create my own version to implement features such as generic wait handling.

The overall approach I've taken is to implement Create, Update and Delete requests which are sent by CloudFormation to the Lambda function. Each request type takes the following mandatory parameters:

  • AgentService
  • AgentType
  • AgentCreateMethod
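
The routing of a request to an SDK method can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: the function name select_call is hypothetical, and it assumes the per-request method and argument properties follow the Agent(Create|Update|Delete)Method and Agent(Create|Update|Delete)Args naming used in this post.

```python
# Hypothetical sketch: map a CloudFormation custom resource event to the
# SDK method name and arguments named in its ResourceProperties.
MANDATORY = ("AgentService", "AgentType", "AgentCreateMethod")


def select_call(event):
    """Return (method_name, kwargs) for a Create, Update or Delete request."""
    props = event["ResourceProperties"]
    for key in MANDATORY:
        if key not in props:
            raise ValueError(f"missing mandatory property: {key}")

    request_type = event["RequestType"]  # sent by CFN: Create, Update or Delete
    if request_type not in ("Create", "Update", "Delete"):
        raise ValueError(f"unsupported request type: {request_type}")

    # Optional properties resolve to None (method) or {} (args) when absent.
    method = props.get(f"Agent{request_type}Method")
    args = props.get(f"Agent{request_type}Args", {})
    return method, args
```

The caller would then look the returned method name up on the agent object and invoke it with the returned arguments.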

AgentType can be either client, resource or custom. Depending on the type, we create either a boto3.session.Session.client() or boto3.session.Session.resource() object. AgentService can be any of the supported services, such as ec2 and rds.
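
As a rough sketch of the client/resource branch, the helper below takes a session object and returns the matching agent. The name make_agent is hypothetical and the real provider may structure this differently; the session is passed in so the function does not depend on how credentials were obtained.

```python
# Hypothetical sketch: pick a boto3 client or resource based on AgentType.
# `session` is expected to behave like boto3.session.Session.
def make_agent(session, agent_type, agent_service, **kwargs):
    if agent_type == "client":
        return session.client(agent_service, **kwargs)
    if agent_type == "resource":
        return session.resource(agent_service, **kwargs)
    raise ValueError(f"unsupported AgentType: {agent_type}")


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   agent = make_agent(boto3.session.Session(), "client", "ec2")
```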

If type is custom, the provider will try to import the module named in AgentService (e.g. acm_pca.py with a class named ACM_PCA):

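if agent_type == 'custom':
  # import the module named in AgentService (e.g. acm_pca)
  from importlib import import_module
  agent_module = import_module(agent_service)
  # instantiate the class with the upper-cased module name (e.g. ACM_PCA)
  agent = getattr(agent_module, agent_service.upper())(**kwargs)
  ...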

Your custom module/class can then implement whatever you like and the provider can call it from CloudFormation. For a working example, please see acm-pca demo stack.

The following are optional:

  • AgentDeleteMethod
  • AgentUpdateMethod
  • AgentWaitMethod
  • Agent(Create|Update|Delete|Wait)Args
  • Agent(Create|Update|Delete)Exceptions
  • AgentResourceId
  • AgentWaitResourceId
  • AgentWaitQueryExpr
  • AgentResponseNode
  • AgentWait(Update|Create|Delete)QueryValues
  • AgentWait(Update|Create|Delete)Exceptions
  • AgentRegion
  • RoleArn

AgentResourceId is technically optional, but is normally required if the resource you are creating gets a unique id assigned by the API, which is in turn required on subsequent update and/or delete requests.

If RoleArn is present in ResourceProperties, we will assume that role first and pass the credentials to AgentService. If AgentRegion is present, we will assume the role in that region.
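
A minimal sketch of the role assumption step, assuming the temporary credentials are handed to a new boto3 session. The function name assumed_session_kwargs and the default session name are hypothetical; sts_client is expected to behave like boto3.client('sts').

```python
# Hypothetical sketch: turn an STS AssumeRole response into keyword
# arguments accepted by boto3.session.Session.
def assumed_session_kwargs(sts_client, role_arn, region=None,
                           session_name="generic-custom-resource"):
    creds = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )["Credentials"]
    kwargs = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
    if region:  # AgentRegion, when supplied
        kwargs["region_name"] = region
    return kwargs


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   sts = boto3.client("sts")
#   session = boto3.session.Session(**assumed_session_kwargs(sts, role_arn))
```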

If AgentWaitMethod is supplied, we will first try to instantiate it as a waiter using the get_waiter() call and, if that fails, we will fall back to calling it literally, as a regular method of the agent (e.g. a describe_* call).
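
The waiter lookup with its fallback might look something like the sketch below. The name resolve_wait_method is hypothetical. It relies on botocore clients raising ValueError from get_waiter() for an unknown waiter name, and also catches AttributeError for agents that have no get_waiter() at all.

```python
# Hypothetical sketch: prefer a native waiter, otherwise fall back to the
# plain SDK method with the same name.
def resolve_wait_method(agent, name):
    """Return (callable, is_native_waiter)."""
    try:
        return agent.get_waiter(name).wait, True
    except (ValueError, AttributeError):
        # No such waiter: treat the name as a regular method, which the
        # caller would then have to poll itself.
        return getattr(agent, name), False
```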

If AgentWaitQueryExpr is supplied with a valid JSONpath, we will extract the match from the API response and return it as the PhysicalResourceId (string) in the CFN response.

If AgentResponseNode is present, we will extract the supplied key from the API JSON response and place it in the CFN response Data key, which will allow the items to be queried with the !GetAtt function in the template.

Both AgentWaitQueryExpr and AgentResponseNode achieve a similar purpose of pulling out the unique id from the AWS API response. You probably want to use AgentResponseNode if there are multiple values in the API response you need to access in your templates using !GetAtt MyResource.MyValue. Otherwise, if it is just a single value, use AgentWaitQueryExpr to place it into PhysicalResourceId and use !Ref MyResource to access it.
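
To make the difference concrete, here is a rough sketch of how the two values could end up in the response object sent back to CloudFormation. The function name build_cfn_response is hypothetical, the JSONpath evaluation is left out (the already extracted match is passed in as physical_id), and the fallback to the logical id is an assumption for illustration only.

```python
# Hypothetical sketch: assemble a CloudFormation custom resource response.
def build_cfn_response(event, api_response, response_node=None, physical_id=None):
    data = {}
    if response_node:
        node = api_response.get(response_node, {})
        # A nested object is exposed attribute by attribute to !GetAtt;
        # a scalar is exposed under the node name itself.
        data = node if isinstance(node, dict) else {response_node: node}
    return {
        "Status": "SUCCESS",
        "PhysicalResourceId": str(
            physical_id
            or event.get("PhysicalResourceId")
            or event["LogicalResourceId"]  # assumed fallback
        ),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }
```

With response_node set, each key under Data becomes available to !GetAtt, while the PhysicalResourceId is what !Ref returns.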

You can optionally supply a list of Agent(Create|Update|Delete)Exceptions to ignore, e.g.:

[
    "agent.exceptions.InvalidResourceStateFault",
    "agent.exceptions.ClientError"
]

The exception names will be evaluated at runtime, with agent resolving to either a boto3 client or resource object. If any of these exceptions are raised by the SDK, they will be ignored. The same mechanism can be used for the waiters by supplying AgentWait(Update|Create|Delete)Exceptions. This approach is useful for executing instructions that do not return a unique ID, for instance starting or stopping a DMS task in an unknown state.
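
One way to resolve such names without a bare eval() is to walk the attribute path on the agent object. The sketch below is an illustration under that assumption, not necessarily how the provider evaluates them; resolve_exceptions and call_ignoring are hypothetical names.

```python
from functools import reduce


# Hypothetical sketch: turn "agent.exceptions.Foo" strings into exception
# classes by walking attributes on the agent object.
def resolve_exceptions(agent, names):
    resolved = []
    for name in names:
        root, *attrs = name.split(".")
        if root != "agent":
            raise ValueError(f"expected a name starting with 'agent.': {name}")
        resolved.append(reduce(getattr, attrs, agent))
    return tuple(resolved)


def call_ignoring(method, ignored, **kwargs):
    """Call method(**kwargs); swallow any exception listed in `ignored`."""
    try:
        return method(**kwargs)
    except ignored:
        return None
```

An empty list resolves to an empty tuple, in which case nothing is swallowed and every SDK error propagates as usual.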

⚠️☣️☢️ WARNING the update operation will almost certainly delete and re-create your resource, unless a valid update method is passed with correct parameters. Beware of using this tool on stateful resources, such as databases and directories. CFN offers mechanisms to prevent inadvertent deletion and updates, which should be used for these types of resources.

The most basic resource is one that does not require waiting for creation or deletion to complete. Passing in the mandatory parameters, plus a delete method, the resource id and the create arguments, is enough, for example:

Resources:
  ClientVPNEndpoint:
    Type: 'Custom::ClientVPNEndpoint'
    Properties:
      ServiceToken: !Sub '${CustomResourceLambdaArn}'
      AgentService: ec2
      AgentType: client
      AgentCreateMethod: create_client_vpn_endpoint
      AgentDeleteMethod: delete_client_vpn_endpoint
      AgentResourceId: ClientVpnEndpointId
      AgentCreateArgs: <JSON object|JSON string>

AgentCreateArgs will take either CFN-formatted JSON, with all values passed as strings, or a packed JSON string, which the provider will unpack to preserve boolean values.
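
The unpacking step can be sketched as below. The name unpack_args is hypothetical, and SplitTunnel is used purely as an illustrative boolean argument.

```python
import json


# Hypothetical sketch: accept either a mapping (as delivered by CFN, with
# string values) or a packed JSON string (types preserved).
def unpack_args(value):
    if isinstance(value, str):
        return json.loads(value)
    return value


# CFN stringifies scalar values in a plain object...
print(unpack_args({"SplitTunnel": "true"}))   # {'SplitTunnel': 'true'}
# ...whereas a packed JSON string keeps the boolean intact.
print(unpack_args('{"SplitTunnel": true}'))   # {'SplitTunnel': True}
```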

For a more complicated request, where resource creation takes time, we would normally want to wait for that resource to become available, before starting to create dependencies. In this instance, we would need to pass in some additional parameters. In the following example, we associate a subnet with a client VPN endpoint, which takes some time. Since there are no waiters implemented for this resource, we pass in a generic describe_client_vpn_target_networks method, a JSONpath to match in the response and successful create and delete values. For deletions, usually an empty list is enough, since when this resource is removed, the describe_client_vpn_target_networks response will be empty. We also pass in AgentWaitResourceId, since the wait method takes in a different parameter than the one specified in AgentResourceId.

  AssociateSubnet:
    Type: 'Custom::SubnetAssociation'
    Properties:
      ServiceToken: !Sub '${CustomResourceLambdaArn}'
      AgentService: ec2
      AgentType: client
      AgentCreateMethod: associate_client_vpn_target_network
      AgentDeleteMethod: disassociate_client_vpn_target_network
      AgentWaitMethod: describe_client_vpn_target_networks
      AgentWaitQueryExpr: '$.ClientVpnTargetNetworks[?(@.TargetNetworkId=="subnet-abcdef1234567890")].Status.Code'
      AgentWaitCreateQueryValues:
      - associated
      AgentWaitDeleteQueryValues: []
      AgentResourceId: AssociationId
      AgentWaitResourceId:
      - AssociationIds
      AgentCreateArgs:
        ClientVpnEndpointId: !Sub '${ClientVPNEndpoint}'
        SubnetId: subnet-abcdef1234567890
      AgentWaitArgs:
        ClientVpnEndpointId: !Sub '${ClientVPNEndpoint}'
      AgentDeleteArgs:
        ClientVpnEndpointId: !Sub '${ClientVPNEndpoint}'
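
The generic wait behind a template like the one above can be sketched as a simple polling loop. This is an illustration only: wait_for_values, its defaults and the extract callable (a stand-in for evaluating AgentWaitQueryExpr) are all hypothetical.

```python
import time


# Hypothetical sketch: poll a describe-style method until the extracted
# values match the expected ones (or until nothing matches, for deletes).
def wait_for_values(describe, extract, expected, attempts=40, delay=15, **kwargs):
    for attempt in range(attempts):
        matches = extract(describe(**kwargs))
        if expected:
            # e.g. AgentWaitCreateQueryValues: every match is an accepted value
            done = bool(matches) and all(m in expected for m in matches)
        else:
            # e.g. AgentWaitDeleteQueryValues: [] means "no matches left"
            done = not matches
        if done:
            return matches
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"gave up after {attempts} attempts")
```

For the subnet association, extract could be a function that returns the Status.Code of every entry in ClientVpnTargetNetworks whose TargetNetworkId matches the subnet, with expected set to ['associated'] on create and [] on delete.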

There are many more examples, too numerous to list. This tool has not been tested on all possible combinations and permutations of AWS resources, so there are almost certainly going to be edge cases which will need to be handled correctly and, hopefully, generically.

A complete client-vpn-demo CFN stack is provided as a means to demonstrate the operation of this tool end-to-end. This stack will deploy a Client VPN endpoint, associate subnets, authorize ingress, add default routes and apply security group(s). This stack can be added as a nested stack within a parent template, to add client VPN (OpenVPN) connectivity to the private subnets.

Another (semi-)complete cognito-demo CFN stack is provided. This stack deploys Cognito IdP resources and configures a user pool domain and SAML provider. It can be coupled with an existing AWS ELBv2 (ALB) resource to provide authentication at the load balancer.

For an example of AgentType == 'resource' usage, take a look at the mock request and adapt it for use inside your templates.

Note, if NoEcho is set to true under ResourceProperties, nothing will be printed into CloudWatch logs. However, if the VERBOSE=1 environment variable is set, stack traces will be visible.

Every little bit helps, so you are welcome!

--belodetek 😬

Anton Belodedenko


I am a jack of all trades, master of none (DevOps). My wife and I ski, snowboard and rock climb. Oh, and I like Futurama, duh!
