
TL;DR

  • AWS Serverless Application Model (SAM) is used to quickly create serverless applications, with support for:
    • local development that emulates AWS Lambda and API Gateway via Docker,
    • blue/green deployments via AWS CodeDeploy,
    • infrastructure as code, so the same infrastructure can be deployed repeatedly to multiple environments (dev, test, production).
  • Terraform is a cloud-agnostic Infrastructure as Code language and tool set.
    • It takes care of the non-serverless parts and the permissions: the “difficult stuff”.

The result is a great local development experience via AWS SAM combined with the power of Terraform as the glue and as a general purpose solution for other AWS service deployments.


Introduction

This combination came out of necessity, while working together with developers who are not cloud-native developers. Our initial solution was to emulate the online environment with command line tooling that ran a proxy, which in turn executed the Lambda code. This caused some big problems down the line (hint: hard to debug!).

The solution is to avoid self-made offline development environments altogether. AWS SAM is supported and developed by the good people at Amazon themselves. We started using AWS SAM because it uses Docker containers to emulate AWS services, and because it makes it easy to create the relevant service events (e.g. events for SNS, API Gateway, etc.) for local AWS Lambda testing.

In this article I’ll discuss the background and the solution I created to successfully integrate Terraform deployments with AWS SAM.

There are two repositories on GitHub that demonstrate my solution; feel free to review them here:

  1. Terraform code: https://github.com/rpstreef/terraform-aws-sam-integration-example

  2. AWS SAM and NodeJS code: https://github.com/rpstreef/aws-sam-node-example

Overview

First off, let’s start with an overview of the solution and where the responsibilities have been defined for both Terraform and AWS SAM.

The idea behind this setup is to take advantage of a couple of key features offered by AWS SAM:

  • Local development via Docker images.
  • Various Blue/Green Deployment scenarios supported via AWS CodeDeploy.

The rest of the application is then defined by Terraform, which can do this reliably and in a DRY (Don’t Repeat Yourself) structure.

In other words, AWS SAM only defines the API (API Gateway), the AWS Lambda functions, and the Lambda CloudWatch Alarms used by CodeDeploy.

[Diagram: overview-terraform-aws-sam]

Terraform

Terraform is an all-purpose infrastructure as code tool suite that can create infrastructure on a number of cloud vendors. With its large support base and very useful module system, it’s relatively straightforward to deploy large infrastructures to AWS.

With that said, what does Terraform deploy in this particular configuration with AWS SAM?

Besides the complete AWS CodePipeline, as shown in the diagram above, it manages all “fringe” AWS services in this code example. From left to right:

  • AWS Cognito for identity management and authentication. Provides API Gateway security through header-based authentication.
  • AWS CloudWatch for monitoring with Alarms. Sets Alarms for API Gateway endpoints (P95 and P99 latency, 400/500 errors) and AWS Lambda (errors, timeouts, etc.).
  • AWS IAM for roles and permissions management. Allows API Gateway endpoints to execute AWS Lambda functions and sets general role permission policies for AWS Lambda execution.

For your own projects, this means that services like SQS queues, databases, S3 buckets, etc. would all be created by Terraform and referenced in the AWS SAM template.

AWS CodePipeline

To deploy the AWS SAM application, we use AWS CodePipeline to set up our automated CI/CD pipeline. It consists of the following stages:

  • Source: Retrieves the repository data from GitHub.
  • Build: Builds the solution based on a build script, buildspec.yaml, and the repository data.
  • Deploy: Deploys the build stage output.

Source

Each stage can produce artifacts that need to carry over to the next stage in the pipeline; that’s where the artifact_store configuration comes in. The S3 bucket below stores those artifacts and expires them after 5 days:

resource "aws_s3_bucket" "artifact_store" {
  bucket        = "${local.resource_name}-codepipeline-artifacts-${random_string.postfix.result}"
  acl           = "private"
  force_destroy = true

  lifecycle_rule {
    enabled = true

    expiration {
      days = 5
    }
  }
}

To get the source, the GitHub credentials (OAuthToken) and the connection information (Owner, Repo, and Branch) need to be provided.

When PollForSourceChanges is set to true, CodePipeline polls the configured Branch for changes and starts the pipeline on every push to it.

resource "aws_codepipeline" "_" {
  
  # Misc configuration here

  artifact_store {
    location = aws_s3_bucket.artifact_store.bucket
    type     = "S3"
  }

  stage {
    name = "Source"

    action {
      name             = "Source"
      category         = "Source"
      owner            = "ThirdParty"
      provider         = "GitHub"
      version          = "1"
      output_artifacts = ["source"]

      configuration = {
        OAuthToken           = var.github_token
        Owner                = var.github_owner
        Repo                 = var.github_repo
        Branch               = var.github_branch
        PollForSourceChanges = var.poll_source_changes
      }
    }
  }
  
  # Build stage here

  # Deploy stage here

  lifecycle {
    ignore_changes = [stage[0].action[0].configuration]
  }
}

The lifecycle ignore_changes configuration is applied because of a bug with the OAuthToken parameter. See this issue for more information.

Build

The build stage is pretty self-explanatory.

stage {
  name = "Build"

  action {
    name             = "Build"
    category         = "Build"
    owner            = "AWS"
    provider         = "CodeBuild"
    version          = "1"
    input_artifacts  = ["source"]
    output_artifacts = ["build"]

    configuration = {
      ProjectName = aws_codebuild_project._.name
    }
  }
}

What’s more interesting is the buildspec file, which defines the steps that produce our artifacts. The configuration.json file contains the values for the parameters defined in the AWS SAM template. That way we can configure each environment (dev, test, prod) individually with different settings.

version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 12
    commands:
      - pip3 install --upgrade aws-sam-cli
      - cd dependencies/nodejs
      # Do not install dev dependencies
      - npm install --only=production
      - cd ../../
  build:
    commands:
      - sam build
  post_build:
    commands:
      - sam package --s3-bucket $ARTIFACT_BUCKET --output-template-file packaged.yaml 
artifacts:
  files:
    - packaged.yaml
    - configuration.json

Deploy

Here we get to the meat and potatoes of the pipeline, the deployment stage.

It consists of two actions:

  1. CloudFormation change set:

This action creates a CloudFormation change set for an existing or a new stack. CloudFormation “calculates” the difference between the deployed stack and the new template and lists the changes needed for the related services in your AWS account. Nothing is applied yet; that happens in the second action.

For this we need an IAM Role with the appropriate permissions. Note that there are two role settings: role_arn on the action is the role CodePipeline assumes to run the action, while RoleArn in the configuration block is the service role CloudFormation assumes to create and update the stack’s resources. In this example both point to the same role.

The template is the output of our build stage, hence build::packaged.yaml. The build::configuration.json file comes from the GitHub repository (the buildspec copies it into the build artifact) and contains the relevant parameters for deploying our stack.

Please note there’s an inconsistency in how Terraform deploys the CreateChangeSet action. If you only set role_arn in the action block, RoleArn in the configuration block gets set to null. You need to set the RoleArn parameter as well to avoid this. I’ve reported this issue to the official GitHub repo.

stage {
  name = "Deploy"

  action {
    name            = "CreateChangeSet"
    category        = "Deploy"
    owner           = "AWS"
    provider        = "CloudFormation"
    input_artifacts = ["build"]
    role_arn        = module.iam_cloudformation.role_arn
    version         = 1
    run_order       = 1

    configuration = {
      ActionMode            = "CHANGE_SET_REPLACE"
      Capabilities          = "CAPABILITY_IAM,CAPABILITY_AUTO_EXPAND"
      OutputFileName        = "ChangeSetOutput.json"
      RoleArn               = module.iam_cloudformation.role_arn
      StackName             = var.stack_name
      TemplatePath          = "build::packaged.yaml"
      ChangeSetName         = "${var.stack_name}-deploy"
      TemplateConfiguration = "build::configuration.json"
    }
  }
  2. CloudFormation change set execute:

In the second action, we execute the change set created by the previous action:

  action {
    name            = "Deploy"
    category        = "Deploy"
    owner           = "AWS"
    provider        = "CloudFormation"
    input_artifacts = ["build"]
    version         = 1
    run_order       = 2

    configuration = {
      ActionMode     = "CHANGE_SET_EXECUTE"
      Capabilities   = "CAPABILITY_IAM,CAPABILITY_AUTO_EXPAND"
      OutputFileName = "ChangeSetExecuteOutput.json"
      StackName      = var.stack_name
      ChangeSetName  = "${var.stack_name}-deploy"
    }
  }
}
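Before running terraform apply, a quick local check can catch a configuration like the CreateChangeSet one ending up with a null RoleArn. The Python sketch below is a hypothetical helper, not Terraform or AWS tooling: it compares a configuration map with the keys that, as I read the CloudFormation action reference, are required for CHANGE_SET_REPLACE and CHANGE_SET_EXECUTE. Treat the REQUIRED table as an assumption and verify it against the official reference.

```python
"""Sketch: flag missing keys in CodePipeline CloudFormation action configs.

Assumption: REQUIRED mirrors my reading of the CloudFormation action
reference for the two action modes used in this pipeline.
"""

REQUIRED = {
    "CHANGE_SET_REPLACE": {"ActionMode", "StackName", "ChangeSetName",
                           "TemplatePath", "RoleArn"},
    "CHANGE_SET_EXECUTE": {"ActionMode", "StackName", "ChangeSetName"},
}


def missing_keys(configuration: dict) -> set:
    """Return required keys that are absent or empty (None counts as empty)."""
    mode = configuration.get("ActionMode")
    if mode not in REQUIRED:
        raise ValueError(f"unsupported ActionMode: {mode!r}")
    return {key for key in REQUIRED[mode] if not configuration.get(key)}


stack_name = "dev-example-stack"  # hypothetical stack name
create_change_set = {
    "ActionMode": "CHANGE_SET_REPLACE",
    "StackName": stack_name,
    "ChangeSetName": f"{stack_name}-deploy",
    "TemplatePath": "build::packaged.yaml",
    "TemplateConfiguration": "build::configuration.json",
    "RoleArn": None,  # the null value Terraform leaves when only role_arn is set
}
print(sorted(missing_keys(create_change_set)))  # ['RoleArn']
```

The check treats None and empty strings alike, since either would leave CloudFormation without a service role.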

If you want to find out which configuration block parameters are required in defining CodePipeline stages and actions, look at this official documentation page for CodePipeline, and in this case the CloudFormation action reference in particular.

AWS CodePipeline Terraform module

To run your own AWS CodePipeline for your AWS SAM integration, I’ve made this Terraform module available here:

https://registry.terraform.io/modules/rpstreef/codepipeline-sam/aws/1.0.0

AWS CloudWatch

CloudWatch is used for monitoring, so I’ve created alarms for:

  • API Gateway endpoints:
    • Latency P95: 95th percentile latency, i.e. 95% of requests complete within this time; it represents the latency most customers experience.
    • Latency P99: 99th percentile latency, close to the worst-case latency that customers experience.
    • 400 Errors: HTTP 400 errors reported by the endpoint.
    • 500 Errors: HTTP 500 internal server errors reported by the endpoint.
  • Lambda functions:
    • Error rate: alarms on errors, with a default threshold of 1 percent during a 5 minute measurement period.
    • Throttle count: alarms on a throttle count of 1 within a 1 minute measurement period.
    • Iterator age: alarm for stream-based invocations such as Kinesis; alerts you when the iterator age (the age of the last record in a batch when Lambda processed it) exceeds 1 minute within a 5 minute measurement period. Check here for more details.
    • Dead-letter queue: alarm for dead-letter queue messages (for asynchronous Lambda invocations or SQS queues, for example); 1 message within 1 minute triggers the alarm.
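To make the latency percentiles and the 1 percent error-rate threshold more concrete, here is a small Python illustration of what those numbers mean. It uses the nearest-rank percentile method on made-up data; CloudWatch computes its percentile statistics differently (from aggregated datapoints), so treat this as a model of the idea only. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def error_rate_breached(errors: int, invocations: int,
                        threshold_percent: float = 1.0) -> bool:
    """True when errors / invocations * 100 exceeds threshold_percent."""
    if invocations == 0:
        return False
    # Compare without division to avoid floating point surprises.
    return errors * 100 > threshold_percent * invocations


latencies_ms = [float(ms) for ms in range(1, 101)]  # made-up: 1..100 ms
print(percentile(latencies_ms, 95))  # 95.0
print(percentile(latencies_ms, 99))  # 99.0
print(error_rate_breached(errors=2, invocations=150))  # True (about 1.3%)
```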

AWS CloudWatch Terraform module

If you want to create CloudWatch Alarms for API Gateway endpoints and/or AWS Lambda, you can use my Terraform module to quickly set them up:

https://registry.terraform.io/modules/rpstreef/cloudwatch-alarms/aws/1.0.0

AWS IAM

To set up the required policies and permissions, we do the following two things.

  1. Create Roles with Permissions policies that allow a service to use other services.

This creates a role based on a policy document like the one shown below. The role’s ARN is then passed to the AWS SAM template, so that the function can, for instance, publish a message to an SNS topic.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cognito-idp:SignUp",
        "cognito-idp:AdminInitiateAuth",
        "cognito-idp:ListUsers",
        "cognito-idp:AdminConfirmSignUp"
      ],
      "Resource": "${cognito_user_pool_arn}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:GetLayerVersion"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": "${sns_topic_arn}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "xray:GetSamplingRules",
        "xray:GetSamplingTargets",
        "xray:GetSamplingStatisticSummaries"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

We store this JSON as a template file and pass it to my IAM module as shown here:

module "iam" {
  source = "github.com/rpstreef/tf-iam?ref=v1.1"

  namespace         = var.namespace
  region            = var.region
  resource_tag_name = var.resource_tag_name

  assume_role_policy = file("${path.module}/policies/lambda-assume-role.json")
  template           = file("${path.module}/policies/lambda.json")
  role_name          = "${local.lambda_function_user_name}-role"
  policy_name        = "${local.lambda_function_user_name}-policy"

  role_vars = {
    cognito_user_pool_arn = var.cognito_user_pool_arn
    sns_topic_arn         = module.sns.sns_topic_arn_lambda
  }
}

The role_vars are filled into the JSON template automatically through the template_file data source (data "template_file") used inside that IAM module. See the official documentation on this functionality here. Newer Terraform versions also offer the built-in templatefile() function for the same purpose.
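As a rough Python analogue of that substitution step: string.Template uses the same ${name} placeholder syntax, so it can render a policy template with role_vars and confirm that the result is still valid JSON. This only illustrates the idea and is not what the Terraform module runs; the ARNs are made up. As in Terraform templates, a literal ${...} (for example an IAM policy variable) would need escaping as $${...}.

```python
import json
from string import Template

# Trimmed-down version of the Lambda policy template above.
POLICY_TEMPLATE = """{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sns:Publish"],
      "Resource": "${sns_topic_arn}"
    },
    {
      "Effect": "Allow",
      "Action": ["cognito-idp:SignUp", "cognito-idp:ListUsers"],
      "Resource": "${cognito_user_pool_arn}"
    }
  ]
}"""


def render_policy(template: str, role_vars: dict) -> dict:
    """Substitute ${name} placeholders, then parse to verify valid JSON.

    Template.substitute raises KeyError when a placeholder has no value,
    which catches a forgotten entry in role_vars early.
    """
    return json.loads(Template(template).substitute(role_vars))


role_vars = {  # hypothetical ARNs
    "cognito_user_pool_arn": "arn:aws:cognito-idp:eu-west-1:123456789012:userpool/eu-west-1_example",
    "sns_topic_arn": "arn:aws:sns:eu-west-1:123456789012:example-topic",
}
policy = render_policy(POLICY_TEMPLATE, role_vars)
print(policy["Statement"][0]["Resource"])
```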

  2. Create a permission for one service to allow execution of another service.

To allow your API Gateway to execute an integrated AWS Lambda function, you have to give it permission like so:

data "aws_caller_identity" "_" {}

resource "aws_lambda_permission" "_" {
  principal     = "apigateway.amazonaws.com"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_identity_arn

  # Allow calls from any stage, method, and resource path of this REST API
  source_arn = "arn:aws:execute-api:${var.region}:${data.aws_caller_identity._.account_id}:${var.api_gateway_rest_api_id}/*/*"
}

This lambda:InvokeFunction permission allows the principal apigateway.amazonaws.com to invoke function_name, as long as the request comes from an API matching source_arn. You can apply this similarly for the SNS service (sns.amazonaws.com) or any other service that can integrate with AWS Lambda.
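To see what the /*/* suffix in source_arn covers, the Python sketch below builds the same ARN and compares individual request ARNs against it with fnmatch. The request layout (api-id/stage/METHOD/resource-path) follows the execute-api ARN format, but fnmatch only approximates IAM wildcard matching (it also treats [ ] specially), and the account ID and API IDs are made up.

```python
from fnmatch import fnmatchcase


def source_arn(region: str, account_id: str, rest_api_id: str) -> str:
    """Mirror of the Terraform source_arn: any stage, method and path."""
    return f"arn:aws:execute-api:{region}:{account_id}:{rest_api_id}/*/*"


def request_arn(region: str, account_id: str, rest_api_id: str,
                stage: str, method: str, path: str) -> str:
    """ARN of a single API call: <api-id>/<stage>/<METHOD>/<resource-path>."""
    return (f"arn:aws:execute-api:{region}:{account_id}:"
            f"{rest_api_id}/{stage}/{method}/{path.lstrip('/')}")


allowed = source_arn("eu-west-1", "123456789012", "a1b2c3d4e5")
call = request_arn("eu-west-1", "123456789012", "a1b2c3d4e5",
                   "dev", "POST", "/identity/authenticate")
# fnmatch's "*" also matches "/", much like "*" in an IAM ARN pattern.
print(fnmatchcase(call, allowed))  # True
print(fnmatchcase(call.replace("a1b2c3d4e5", "zzzzzzzzzz"), allowed))  # False
```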

The actual integration of the AWS Lambda with the endpoint is defined in the OpenAPI document that is included in the AWS SAM repository. With these two together, you have a functioning integration.

AWS SAM

Alright, so we know what the Terraform parts do; how about AWS SAM?

Check my example repository here if you’d like to go in-depth.

Let’s see that diagram again.

[Diagram: overview-terraform-aws-sam]

When the AWS CodePipeline runs, it deploys the following, from left to right:

  • AWS API Gateway: Using the OpenAPI document (api.yaml in my example repo) to specify the integration, Cognito Security, and our endpoints.
  • AWS Lambda: Creates just the functions with their environment variables.
  • AWS CloudWatch deploy Alarms: to make sure we track the right version of the deployed Lambda function, the Alarms are created and updated in this template on every deployment.

Then there are a few artifacts in this repo that control the integration with Terraform and define the AWS SAM stack.

  • template.yaml: The most important file, the AWS SAM template that determines the AWS services that get deployed and how they’re configured. More details on the anatomy of the AWS SAM template here.
  • configuration.json: a CloudFormation template configuration file that sets the parameters declared at the top of template.yaml. That way each environment can have a different configuration with the same deployment of AWS services.
  • api.yaml: The OpenAPI document describing our API Gateway endpoints and the JSON Schema for the input and output models used for each of those endpoints. Details on the workings of this I have discussed in a previous article here.
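Because configuration.json carries values that Terraform creates (role ARNs, for example), you could generate it instead of syncing it by hand. The Python sketch below is a hypothetical helper, not part of my repositories: it reads the JSON printed by terraform output -json (each output is an object with a "value" key) and writes the {"Parameters": {...}} layout used by a CloudFormation template configuration file. The mapping from Terraform output names to SAM parameter names is an assumption; adjust it to your own outputs and template.

```python
import json

# Hypothetical mapping: Terraform output name -> AWS SAM template parameter.
OUTPUT_TO_PARAMETER = {
    "environment": "Environment",
    "app_name": "AppName",
    "identity_role_arn": "IdentityRoleARN",
}


def build_template_configuration(terraform_outputs: dict) -> dict:
    """Turn `terraform output -json` data into a template configuration dict.

    CloudFormation parameter values are passed as strings, so every value
    is converted with str().
    """
    parameters = {}
    for output_name, parameter_name in OUTPUT_TO_PARAMETER.items():
        if output_name not in terraform_outputs:
            raise KeyError(f"missing Terraform output: {output_name}")
        parameters[parameter_name] = str(terraform_outputs[output_name]["value"])
    return {"Parameters": parameters}


def write_configuration(outputs_path: str,
                        config_path: str = "configuration.json") -> None:
    """Read saved `terraform output -json` data and write configuration.json."""
    with open(outputs_path) as source:
        outputs = json.load(source)
    with open(config_path, "w") as target:
        json.dump(build_template_configuration(outputs), target, indent=2)
```

Usage would be along the lines of terraform output -json > outputs.json, followed by write_configuration("outputs.json") in the AWS SAM repository.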

AWS SAM Template

As you can tell from the official documents on the AWS SAM template anatomy, it resembles CloudFormation for the most part and it introduces a few special Resources to more quickly create serverless applications.

Rather than going over every bit of detail, I’ll focus on two features that make creating templates much less painful. After which I’ll discuss how to get local development up and running and show a bit of the CodeDeploy goodness that comes along with AWS SAM applications.

1. Globals

With Globals you have the ability to apply configuration across all individually defined resources at once. For example:

Globals:
  Function:
    Runtime: nodejs12.x
    Tags:
      Environment:
        Ref: Environment
      Name:
        Ref: AppName

Resources:
  IdentityFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName:
        Fn::Sub: ${Environment}-${AppName}-identity

  UserFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName:
        Fn::Sub: ${Environment}-${AppName}-user

This means that every function declared under Resources uses the nodejs12.x runtime and gets the same Tags. This saves a lot of space compared to configuring the same properties for each individual function.
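A simplified model of how Globals combine with a function’s own properties may help. As I understand the AWS SAM documentation, a value set on the resource replaces a primitive global value, maps are merged (resource entries win), and lists are concatenated with the global entries first. The Python below is only a sketch of those rules, not the SAM translator itself; it ignores which properties Globals supports and treats intrinsic functions such as {"Ref": ...} as ordinary maps.

```python
from copy import deepcopy


def merge_globals(global_value, resource_value):
    """Combine a Globals value with a resource value (simplified SAM rules)."""
    if isinstance(global_value, dict) and isinstance(resource_value, dict):
        merged = deepcopy(global_value)
        for key, value in resource_value.items():
            # Recurse into nested maps; new keys are simply added.
            merged[key] = (merge_globals(merged[key], value)
                           if key in merged else deepcopy(value))
        return merged
    if isinstance(global_value, list) and isinstance(resource_value, list):
        return deepcopy(global_value) + deepcopy(resource_value)
    # Primitives (or mismatched types): the resource's value wins.
    return deepcopy(resource_value)


globals_function = {
    "Runtime": "nodejs12.x",
    "Tags": {"Environment": {"Ref": "Environment"}, "Name": {"Ref": "AppName"}},
}
identity_function = {"FunctionName": {"Fn::Sub": "${Environment}-${AppName}-identity"}}

print(merge_globals(globals_function, identity_function)["Runtime"])  # nodejs12.x
```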

2. OpenAPI

With an OpenAPI document, some definitions in the AWS SAM template become unnecessary and can be omitted. Take this function definition, for example:

Resources:
  IdentityFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: identity/app.lambdaHandler
      Role:
        Ref: IdentityRoleARN
      Events:
        identityAuthenticate:
          Type: Api
          Properties:
            Path: /identity/authenticate
            Method: post
            RestApiId:
              Ref: Api
        identityRegister:
          Type: Api
          Properties:
            Path: /identity/register
            Method: post
            RestApiId:
              Ref: Api
        identityReset:
          Type: Api
          Properties:
            Path: /identity/reset
            Method: post
            RestApiId:
              Ref: Api
        identityVerify:
          Type: Api
          Properties:
            Path: /identity/verify
            Method: post
            RestApiId:
              Ref: Api

The Events property does two things:

  1. It allows API Gateway to execute this function.
  2. It configures the integration of these endpoints with this Lambda function.

We solve the first by setting the permissions in our Terraform configuration, as demonstrated in the AWS IAM section. We solve the second by setting the integration configuration with the AWS OpenAPI extension x-amazon-apigateway-integration, like so:

paths:
  /identity/authenticate:
    post:
      operationId: identityAuthenticate
      description: Authenticate user (either login, or continue session)
      x-amazon-apigateway-integration:
        uri:
          Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${IdentityFunction.Arn}/invocations
        passthroughBehavior: "when_no_match"
        # API Gateway always invokes Lambda with POST, whatever the endpoint method
        httpMethod: "POST"
        timeoutInMillis:
          Ref: APITimeout
        type: "aws_proxy"
  # omitted properties for brevity

To make this interpolation work, you’ll have to include this document in the Api resource like this:

Resources:
  Api:
    Type: AWS::Serverless::Api
    Properties:
      DefinitionBody:
        Fn::Transform:
          Name: AWS::Include
          Parameters:
            Location: api.yaml
      # omitted properties for brevity

This has an added benefit that the OpenAPI document is kept separate from your AWS SAM template, so you can use the OpenAPI document for instance to generate client and/or server side code without modification.

The idea is to keep the use of AWS SAM as minimal as possible and have everything else done by either the OpenAPI specification or my Terraform modules.
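Once the Events entries are gone, nothing in the SAM template tells you whether an endpoint was left without an integration. A small check over the parsed OpenAPI document can catch that. The Python sketch below assumes api.yaml has already been loaded into a dictionary (for example with a YAML library, not shown); the helper name and the sample data are hypothetical.

```python
from typing import List

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
INTEGRATION_KEY = "x-amazon-apigateway-integration"


def operations_without_integration(openapi: dict) -> List[str]:
    """Return 'METHOD /path' for every operation lacking an integration block."""
    missing = []
    for path, path_item in openapi.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # path-level keys such as "parameters" are not operations
            if INTEGRATION_KEY not in operation:
                missing.append(f"{method.upper()} {path}")
    return sorted(missing)


# Hypothetical, already-parsed excerpt of api.yaml.
api = {
    "paths": {
        "/identity/authenticate": {
            "post": {"operationId": "identityAuthenticate",
                     INTEGRATION_KEY: {"type": "aws_proxy"}},
        },
        "/identity/verify": {"post": {"operationId": "identityVerify"}},
    }
}
print(operations_without_integration(api))  # ['POST /identity/verify']
```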

Local testing

To run and test locally, a few things need to be in place:

  1. Docker installation: make sure Docker is installed (Docker Desktop if you’re on Windows).
  2. Installed dependencies: the NodeJS dependencies need to be installed with npm install before the related code can be executed.
  3. Code editor with debugging capabilities: to link the debugging port with your code editor view. I use Visual Studio Code, which has this support integrated, like most editors.
  4. Environment variables: the last part of emulating the online environment. Duplicate your variables in the env.json file, grouped by function name. Also note that when running in a local environment, the AWS_SAM_LOCAL environment variable is set to true.
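For reference, env.json maps each function’s logical ID (as named in template.yaml) to its environment variable values. The Python sketch below writes such a file and shows a check on AWS_SAM_LOCAL, for example to point at a local endpoint while testing. The variable names (USER_POOL_ID, SNS_TOPIC_ARN) and their values are hypothetical, and as far as I know SAM CLI only overrides variables that are also declared on the function in template.yaml.

```python
import json
import os

# Hypothetical per-function overrides, keyed by logical ID from template.yaml.
ENV_VARS = {
    "IdentityFunction": {
        "USER_POOL_ID": "eu-west-1_example",
        "SNS_TOPIC_ARN": "arn:aws:sns:eu-west-1:123456789012:example-topic",
    },
    "UserFunction": {"USER_POOL_ID": "eu-west-1_example"},
}


def write_env_file(path: str = "env.json") -> None:
    """Write the file that is passed to SAM CLI with -n env.json."""
    with open(path, "w") as handle:
        json.dump(ENV_VARS, handle, indent=2)


def is_sam_local() -> bool:
    """SAM CLI sets AWS_SAM_LOCAL=true inside locally invoked functions."""
    return os.environ.get("AWS_SAM_LOCAL") == "true"
```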

Now start the local development API with sam local start-api -d 5858 -n env.json (-d sets the debug port, -n the environment variables file), or via the npm script in my repo: npm run start-api.

The following will appear indicating the api server is ready:

Mounting UserFunction at http://127.0.0.1:3000/user [GET, POST]
Mounting IdentityFunction at http://127.0.0.1:3000/identity/reset [POST]
Mounting IdentityFunction at http://127.0.0.1:3000/identity/register [POST]
Mounting IdentityFunction at http://127.0.0.1:3000/identity/authenticate [POST]
Mounting IdentityFunction at http://127.0.0.1:3000/identity/verify [POST]

Debugging is enabled on port 5858; you’ll see this notice in the command line:

Debugger listening on ws://0.0.0.0:5858/275235c0-635f-4e27-b412-d4137d4e1c64

Now you can set breakpoints in your code editor and attach the debugger to port 5858. Execution should stop at that breakpoint, allowing you to inspect variables and the call stack live.


Creating events

Now that we can start a local API, the next step is to replicate the events you can receive from other AWS services such as DynamoDB, S3, SNS, etc.

To view the supported list of events, execute sam local generate-event. Then we can choose to emulate SNS like this:

sam local generate-event sns notification > events/sns-notification.json

To customize it, add the flags for the values you want in the event. To list those flags:

sam local generate-event sns notification --help

Alternatively, edit the generated JSON file by hand.
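If you prefer to build the event file yourself, the essential shape of an SNS notification event is a Records list whose entries have EventSource set to "aws:sns" and an Sns object carrying the Message, Subject and TopicArn. The Python sketch below writes a trimmed-down event (sam local generate-event produces a fuller one, with extra fields such as signatures) and shows how a handler would read it back. The topic ARN is a placeholder, and encoding the message as JSON is an assumption of this example; SNS messages can be any string.

```python
import json
import os

# Placeholder topic ARN; replace with your own.
EXAMPLE_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:example-topic"


def sns_event(message: dict, subject: str = "test",
              topic_arn: str = EXAMPLE_TOPIC_ARN) -> dict:
    """Build a trimmed-down SNS notification event (a subset of the real fields)."""
    return {
        "Records": [{
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "Sns": {
                "Type": "Notification",
                "TopicArn": topic_arn,
                "Subject": subject,
                "Message": json.dumps(message),  # SNS delivers the body as a string
                "MessageAttributes": {},
            },
        }]
    }


def messages_from(event: dict) -> list:
    """Decode each SNS record's Message, as a handler typically would."""
    return [json.loads(record["Sns"]["Message"])
            for record in event.get("Records", [])
            if record.get("EventSource") == "aws:sns"]


def write_event(event: dict, path: str = "events/sns-notification.json") -> None:
    """Save the event where sam local invoke -e can pick it up."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as handle:
        json.dump(event, handle, indent=2)
```

Calling write_event(sns_event({"userId": "42"})) writes ./events/sns-notification.json, the same path used in the sam local invoke command.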

Once we have these events, we can do a local test run by invoking the lambda directly with the event we want to test:

sam local invoke IdentityFunction -e ./events/sns-notification.json -d 5858 -n env.json

AWS CodeDeploy

The last piece of functionality that makes AWS SAM such a great tool is its CodeDeploy integration. Besides ECS and EC2, CodeDeploy also supports deployments to Lambda functions. With it, you get blue/green deployments out of the box, with very little setup.

Once the CloudFormation stack changes are applied, CloudFormation uses CodeDeploy to deploy the Lambda function parts.

Depending on the deployment strategy chosen in your AWS SAM template (under property DeploymentPreference), for instance:

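Resources:
  IdentityFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName:
        Fn::Sub: ${Environment}-${AppName}-identity
      Description: Provides all Cognito functions through 4 API Rest calls
      # DeploymentPreference requires an alias; "live" matches the alarm's Resource dimension below
      AutoPublishAlias: live
      DeploymentPreference:
        Type: LambdaCanary10Percent5Minutes
        # Roll back automatically when this alarm fires during the deployment
        Alarms:
          - Ref: IdentityCanaryErrorsAlarm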

CodeDeploy then shifts traffic automatically from the old version to the new version of your Lambda code. In this example, it sends 10 percent of traffic to the new version and 90 percent to the old version; after 5 minutes, 100 percent of traffic goes to the new version.

See this official article for all the available configurations.
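The traffic split over time is simple arithmetic. The Python sketch below models the two predefined families, canary (one step, then everything after the wait) and linear (equal steps at a fixed interval), by parsing names such as LambdaCanary10Percent5Minutes and LambdaLinear10PercentEvery1Minute. Parsing the name is an assumption based on that naming pattern; the helper is hypothetical and does not cover AllAtOnce or custom configurations.

```python
import re

_PATTERN = re.compile(
    r"^Lambda(?P<kind>Canary|Linear)(?P<percent>\d+)Percent"
    r"(?:Every)?(?P<minutes>\d+)Minutes?$")


def new_version_percent(config_name: str, minutes_elapsed: float) -> int:
    """Percentage of traffic on the new version after `minutes_elapsed`."""
    match = _PATTERN.match(config_name)
    if not match:
        raise ValueError(f"unsupported configuration: {config_name}")
    percent = int(match["percent"])
    minutes = int(match["minutes"])
    if match["kind"] == "Canary":
        # One canary step, then all traffic once the wait time has passed.
        return percent if minutes_elapsed < minutes else 100
    # Linear: one more equal step every `minutes` until 100%.
    steps_done = int(minutes_elapsed // minutes) + 1
    return min(100, percent * steps_done)


print(new_version_percent("LambdaCanary10Percent5Minutes", 2))  # 10
print(new_version_percent("LambdaLinear10PercentEvery1Minute", 3))  # 40
```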

To manage potential errors and rollbacks, this is where the CloudWatch deploy Alarms mentioned at the start of this AWS SAM section come in:

Resources:
  IdentityCanaryErrorsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: 
        Fn::Sub: ${Environment}-${AppName}-identity-canary-alarm
      AlarmDescription: Identity Lambda function canary errors
      ComparisonOperator: GreaterThanThreshold
      EvaluationPeriods: 2
      MetricName: Errors
      Namespace: AWS/Lambda
      Period: 60
      Statistic: Sum
      Threshold: 0
      Dimensions:
        - Name: Resource
          Value:
            Fn::Sub: ${IdentityFunction}:live
        - Name: FunctionName
          Value: 
            Ref: IdentityFunction
        - Name: ExecutedVersion
          Value:
            Fn::GetAtt:
            - IdentityFunction
            - Version
            - Version

Because the alarm’s ExecutedVersion dimension points to the newly published version, every deployment updates the CloudWatch alarm to monitor the latest version for errors. If the alarm breaches its Threshold during the EvaluationPeriods, and it is listed under Alarms in the function’s DeploymentPreference, CodeDeploy rolls back to the old version.
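To make that rule concrete: with Period 60, EvaluationPeriods 2, Threshold 0 and GreaterThanThreshold, and DatapointsToAlarm left at its default (equal to EvaluationPeriods), the alarm goes into ALARM when both of the two most recent one-minute error sums are above zero. The Python sketch below models just that rule and the resulting rollback decision. It ignores CloudWatch details such as missing-data handling, and the error counts are invented.

```python
from typing import Optional, Sequence


def alarm_fires(error_sums: Sequence[float], evaluation_periods: int = 2,
                threshold: float = 0.0,
                datapoints_to_alarm: Optional[int] = None) -> bool:
    """GreaterThanThreshold over the latest `evaluation_periods` datapoints.

    DatapointsToAlarm defaults to EvaluationPeriods, as in CloudWatch.
    """
    needed = datapoints_to_alarm or evaluation_periods
    window = list(error_sums)[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return False  # simplification: too little data means no alarm
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= needed


def deployment_outcome(error_sums_per_minute: Sequence[float]) -> str:
    """Roll back as soon as the alarm fires at any minute of the deployment."""
    sums = list(error_sums_per_minute)
    for end in range(1, len(sums) + 1):
        if alarm_fires(sums[:end]):
            return "ROLLBACK"
    return "SUCCEEDED"


print(deployment_outcome([0, 1, 0, 0, 0]))  # SUCCEEDED: only one bad minute
print(deployment_outcome([0, 3, 1, 0, 0]))  # ROLLBACK: two bad minutes in a row
```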

Known issues

Not everything is perfect with AWS SAM yet. For your convenience, I’ve listed some of the errors and issues I came across.

1. I’m not seeing my code changes when running my api with sam local start-api

The CLI tool states “You only need to restart SAM CLI if you update your AWS SAM template”, but this is not entirely true.

You’ll need to run sam build after every code change, because once a .aws-sam build folder exists, the local API runs the built copy of your code rather than your source files. Hopefully this will get fixed in later versions of the tool; there are related issues on GitHub.

The suggested workaround is to run nodemon --exec sam build alongside sam local start-api -d 5858 -n env.json. This way every code change triggers sam build, while the local API keeps running in parallel.
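If you would rather not pull in nodemon, the same idea can be sketched with the Python standard library: poll the source folders for modified files and run sam build when something changes. This is a hypothetical helper, not part of SAM CLI. The watched folders (src and dependencies, as in my example repo layout) are assumptions, the .aws-sam output folder is deliberately not watched so that a build does not trigger another build, and node_modules is skipped to keep polling cheap (rerun sam build by hand after npm install).

```python
import os
import subprocess
import time

WATCHED_DIRS = ("src", "dependencies")  # assumed layout; excludes .aws-sam


def snapshot(directories=WATCHED_DIRS) -> dict:
    """Map every file under the watched directories to its modification time."""
    mtimes = {}
    for directory in directories:
        for root, dirs, files in os.walk(directory):
            dirs[:] = [d for d in dirs if d != "node_modules"]  # skip installed packages
            for name in files:
                path = os.path.join(root, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def changed(before: dict, after: dict) -> bool:
    """True if a file was added, removed or modified between snapshots."""
    return before != after


def watch_and_build(interval: float = 1.0) -> None:
    """Run `sam build` whenever a watched file changes (Ctrl+C to stop)."""
    previous = snapshot()
    while True:
        time.sleep(interval)
        current = snapshot()
        if changed(previous, current):
            subprocess.run(["sam", "build"], check=False)
            previous = current
```

Run watch_and_build() in one terminal and sam local start-api -d 5858 -n env.json in another.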

2. ‘CreateFile’, ‘The system cannot find the file specified.’

Make sure your Docker environment is running before you run any local invocations.

3. Runtime.ImportModuleError

To run AWS SAM locally, make sure you run npm install in your dependencies folder. In my example, the dependencies are located in ./dependencies/nodejs.

4. No deployment configuration found for name: LambdaAllAtOnce

Somehow it does not find this configuration, even though it is listed. To solve it, use AllAtOnce as the DeploymentPreference Type, which also seems to work for Lambda as well as EC2.

5. AWS SAM creates a stage called “Stage” by default.

As suggested in this issue, the solution is to add the OpenAPI version to the Globals section of your AWS SAM template, like this:

Globals:
  Api:
    OpenApiVersion: 3.0.1

Conclusion

The inclusion of AWS SAM in our local development workflow has made things significantly more efficient and the ability to use Terraform for almost everything else keeps most of what we already did the same. The best of both worlds!

There are definitely some improvements to be made in this setup. My first thought is to use AWS Systems Manager Parameter Store for syncing parameters between Terraform and AWS SAM. For now, however, manual syncing works and only needs to be done at setup time.

If you have any other ideas or improvements, let me know in the comments below.

Thanks for reading, and leave a heart if you enjoyed or appreciated the article.

Till next time!