Creating continuous integration pipelines for static website generators using AWS

Hello Everyone!

Here at Elementryx, we believe in automating anything we might need to do twice. That means all code changes should automatically make their way to production. But that’s not sufficiently meta. We also need to automate the integration of the cloud resources necessary for our code pipelines.

This way, if we experience an outage and want to recover, or if we want to set up similar infrastructure for our customers, we simply execute our existing automation scripts.

In the spirit of being sufficiently meta, this is a blog post about our blog and how we used infrastructure-as-code techniques to automate both the creation of new blog posts and the creation of the infrastructure necessary to run and serve our blog. All without using any of our own servers.

We will also provide a link to our open-source CloudFormation template that you can use to set up your own static site generation pipelines.

The puzzle pieces

Jekyll

Jekyll is one of the more popular static site generators out there. In a Jekyll blog, each post is a separate file in the _posts/ folder, and the Jekyll engine combines all the posts into a cohesive site. Every new blog post is therefore a new git commit, and since we want our website to reflect all our new content, we need Jekyll to rebuild the site whenever we post.
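
For illustration, a minimal post is just a Markdown file whose name carries a date prefix, with a little YAML front matter at the top. The filename and title below are made up:

_posts/2017-10-05-automating-our-blog.md:

---
layout: post
title: "Automating our blog"
date: 2017-10-05
---

Committing a Markdown file like this is all it takes to publish a new post, once the pipeline described below is in place.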

CodePipeline

CodePipeline is AWS’s continuous integration platform. It enables customers to link together the steps necessary to see a change through from the code repository to production. For our purposes we need only understand Source steps and Build steps (a sketch of how the two fit together follows this list).

  • A Source step retrieves the source code for a website.

  • A Build step builds the code into a deployable artifact. In our case, the build step is responsible for building the source code into a static website using Jekyll.
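
To make this concrete, here is a rough sketch of how a two-stage pipeline is described in CloudFormation. This is a simplified illustration rather than our exact template; CodePipelineRole, ArtifactsBucket, and BuildProject are hypothetical resources assumed to be defined elsewhere in the stack:

  Pipeline:
    Type: "AWS::CodePipeline::Pipeline"
    Properties:
      RoleArn: !GetAtt CodePipelineRole.Arn    # hypothetical IAM role for the pipeline
      ArtifactStore:
        Type: S3
        Location: !Ref ArtifactsBucket         # hypothetical bucket holding pipeline artifacts
      Stages:
        - Name: Source
          Actions:
            - Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: CodeCommit
                Version: 1
              Configuration:
                RepositoryName: !Ref DomainName
                BranchName: master
              OutputArtifacts:
                - Name: SiteSource
              RunOrder: 1
        - Name: Build
          Actions:
            - Name: BuildAction
              ActionTypeId:
                Category: Build
                Owner: AWS
                Provider: CodeBuild
                Version: 1
              Configuration:
                ProjectName: !Ref BuildProject # hypothetical CodeBuild project (see below)
              InputArtifacts:
                - Name: SiteSource
              RunOrder: 1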

CodeBuild

AWS CodeBuild is a managed service responsible for executing CodeBuild Projects, a process that typically includes:

  • Spinning up a transient build environment inside of a container.

  • Executing a build script.

  • Outputting one or more build artifacts.

Amazon S3

Serving a static website on AWS involves putting the site’s files inside a publicly accessible S3 bucket and configuring that bucket accordingly. As we go on, we will show how to configure an S3 bucket to serve a static website to the public.

Amazon CloudFront

CloudFront is a Content Delivery Network (CDN) through which we cache our static website in edge locations, ensuring that users can load our website as quickly as possible wherever they are in the world.
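
In CloudFormation, a distribution in front of an S3 website endpoint can be sketched roughly as follows. This is a simplified illustration, not our exact template, and the resource and origin names are placeholders:

  Distribution:
    Type: "AWS::CloudFront::Distribution"
    Properties:
      DistributionConfig:
        Enabled: true
        Aliases:
          - !Ref DomainName
        DefaultRootObject: index.html
        Origins:
          - Id: SiteBucketOrigin
            # Use the S3 *website* endpoint so index documents work in subdirectories
            DomainName: !Sub "${DomainName}.s3-website-${AWS::Region}.amazonaws.com"
            CustomOriginConfig:
              OriginProtocolPolicy: http-only
        DefaultCacheBehavior:
          TargetOriginId: SiteBucketOrigin
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: false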

AWS CloudFormation

Given our objective of automating our infrastructure, it should come as no surprise that Infrastructure as Code (IaC) is central to what we do here at Elementryx. As such, our tech stack consists largely of Terraform and CloudFormation templates that describe the configurations of cloud-based resources. For this AWS-specific infrastructure script, we debugged and modified a CloudFormation template that did not quite work for us in its existing form.

Amazon Route 53

Our DNS service needs to route all web requests for our blog at https://blog.elementryx.com to edge locations in our CDN. We create the necessary alias records to forward requests appropriately.
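
For example, an alias record pointing the blog’s domain at a CloudFront distribution can be described roughly like this. Distribution is a hypothetical CloudFront resource, and Z2FDTNDATAQYW2 is the fixed hosted zone ID that AWS assigns to all CloudFront distributions:

  SiteDNSRecord:
    Type: "AWS::Route53::RecordSet"
    Properties:
      HostedZoneName: !Sub "${PreExistingHostedZoneDomain}."  # zone domain; trailing dot required
      Name: !Ref DomainName
      Type: A
      AliasTarget:
        DNSName: !GetAtt Distribution.DomainName  # hypothetical CloudFront resource
        HostedZoneId: Z2FDTNDATAQYW2              # fixed zone ID for CloudFront aliases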

What happens when a new blog post is created

Every time a new blog post is created, the following steps take place. (We will go into more detail regarding how we automate these steps and show some code snippets later on.)

  1. A commit is made to our git repository.

  2. CodePipeline’s Source stage picks up this change, takes a snapshot of the source code, and forwards it to the Build stage.

  3. CodeBuild picks up the change, spins up a Docker container, and executes the instructions in the buildspec.yml file found in the root directory of the source code to generate the static site from the Jekyll project. The static site is then synchronized with S3.

  4. CloudFront picks up the updated files from S3, ensuring that the updated blog posts are served quickly to customers wherever they are in the world.

Automating a continuous deployment pipeline in AWS for Jekyll on S3

Leveraging existing solutions

Because we would prefer not to build everything from scratch, we drew inspiration for our implementation from two primary sources:

The first of these sources is Eric Hammond of Alestic’s GitHub repository, entitled aws-git-backed-static-website. This stack takes a slightly different approach to describing a static website in code, and we ran into some problems with it:

  1. The approach uses Lambda functions rather than AWS’s CodeBuild service to generate the site and push it to production. We found that one of the Lambda functions provided in the template did not work. (See this GitHub issue, which seems to have since been resolved, for more details.)

  2. We also found that the permissions for S3 buckets created in the stack did not seem to allow the contents to be served publicly.

  3. CodeBuild seemed to fit the process of building these sites more naturally than Lambda functions. We will discuss why later on.

  4. We also wanted to allow users to connect their GitHub repositories with the Source step of our pipeline.

Given these factors, and the comprehensiveness of the overall design, we elected to fork Eric Hammond’s repository and modify his CloudFormation template to ensure everything worked the way we wanted.

Aside from Eric Hammond’s repository, Alex Bilbie has a guide on his own Jekyll blog in which he explains in detail how to set up CodePipeline and CodeBuild to update a Jekyll blog. This is the approach we ended up following.

The end product

After making our custom changes we ended up with our own CloudFormation template, similar in structure to Eric Hammond’s but with the following differences:

  1. We replaced the Lambda functions in the continuous integration pipeline with an AWS CodeBuild project that builds the site and copies the built code to S3.
  2. We added a bucket policy that automatically makes all objects in the website’s S3 bucket publicly readable.
  3. We extended the template to work with GitHub.

The remainder of this blog post describes the modifications we made and is fairly technical. Feel free to skip ahead to the tutorial on using our template in the final section of this article, to view our source code, or to reach out if you want Elementryx to take care of all the technical details for you.

Describing and configuring our S3 Bucket to host our static website

The final generated static site is placed into an S3 bucket, so the only contents of this bucket should be the public assets that will become available for the internet to see. In order to serve these assets to customers, the S3 bucket needs the appropriate configuration.

Consider the following snippet of CloudFormation configuration representing the S3 bucket, slightly adapted from Eric Hammond’s template:

  # Bucket for site content: example.com
  SiteBucket:
    Condition: NeedsNewSiteBucket
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Ref DomainName
      AccessControl: PublicRead
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: 404.html
      # logs.example.com/logs/s3/example.com/
      LoggingConfiguration:
        DestinationBucketName: !If [NeedsNewLogsBucket, !Ref LogsBucket, !Ref PreExistingLogsBucket]
        LogFilePrefix: !Sub "logs/s3/${DomainName}/"
    DeletionPolicy: Retain


Of particular interest is the line that reads AccessControl: PublicRead. This is known as a bucket ACL and should, in theory, grant the public permission to retrieve the items in the bucket. And yet, with this configuration, attempting to access our static website inside the S3 bucket returns:

403 Forbidden
Code: AccessDenied
Message: Access Denied
RequestId: E4EE50D320218A19
HostId: n8VYM29rMgJw7aMpyHEH7sIy0SDvR1VY1LTd2Cwq3djStB/3RjWwIfSH3JcGHHAANN6iNN+PiuY=
An Error Occurred While Attempting to Retrieve a Custom Error Document
Code: AccessDenied
Message: Access Denied

Amazon’s documentation offers some insight as to why this is:

You can grant public read permission to your objects by using either a bucket policy or an object ACL. To make an object publicly readable using an ACL, grant READ permission to the AllUsers group, as shown in the following grant element. Add this grant element to the object ACL. For information about managing ACLs, see Managing Access with ACLs.

We defined a bucket ACL, but we still need to grant read permission on the objects inside the bucket. A bucket policy does exactly that, so we added one to the CloudFormation template:

  # bucket policy to provide read access to assets in SiteBucket
  SiteBucketPolicy:
    Type: "AWS::S3::BucketPolicy"
    Properties: 
      Bucket: !If [NeedsNewSiteBucket, !Ref SiteBucket, !Ref PreExistingSiteBucket]
      PolicyDocument: 
        Statement: 
          -
            Action: 
              - "s3:GetObject"
            Effect: "Allow"
            Resource: 
              - !Join ["", 
                [
                  "arn:aws:s3:::",
                  !If [
                    NeedsNewSiteBucket,
                    !Ref SiteBucket,
                    !Ref PreExistingSiteBucket
                  ],
                  "/*"
                ],
              ]
            Principal: "*"

This change allowed our website to be successfully served.

Important Note: We strongly encourage you to use a policy this permissive only for public assets that you don’t mind everyone seeing, like your website. Do not place any sensitive material in your site bucket.

Replacing Lambda with CodeBuild

AWS Lambda is a “serverless” way of running a function: AWS executes your functions on its own servers on demand, and you pay per invocation. Eric Hammond’s template uses Lambda function executions to build the static site and synchronize it with S3.

One of the Lambda functions in question did not work for us (Eric seems to have since fixed the issue), and we did not want to write and maintain full Lambda functions for the sole purpose of building our packages. AWS CodeBuild, by contrast, lets us concisely describe how we want each static site to be built by placing a buildspec.yml file in the root of the project.

For a number of other benefits of using CodeBuild instead of Lambda in your continuous integration pipelines, take a look at Soenke Ruempler’s thorough blog post, “AWS CodeBuild: The missing link for deployment pipelines in AWS”.

Our Jekyll blog, for instance, has a buildspec.yml which looks like this:

version: 0.1
   
phases:
  install:
    commands:
      - gem install jekyll jekyll-watch jekyll-paginate jekyll-sitemap jekyll-gist jekyll-sass-converter
  build:
    commands:
      - echo "******** Building blog.elementryx.com ********"
      - jekyll build
      - echo "******** Updating blog.elementryx.com s3 bucket: ********"
      - echo "$TARGET_S3_BUCKET"
      - aws s3 sync _site/ $TARGET_S3_BUCKET

Of course, in order to execute the initial gem command in the install phase of the build, AWS needs to provision an environment with Ruby installed. When we configure our CodeBuild project, we therefore specify that we want to run our build in a predefined image. AWS has a ton of images to choose from. Better yet, if you can’t find a managed image that meets your requirements, you can use any custom Docker image instead.

In our case, we used an AWS image running Ruby: aws/codebuild/ruby:2.3.1

Because we didn’t want to lock you into using Jekyll, we let you pass the image name as a parameter into our CloudFormation template, with our Ruby environment as the default value:

  BuildEnvironmentImage:
    Description: >
      Optional image to use for CodeBuild, e.g. docker:dind.
      Defaults to a Ruby environment suitable for Jekyll builds.
    Type: String
    Default: "aws/codebuild/ruby:2.3.1"

You can use your own buildspec.yml per your own custom requirements. Our CloudFormation template injects your website’s S3 bucket name as an environment variable so you can access it in your build commands.
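
As a rough sketch (simplified from the real template), the relevant part of the CodeBuild project looks something like the following. CodeBuildRole and SiteBucket are hypothetical resources, and we assume the bucket name is injected with its s3:// prefix so it can be passed straight to aws s3 sync:

  BuildProject:
    Type: "AWS::CodeBuild::Project"
    Properties:
      Name: "static-site-build"                # illustrative project name
      ServiceRole: !GetAtt CodeBuildRole.Arn   # hypothetical IAM role for CodeBuild
      Source:
        Type: CODEPIPELINE                     # buildspec.yml is read from the source artifact
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: !Ref BuildEnvironmentImage
        EnvironmentVariables:
          - Name: TARGET_S3_BUCKET             # consumed by aws s3 sync in buildspec.yml
            Value: !Sub "s3://${SiteBucket}"   # assumes a SiteBucket resource in the stack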

In our repository we have included sample buildspec.yml files for Jekyll and Hugo projects, and for static websites which don’t need to be transformed.

As another note, the end of your buildspec.yml should use the aws s3 sync command to synchronize your built directory with your website’s S3 bucket. (The AWS CLI comes pre-installed in AWS’s managed build images; custom Docker images used for more complex builds should also include it.) If you instead try to upload an output artifact back to CodePipeline, the artifacts will be zipped together before being placed into the bucket, so index.html will not exist at the root of your S3 bucket and your website will not work!
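
To be explicit about the failure mode, an artifacts section like the following, which hands the built site back to CodePipeline as an output artifact, is the pattern to avoid in this setup:

# Anti-pattern here: CodePipeline zips these files into its artifact store
# instead of placing them, unzipped, at the root of your website bucket.
artifacts:
  files:
    - '**/*'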

Extending our template for use with GitHub

We’ve had mixed experiences with AWS CodeCommit, and we want to be able to open-source our code. As such, we wanted to allow users to connect their GitHub repositories to the pipeline created by this CloudFormation template.

Here’s a snippet from our CloudFormation template that illustrates how we implemented this Source Action:

            - Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: !If [IsCodeCommit, "AWS", "ThirdParty"]
                Provider: !Ref SourceType
                Version: 1
              Configuration:
                !If
                - IsCodeCommit
                - 
                  RepositoryName: !If [NeedsNewGitRepository, !Ref DomainName, !Ref PreExistingGitRepository]
                  BranchName: !Ref BranchName
                -
                  Owner: !Ref GithubRepoOwner
                  Repo: !Ref GithubRepoName
                  Branch: !Ref BranchName
                  OAuthToken: !Ref GithubOauthToken
              OutputArtifacts:
                - Name: SiteSource
              RunOrder: 1

As you can see, we simply swap out the Owner and Configuration fields depending on the IsCodeCommit condition, which tells us whether the user has selected CodeCommit or GitHub.
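
For reference, the IsCodeCommit condition itself is just an equality check against the SourceType parameter, along these lines:

  Conditions:
    IsCodeCommit: !Equals [!Ref SourceType, "CodeCommit"]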

Tutorial

I’m going to repurpose this section of the blog as the README.md of our repository.

Goal

At the end of this short tutorial, you will have launched a fork of the popular YAX-Coming-soon-Jekyll-Template to which we’ve added a buildspec.yml. This is a static site that will look like this; it will be built from your GitHub account using Jekyll, uploaded to S3, and served globally over CloudFront.

To get started you will need:

  1. An AWS account with some pretty comprehensive privileges. (This CloudFormation template even creates some IAM roles, so make sure you’re running it with sufficient privileges.)
  2. A GitHub account and credentials.
  3. A domain name. Ideally this domain name should be managed by Amazon’s Route 53, but it doesn’t need to be.

    Note: If your DNS is not managed by AWS, you will need to point a CNAME or ALIAS record at your CloudFront distribution.

Setting up your first static site generator pipeline

Step 1: Fork our website source repository on GitHub

Step 2: Follow the steps provided by GitHub to create an OAuth token, making sure to include the scopes repo and admin:repo_hook. Copy this OAuth token.

Step 3: Open this wizard to launch your stack. Click Next.

Step 4: Set the following fields:

DomainName = "your.blogdomain.com"
SourceType = "GitHub"
BranchName = "master"
GithubRepoOwner = "yourgithubname"
GithubRepoName = "codepipeline-powered-static-website"
GithubOauthToken = "yourPastedOauthTokenFromStep2"
NotificationEmail = "your@email-address.com"

If your domain is managed by AWS and you have an existing hosted zone, you should also enter your pre-existing hosted zone:

PreExistingHostedZoneDomain = "your.blogdomain.com"

Step 5: Click Next > Click Next > check the box that reads I acknowledge that AWS CloudFormation might create IAM resources. > Click Create

Step 6: While CloudFormation is creating your SSL certificate, you will receive emails from AWS requesting confirmation that you own the domain. Confirm that you want to allow AWS to create the certificate.

Step 7: Wait. After about an hour, the CloudFront distribution will be ready to serve your website.

Enjoy your new website pipeline!