Hosting applications in Nomad is easy: all it takes is writing a small HCL file, and your apps are running on your Nomad cluster with Infrastructure as Code. Hosting your HTTP load balancer on Nomad, acting as a reverse proxy to your HTTP services, is another story: if I want to serve traffic over HTTPS, how do I generate SSL certificates within Nomad?

To address this need, you could take the easy path: routing ingress traffic to your HTTP load balancer running in Nomad with Cloudflare Tunnels. Cloudflare would handle SSL termination for you, so that you wouldn’t have to deal with SSL certificates.

However, not all people want to rely on Cloudflare to handle this task. The alternative would be to handle SSL termination on your own NGINX server, and acquire the SSL certificates with Certbot (via Let’s Encrypt). In a traditional Linux system setup, whenever your SSL certificate would get renewed, Certbot would signal NGINX to reload itself. In a Nomad cluster, that’d play out differently.

I have come up with an elegant method to manage SSL certificates in a Nomad-native way with Certbot, leveraging Nomad templates, which I’ll explain in detail in this article. This method makes it possible to provide SSL certificates to any service running in Nomad, from an NGINX load balancer to a Postfix mail server. It also lets you reload the Nomad jobs that use your SSL certificates whenever they get renewed.


1. The big picture

On a traditional Linux system, you’d issue certificates using the certbot command, and then use a cron job to trigger renewal checks. In a Nomad environment, we will still use the same certbot command, although it will be packaged in a Docker image. Renewals will be handled by a periodic Nomad batch job (Nomad’s equivalent of a cron job), which will be scheduled to run at regular intervals (for example, every few days). Nomad will take care of firing the job whenever necessary.

Your SSL key and certificate PEMs will be stored in Nomad variables, which can then be used in whichever job needs SSL certificates, such as your NGINX load balancer job. Nomad variables come with the benefit of triggering a template re-render whenever a variable changes, meaning that Nomad will know when to restart your NGINX server so that it loads your new SSL certificate. Instead of restarting NGINX, we might rather want to reload it with a SIGHUP signal, which makes certificate updates a zero-downtime operation.

Finally, Nomad variables can be updated via Nomad’s HTTP API, which is authenticated with a private token that we will generate. We are looking to use Nomad’s HTTP API to submit the new SSL key and certificate whenever it changes, that is, whenever it gets renewed by our certbot job. We will be using Certbot renewal hooks to execute a script whenever our Nomad variables need to be updated with the new SSL key and certificates.


2. Defining the Certbot cron job

Let’s start by creating our certbot periodic job!

First, we need to create 2 persistent volumes:

  • One where Certbot will store its database (we will call it data-certs)
  • Another where Certbot will store the temporary domain verification files served over HTTP, in the .well-known/ directory (we will call it data-web).

Note that since my Nomad setup is 1 server + 1 client, I am using host volumes. If your Nomad setup is distributed across a cluster, you would rather create volumes using CSI, backed by network storage such as the block storage offered by your cloud provider. This aspect is not covered here; you may adapt the configurations provided in this article to match your Nomad topology.

If you need to create those two volumes on your Nomad host, edit your Nomad client agent nomad.hcl configuration and add, for example, the following:

client {
  host_volume "data-web" {
    path = "/srv/nomad/volumes/web"
  }

  host_volume "data-certs" {
    path = "/srv/nomad/volumes/certs"
  }
}

Before we continue, we will also need to create a policy and a token to be able to hit the Nomad API when our SSL key and certificate get renewed.

Create a policy file named certs-updater.policy.hcl:

namespace "default" {
  variables {
    path "certs/*" {
      capabilities = ["write"]
    }
  }
}

Now, import this policy:

nomad acl policy apply certs-updater ./certs-updater.policy.hcl

And create a Nomad token:

nomad acl token create -name "LetsEncrypt (job-certbot)" -policy certs-updater

Take note of the generated token, and insert it in a file named job-certbot.nv.hcl:

path = "nomad/jobs/job-certbot"

items {
  NOMAD_TOKEN = "YOUR_CERTS_UPDATER_TOKEN_HERE"
}

And import it to Nomad:

nomad var put -in hcl @job-certbot.nv.hcl

Good! Now we can define the actual certbot job.

Create a job file named job-certbot.hcl:

job "job-certbot" {
  type = "batch"

  periodic {
    crons     = ["0 12 */3 * *"]
    time_zone = "Europe/Paris"

    prohibit_overlap = true
  }

  reschedule {
    attempts = 0
  }

  group "job-certbot" {
    restart {
      attempts = 0
      mode     = "fail"
    }

    volume "certbot-data" {
      type   = "host"
      source = "data-certs"
    }

    volume "web-data" {
      type   = "host"
      source = "data-web"
    }

    task "job-certbot" {
      driver = "docker"

      config {
        image = "valeriansaliou/certbot-with-nomad:v2.11.0-v1.5.8"

        entrypoint = [
          "sh",
          "/usr/local/bin/renew.sh"
        ]

        extra_hosts = [
          "host.docker.internal:host-gateway"
        ]

        mount {
          type     = "bind"
          readonly = true

          source = "local/renew.sh"
          target = "/usr/local/bin/renew.sh"
        }

        mount {
          type = "bind"

          source = "local/renewal-hooks/"
          target = "/etc/letsencrypt/renewal-hooks/"
        }
      }

      template {
        data = <<EOH
#!/bin/sh

set -e

LIVE_PATH=/etc/letsencrypt/live
DEPLOY_HOOK_PATH=/etc/letsencrypt/renewal-hooks/deploy/update-nomad-var-certs

# Mark renewal hooks as executable
chmod +x "$DEPLOY_HOOK_PATH"

# Renew or issue certificate?
if [ ! -d "$LIVE_PATH/domain.tld" ]; then
  echo "Issuing new certificate..."

  certbot certonly \
    --non-interactive \
    --agree-tos \
    --email=contact@domain.tld \
    --preferred-challenges=http \
    --webroot -w /var/www/default/ \
    -d domain.tld \
    -d www.domain.tld

  # Manually trigger deploy hook, since it does not get triggered on
  #   certificate creation
  echo "Manually triggering deploy hook..."

  sh "$DEPLOY_HOOK_PATH"
else
  echo "Checking if should renew existing certificate..."

  certbot renew \
    --no-random-sleep-on-renew \
    --deploy-hook="$DEPLOY_HOOK_PATH"
fi

echo "✅ Done."
        EOH

        destination = "local/renew.sh"
      }

      template {
        data = <<EOH
#!/bin/sh

set -e

echo "Updating certificates in Nomad variables..."

LIVE_PATH=/etc/letsencrypt/live

# Configure Nomad CLI
export NOMAD_ADDR="http://host.docker.internal:4646"
export NOMAD_TOKEN="{{ with nomadVar "nomad/jobs/job-certbot" }}{{ .NOMAD_TOKEN }}{{ end }}"

# Read certificates from disk
DOMAIN_TLD_KEY=$(cat "$LIVE_PATH/domain.tld/privkey.pem")
DOMAIN_TLD_CERTIFICATE=$(cat "$LIVE_PATH/domain.tld/fullchain.pem")

# Commit new certificates in Nomad
nomad var put -force certs/domain-tld \
  CERT_KEY="$DOMAIN_TLD_KEY" \
  CERT_CERTIFICATE="$DOMAIN_TLD_CERTIFICATE"

echo "Updated."
        EOH

        destination = "local/renewal-hooks/deploy/update-nomad-var-certs"
      }

      volume_mount {
        volume      = "certbot-data"
        destination = "/etc/letsencrypt/"
      }

      volume_mount {
        volume      = "web-data"
        destination = "/var/www/"
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}

In the job file, make sure to replace and adapt the following values:

  • domain.tld and DOMAIN_TLD should be renamed to match your domain name (e.g. mine would be valeriansaliou.name and VALERIANSALIOU_NAME)
  • Add all your sub-domains in the certbot certonly command arguments, with the -d option (one per sub-domain)
  • Replace contact@domain.tld with your email address — it is important that you are able to receive email on this address since Let’s Encrypt might send you security-related notifications
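Because the deploy hook overwrites the variable with -force, an empty or truncated PEM file would be pushed to Nomad as-is. Here is a minimal sketch of a guard that could run right before nomad var put. The require_pem helper is hypothetical (it is not part of Certbot or Nomad), and it only checks that the file is non-empty and starts with a PEM header:

```shell
#!/bin/sh
# Hypothetical guard, not part of Certbot or Nomad: succeed only if the file
# exists, is non-empty and its first line looks like a PEM header.
require_pem() {
  pem_file=$1

  if [ ! -s "$pem_file" ]; then
    echo "Missing or empty PEM file: $pem_file" >&2
    return 1
  fi

  case "$(head -n 1 "$pem_file")" in
    "-----BEGIN "*) return 0 ;;
    *)
      echo "Not a PEM file: $pem_file" >&2
      return 1
      ;;
  esac
}

# Possible usage inside the deploy hook, before `nomad var put`:
#   require_pem "$LIVE_PATH/domain.tld/privkey.pem"
#   require_pem "$LIVE_PATH/domain.tld/fullchain.pem"
```

Since the hook runs with set -e, a failing guard would abort the script before anything gets written to the Nomad variable.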

Once done, import the job:

nomad job run job-certbot.hcl

The job has been imported, but we are not yet ready to issue our first certificate. We will need to make some changes to our NGINX configuration to be able to pass the Let’s Encrypt HTTP verification challenge. However, NGINX will not start if we do not yet have any SSL key and certificate defined in our Nomad variables. It’s a chicken-and-egg problem…

To fix this issue, we will need to populate those Nomad variables with a self-signed certificate to start with, so that NGINX can start, and certbot can issue our real certificate (Let’s Encrypt needs to send some HTTP requests to our NGINX during the issuance process). Our users will never see this self-signed certificate in practice, we solely need it to start NGINX before the first issuance.

Generate a new SSL key and its self-signed certificate:

openssl ecparam -out key.pem -name secp256r1 -genkey
openssl req -new -key key.pem -x509 -nodes -days 3650 -out cert.pem

Now, create a file named certs-domain-tld.nv.hcl:

path = "certs/domain-tld"

items {
  CERT_KEY = "-----BEGIN EC PRIVATE KEY-----\n[YOUR_PEM_HERE]\n-----END EC PRIVATE KEY-----"
  CERT_CERTIFICATE = "-----BEGIN CERTIFICATE-----\n[YOUR_PEM_HERE]\n-----END CERTIFICATE-----"
}

Make sure to replace domain-tld with your domain name, and replace each [YOUR_PEM_HERE] placeholder with the content of your key (in CERT_KEY) and of your self-signed certificate (in CERT_CERTIFICATE). Replace all new lines with \n.
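Replacing new lines by hand is error-prone, so here is a small sketch that prints a PEM file on a single line with every line break turned into a literal \n. The pem_to_hcl_string name is hypothetical, and the helper assumes the file only contains PEM text (no quotes or backslashes that would need escaping in HCL):

```shell
#!/bin/sh
# Hypothetical helper: print a PEM file as a single line, with each line break
# replaced by a literal "\n" so it can be pasted into an HCL quoted string.
# Assumes the file only holds PEM text (no quotes or backslashes to escape).
pem_to_hcl_string() {
  awk '{ printf "%s\\n", $0 }' "$1"
}

# Possible usage:
#   pem_to_hcl_string key.pem
#   pem_to_hcl_string cert.pem
```

The output ends with one extra \n after the PEM footer, which you can keep or trim when pasting.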

Finally, import it to Nomad with:

nomad var put -in hcl @certs-domain-tld.nv.hcl

Certbot is now ready!

⚠️ Please do not attempt to manually start the certbot job just yet, we need to make some adjustments to our nginx job so that we can pass the Let’s Encrypt certificate issuance process.


3. Configuring NGINX to use your SSL certificate

Let’s load our SSL key and certificate from our NGINX job. We will be doing this by rendering our nginx.conf configuration using a Nomad template in our existing nginx job:

job "nginx" {
  type = "service"

  group "nginx" {
    task "nginx" {
      # (...) The rest of your job config goes here

      template {
        data = <<EOH
http {
    # (...) Your existing nginx.conf configuration goes here
    
    ssl_certificate      certs/domain.tld.crt;
    ssl_certificate_key  certs/domain.tld.key;        
}
                EOH

        destination = "local/nginx/nginx.conf"

        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      template {
        data = <<EOH
{{ with nomadVar "certs/domain-tld" }}{{ .CERT_KEY }}{{ end }}
        EOH

        destination = "local/nginx/certs/domain.tld.key"

        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      template {
        data = <<EOH
{{ with nomadVar "certs/domain-tld" }}{{ .CERT_CERTIFICATE }}{{ end }}
        EOH

        destination = "local/nginx/certs/domain.tld.crt"

        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      # (...) The rest of your job config goes here
    }
  }
}

Make sure to replace domain.tld and domain-tld with your domain name, so that it fits the name of the Nomad variable that we defined earlier.

We will also need to allow our nginx job to read the variables under the certs/ path, by deploying a Nomad policy.

Create a policy file named certs-user.policy.hcl:

namespace "default" {
  variables {
    path "certs/*" {
      capabilities = ["read"]
    }
  }
}

And then bind this policy to your nginx job:

nomad acl policy apply -namespace default -job nginx certs-user-nginx ./certs-user.policy.hcl

Finally, we need to serve the ACME challenge files that the certbot job will create on our data-web volume, right from our nginx job.

Add the following to your nginx job configuration:

job "nginx" {
  type = "service"

  group "nginx" {
    volume "nginx-web-data" {
      type   = "host"
      source = "data-web"

      read_only = true
    }
    
    task "nginx" {
      # (...) The rest of your job config goes here

      template {
        data = <<EOH
location = /.well-known/acme-challenge/ {
    return 404;
}

location ^~ /.well-known/acme-challenge/ {
    default_type "text/plain";

    root /var/www/default;

    break;
}
        EOH

        destination = "local/nginx/acme_challenge.conf"
      }

      volume_mount {
        volume      = "nginx-web-data"
        destination = "/var/www/"

        read_only = true
      }

      # (...) The rest of your job config goes here
    }
  }
}

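Now, you can simply include the acme_challenge.conf file in each server block (i.e. each Virtual Host) that should receive an SSL certificate. Keep in mind that Let’s Encrypt always starts the HTTP challenge on port 80, so the challenge path must also be reachable there, either directly or through a redirect to HTTPS: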

server {
    listen [::]:443 ssl;
    server_name www.domain.tld;

    # (...) Your server block configuration goes here

    include acme_challenge.conf;
}

Don’t forget to submit your updated nginx job HCL definition to Nomad before we continue.
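Before running Certbot, it can help to check that NGINX really serves files from the shared volume. The sketch below drops a probe file where the webroot plugin would write its challenges and prints the URL to fetch. The write_acme_probe helper and the probe.txt file name are hypothetical; the paths in the usage comment are the ones used in this article:

```shell
#!/bin/sh
# Hypothetical helper: drop a probe file where certbot's webroot plugin would
# write its challenges, and print the URL NGINX should serve it from.
# Arguments: <webroot on the Nomad host> <domain>
write_acme_probe() {
  webroot=$1
  domain=$2
  challenge_dir="$webroot/.well-known/acme-challenge"

  mkdir -p "$challenge_dir"
  echo "probe-ok" > "$challenge_dir/probe.txt"

  echo "http://$domain/.well-known/acme-challenge/probe.txt"
}

# Possible usage on the Nomad host (the data-web volume lives in
# /srv/nomad/volumes/web, and certbot's webroot is /var/www/default/ inside
# the containers):
#   url=$(write_acme_probe /srv/nomad/volumes/web/default domain.tld)
#   curl -fsS "$url"
#   rm /srv/nomad/volumes/web/default/.well-known/acme-challenge/probe.txt
```

If curl returns the content of the probe file, the volume mount and the location block are wired correctly; a 404 usually points at the root path or at a missing include.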


4. Issuing your first SSL certificate

We’re now ready to issue our first SSL certificate. We will be triggering a manual run of the certbot job, which will then run the certbot certonly command on its first run (since its database is empty, it will issue the first certificate). Later runs of the job will use the certbot renew command (since the database will have pre-existing certificates by then).

Go to your Nomad Web UI using an administrator token, open your certbot job, and click on the “Force Launch” button.

Hold on for a few seconds and you should see an allocation for your certbot job appear. Click on the logs viewer, and monitor the progress of your Let’s Encrypt certificate issuance.

When the certbot job run is complete, navigate to your nginx job in Nomad’s Web UI, and confirm that a template re-rendering event gets triggered. Once the template re-renders, your NGINX server should get either restarted or reloaded, depending on the change_mode you opted for (the templates above send a SIGHUP, which reloads NGINX).

You may now open your Web browser and navigate to your website URL. You should see a valid SSL certificate now.


5. Testing auto-renewal of your SSL certificate

You may force launch your certbot job once again to test the renewal process. Certificates that are not yet due for renewal will not be renewed, so no attempt will be made at this point.

To make sure that renewal works for the first time, I recommend that you check the expiration date of the first SSL certificate that was issued to you, take note of it, and come back in the future a few days before its expiration to check it again. The certificate should by then have a new expiration date, at a minimum of 1 month in the future.
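To avoid reading the expiration date by eye, the check can be scripted. This is only a sketch: days_left and served_cert_enddate are hypothetical helper names, the openssl pipeline needs network access to your server, and the date conversion in the usage comment assumes GNU date:

```shell
#!/bin/sh
# Hypothetical helper: whole days between "now" and an expiry, both given as
# Unix timestamps (seconds).
days_left() {
  echo $(( ($2 - $1) / 86400 ))
}

# Hypothetical helper: print the expiry line (notAfter=...) of the certificate
# that a host currently serves on port 443.
served_cert_enddate() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Possible usage (date -d is a GNU extension, so this part assumes GNU date):
#   enddate=$(served_cert_enddate domain.tld | cut -d= -f2)
#   days_left "$(date +%s)" "$(date -d "$enddate" +%s)"
```

Running this once after the first issuance and again a few days before the noted expiration date should show the number of remaining days jumping back up after a successful renewal.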

If the expiration date is still the one from your first certificate, then something is wrong. If this happens, I recommend triggering a manual run of your certbot job and observing the logs. Do you see any error during the renewal?

If nothing goes wrong with the Certbot renewal, I would look at the nginx job to rule out a simple reloading issue. To confirm that, I’d manually stop and then start the nginx job, and check again which certificate is being served. If stopping and starting fixed your issue, then something is wrong with the template re-rendering process in your nginx job, and not with the certificate renewal process in the certbot job.

We are all done at this point! Enjoy your Nomad-native Certbot setup 😃


6. Additional notes

Accessing Nomad API from a job

You may have noticed that the certbot job uses the host.docker.internal host to access Nomad’s HTTP API. This is the cleanest way to expose Nomad’s HTTP API to jobs running in network-isolated Docker containers, which fits my single Nomad client/server setup.

Since your Nomad deployment might be different than mine, make sure to adjust the host.docker.internal hostname to anything that fits your setup, which could either be:

  • Multiple Nomad servers/clients: an IP address pointing to one of your Nomad servers on your VLAN
  • Single Nomad server/client + Docker host networking: an IP address such as ::1 pointing to your Nomad server over localhost
  • Single Nomad server/client + Docker bridge networking: the host.docker.internal hostname pointing to your Nomad server over the Docker bridge network (in that case, make sure that your Nomad HTTP server listens on 172.17.0.1)
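Whichever option you pick, the value ends up in NOMAD_ADDR in the deploy hook. One detail that is easy to miss: an IPv6 literal such as ::1 has to be wrapped in brackets inside a URL. A small sketch, where the nomad_addr helper is hypothetical and 4646 is assumed as the port since it is the one used in the hook above:

```shell
#!/bin/sh
# Hypothetical helper: build a NOMAD_ADDR value from a host name or IP address.
# IPv6 literals (anything containing ":") must be wrapped in brackets in a URL.
# The port defaults to 4646, the one used in the deploy hook of this article.
nomad_addr() {
  host=$1
  port=${2:-4646}

  case "$host" in
    *:*) echo "http://[$host]:$port" ;;
    *)   echo "http://$host:$port" ;;
  esac
}

# Possible usage in the deploy hook:
#   export NOMAD_ADDR="$(nomad_addr host.docker.internal)"
```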

Certbot image Dockerfile

If you’re looking to build the Certbot Docker image yourself (valeriansaliou/certbot-with-nomad on Docker Hub), here’s the source Dockerfile:

# ------------------ #
# CERTBOT WITH NOMAD #
# ------------------ #

ARG CERTBOT_VERSION=2.11.0

FROM certbot/certbot:v${CERTBOT_VERSION}

ARG NOMAD_VERSION=1.5.8-r4

RUN apk add --no-cache nomad=${NOMAD_VERSION}

ENTRYPOINT ["certbot"]

Keep the Certbot Docker image cached locally

After running the certbot job for the first time, Nomad’s garbage collector will run and evict your local valeriansaliou/certbot-with-nomad Docker image after a few minutes/hours.

Since we will be running the same certbot batch job over and over again every 3 days, it is pointless to let Nomad’s GC remove your local valeriansaliou/certbot-with-nomad.

Let’s save some network gigabytes every month and a lot of SSD writes by adjusting the Nomad Docker driver configuration in your nomad.hcl to increase the GC cleanup delay:

plugin "docker" {
  config {
    # Increase Docker images cleanup timer to 10 days, so that periodic
    #   jobs get a better chance of re-using a cached image upon their next run
    gc {
      image       = true
      image_delay = "240h"
    }
  }
}

I unfortunately did not find any other way to avoid the GC evicting our Certbot Docker image than changing the global GC settings. There’s apparently no way to adjust its settings for a particular Nomad job or Docker image.
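To pick a sensible image_delay, it helps to know how far apart two runs of the certbot job can be. The sketch below lists the days of the month matched by the */3 day-of-month field of the schedule, assuming Nomad follows the usual cron semantics (start at day 1, then step by 3). The cron_step_days helper is hypothetical:

```shell
#!/bin/sh
# Print the days of the month matched by a "*/STEP" day-of-month cron field,
# assuming the usual cron semantics (start at day 1, then every STEP days).
cron_step_days() {
  step=$1
  day=1
  days=""

  while [ "$day" -le 31 ]; do
    days="$days $day"
    day=$(($day + $step))
  done

  # Strip the leading space
  echo "${days# }"
}

cron_step_days 3
```

Under that assumption the job runs on days 1, 4, 7 and so on up to 31, and the count restarts on the 1st of each month, so two runs are never more than 3 days apart. A 240h delay therefore leaves a comfortable margin.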

🇦🇷 Written from El Chalten, Argentina.