Hosting applications in Nomad is easy: all it takes is writing a small HCL file, and your apps are running on your Nomad cluster with Infrastructure as Code. Hosting your HTTP load balancer on Nomad, acting as a reverse proxy to your HTTP services, is another story: if you want to run your server over HTTPS, how do you generate SSL certificates within Nomad?
To address this need, you could take the easy path: routing ingress traffic to your HTTP load balancer running in Nomad with Cloudflare Tunnels. Cloudflare would handle SSL termination for you, so that you wouldn’t have to deal with SSL certificates.
However, not everyone wants to rely on Cloudflare to handle this task. The alternative is to handle SSL termination on your own NGINX server, and acquire the SSL certificates with Certbot (via Let’s Encrypt). On a traditional Linux system, whenever your SSL certificate gets renewed, Certbot signals NGINX to reload itself. In a Nomad cluster, that plays out differently.
I have come up with an elegant method to manage SSL certificates in a Nomad-native way with Certbot, leveraging Nomad templates, which I’ll explain in detail in this article. This method makes it possible to provide SSL certificates to any service running in Nomad, from an NGINX load balancer to a Postfix mail server. It also lets you reload the Nomad jobs that use your SSL certificates whenever they get renewed.
1. The big picture
On a traditional Linux system, you’d issue certificates using the certbot command, and then use a cron job to trigger renewal checks. In a Nomad environment, we will still use the same certbot command, although it will be packaged in a Docker image. Renewals will be handled by a Nomad periodic batch job, which is Nomad’s equivalent of a cron job and will be scheduled to run periodically (for example, every week). Nomad takes care of firing the job whenever necessary.
Your SSL key and certificate PEMs will be stored in Nomad variables, which can then be used in whichever job uses SSL certificates, such as your NGINX load balancer job. Nomad variables come with the benefit of triggering template re-rendering whenever a variable changes, meaning that Nomad will know when to restart your NGINX server so that it loads your new SSL certificate. Instead of restarting NGINX, we might rather want to reload it with a SIGHUP signal, which makes certificate updates a zero-downtime operation.
Finally, Nomad variables can be updated via Nomad’s HTTP API, which is authenticated with a private token that we will generate. We will use Nomad’s HTTP API to submit the new SSL key and certificate whenever they change, that is, whenever they get renewed by our certbot job. We will be using Certbot renewal hooks to execute a script whenever our Nomad variables need to be updated with the new SSL key and certificate.
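For reference, Certbot hands the renewed certificate over to deploy hooks through two environment variables: RENEWED_LINEAGE (the live directory of the certificate) and RENEWED_DOMAINS (a space-separated list of domains). The sketch below only illustrates that mechanism. The describe_renewal function and the fallback path are hypothetical names of mine, and the hook used later in this article hard-codes its paths instead.

```shell
#!/bin/sh
# Sketch of what a Certbot deploy hook can learn from its environment.
# RENEWED_LINEAGE and RENEWED_DOMAINS are set by Certbot for deploy hooks;
# describe_renewal and the fallback path are illustrative assumptions.
describe_renewal() {
  lineage="${RENEWED_LINEAGE:-/etc/letsencrypt/live/domain.tld}"
  domains="${RENEWED_DOMAINS:-unknown}"
  echo "lineage=$lineage"
  echo "key=$lineage/privkey.pem"
  echo "certificate=$lineage/fullchain.pem"
  echo "domains=$domains"
}

describe_renewal
```

The fallback matters when the hook is run by hand, outside of Certbot, since both variables are unset in that case.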
2. Defining the Certbot cron job
Let’s start by creating our certbot periodic job!
First, we need to create 2 persistent volumes:
- One where Certbot will store its database (we will call it data-certs)
- Another where Certbot will store the temporary domain verification files served over HTTP, in the .well-known/ directory (we will call it data-web)
Note that since my Nomad setup is 1 server + 1 client, I am using host volumes. If your Nomad setup is distributed across a cluster, you would rather create volumes using CSI, backed by network storage such as the Block Storage offered by your cloud provider. This aspect is not covered in this article, but you may adapt the configurations provided here to match your Nomad topology.
If you need to create those two volumes on your Nomad host, edit your Nomad client agent nomad.hcl configuration and add, for example, the following:
client {
  host_volume "data-web" {
    path = "/srv/nomad/volumes/web"
  }
  host_volume "data-certs" {
    path = "/srv/nomad/volumes/certs"
  }
}
Before we continue, we also need to create a policy and a token to be able to hit the Nomad API when our SSL key and certificate get renewed.
Create a policy file named certs-updater.policy.hcl:
namespace "default" {
  variables {
    path "certs/*" {
      capabilities = ["write"]
    }
  }
}
Now, import this policy:
nomad acl policy apply certs-updater ./certs-updater.policy.hcl
And create a Nomad token:
nomad acl token create -name "LetsEncrypt (job-certbot)" -policy certs-updater
Take note of the generated token, and insert it in a file named job-certbot.nv.hcl:
path = "nomad/jobs/job-certbot"
items {
  NOMAD_TOKEN = "YOUR_CERTS_UPDATER_TOKEN_HERE"
}
And import it to Nomad:
nomad var put -in hcl @job-certbot.nv.hcl
Good! Now we can define the actual certbot job.
Create a job file named job-certbot.hcl:
job "job-certbot" {
  type = "batch"
  periodic {
    crons = ["0 12 */3 * *"]
    time_zone = "Europe/Paris"
    prohibit_overlap = true
  }
  reschedule {
    attempts = 0
  }
  group "job-certbot" {
    restart {
      attempts = 0
      mode = "fail"
    }
    volume "certbot-data" {
      type = "host"
      source = "data-certs"
    }
    volume "web-data" {
      type = "host"
      source = "data-web"
    }
    task "job-certbot" {
      driver = "docker"
      config {
        image = "valeriansaliou/certbot-with-nomad:v2.11.0-v1.5.8"
        entrypoint = [
          "sh",
          "/usr/local/bin/renew.sh"
        ]
        extra_hosts = [
          "host.docker.internal:host-gateway"
        ]
        mount {
          type = "bind"
          readonly = true
          source = "local/renew.sh"
          target = "/usr/local/bin/renew.sh"
        }
        mount {
          type = "bind"
          source = "local/renewal-hooks/"
          target = "/etc/letsencrypt/renewal-hooks/"
        }
      }
      template {
        data = <<EOH
#!/bin/sh
set -e
LIVE_PATH=/etc/letsencrypt/live
DEPLOY_HOOK_PATH=/etc/letsencrypt/renewal-hooks/deploy/update-nomad-var-certs
# Mark renewal hooks as executable
chmod +x "$DEPLOY_HOOK_PATH"
# Renew or issue certificate?
if [ ! -d "$LIVE_PATH/domain.tld" ]; then
  echo "Issuing new certificate..."
  certbot certonly \
    --non-interactive \
    --agree-tos \
    --email=contact@domain.tld \
    --preferred-challenges=http \
    --webroot -w /var/www/default/ \
    -d domain.tld \
    -d www.domain.tld
  # Manually trigger deploy hook, since it does not get triggered on
  # certificate creation
  echo "Manually triggering deploy hook..."
  sh "$DEPLOY_HOOK_PATH"
else
  echo "Checking if should renew existing certificate..."
  certbot renew \
    --no-random-sleep-on-renew \
    --deploy-hook="$DEPLOY_HOOK_PATH"
fi
echo "✅ Done."
EOH
        destination = "local/renew.sh"
      }
      template {
        data = <<EOH
#!/bin/sh
set -e
echo "Updating certificates in Nomad variables..."
LIVE_PATH=/etc/letsencrypt/live
# Configure Nomad CLI
export NOMAD_ADDR="http://host.docker.internal:4646"
export NOMAD_TOKEN="{{ with nomadVar "nomad/jobs/job-certbot" }}{{ .NOMAD_TOKEN }}{{ end }}"
# Read certificates from disk
DOMAIN_TLD_KEY=$(cat "$LIVE_PATH/domain.tld/privkey.pem")
DOMAIN_TLD_CERTIFICATE=$(cat "$LIVE_PATH/domain.tld/fullchain.pem")
# Commit new certificates in Nomad
nomad var put -force certs/domain-tld \
  CERT_KEY="$DOMAIN_TLD_KEY" \
  CERT_CERTIFICATE="$DOMAIN_TLD_CERTIFICATE"
echo "Updated."
EOH
        destination = "local/renewal-hooks/deploy/update-nomad-var-certs"
      }
      volume_mount {
        volume = "certbot-data"
        destination = "/etc/letsencrypt/"
      }
      volume_mount {
        volume = "web-data"
        destination = "/var/www/"
      }
      resources {
        cpu = 100
        memory = 64
      }
    }
  }
}
In the job file, make sure to replace and adapt the following values:
- domain.tld and DOMAIN_TLD should be renamed to match your domain name (e.g. mine would be valeriansaliou.name and VALERIANSALIOU_NAME)
- Add all your sub-domains in the certbot certonly command arguments, with the -d option (one per sub-domain)
- Replace contact@domain.tld with your email address. It is important that you are able to receive email on this address, since Let’s Encrypt might send you security-related notifications
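The spellings used in the job file follow a simple rule: for shell variable names, dots become underscores and letters are upper-cased, while for the Nomad variable path (certs/domain-tld), dots become dashes. As a quick illustration, with two helper names that are hypothetical:

```shell
#!/bin/sh
# Derive the spellings of a domain name used in the job file.
# Both function names are hypothetical helpers, for illustration only.

# domain.tld -> DOMAIN_TLD (prefix for shell variable names)
to_shell_prefix() {
  printf '%s\n' "$1" | tr '.-' '__' | tr '[:lower:]' '[:upper:]'
}

# domain.tld -> certs/domain-tld (Nomad variable path)
to_nomad_var_path() {
  printf 'certs/%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

to_shell_prefix "valeriansaliou.name"    # prints VALERIANSALIOU_NAME
to_nomad_var_path "valeriansaliou.name"  # prints certs/valeriansaliou-name
```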
Once done, import the job:
nomad job run job-certbot.hcl
The job has been imported, but we are not yet ready to issue our first certificate. We will need to make some changes to our NGINX configuration to be able to pass the Let’s Encrypt HTTP verification challenge. However, NGINX will not start if we do not yet have any SSL key and certificate defined in our Nomad variables. It’s a chicken and egg problem…
To fix this issue, we will populate those Nomad variables with a self-signed certificate to start with, so that NGINX can start, and certbot can issue our real certificate (Let’s Encrypt needs to send some HTTP requests to our NGINX during the issuance process). Our users will never see this self-signed certificate in practice: we solely need it to start NGINX before the first issuance.
Generate a new SSL key and its self-signed certificate:
openssl ecparam -out key.pem -name secp256r1 -genkey
openssl req -new -key key.pem -x509 -nodes -days 3650 -out cert.pem
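Note that openssl req asks for the certificate subject interactively. If you would rather script this step, here is a minimal sketch that passes the subject with -subj, and then checks that the key and certificate belong together. The temporary directory and the /CN=domain.tld subject are placeholders I picked for illustration, and prime256v1 is the name OpenSSL uses for the secp256r1 curve.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory (placeholder location for this sketch)
workdir=$(mktemp -d)
cd "$workdir"

# Generate an EC key; -noout leaves out the EC PARAMETERS block, so that
# key.pem only contains the private key
openssl ecparam -name prime256v1 -genkey -noout -out key.pem

# Self-sign a certificate without any prompt (the subject is a placeholder)
openssl req -new -x509 -key key.pem -days 3650 -subj "/CN=domain.tld" -out cert.pem

# The public key embedded in the certificate must match the private key
cert_pub=$(openssl x509 -in cert.pem -noout -pubkey)
key_pub=$(openssl pkey -in key.pem -pubout)
if [ "$cert_pub" = "$key_pub" ]; then
  echo "Key and certificate match"
else
  echo "Key and certificate do NOT match" >&2
  exit 1
fi

# Show the subject and expiration date of the self-signed certificate
openssl x509 -in cert.pem -noout -subject -enddate
```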
Now, create a file named certs-domain-tld.nv.hcl:
path = "certs/domain-tld"
items {
  CERT_KEY = "-----BEGIN EC PRIVATE KEY-----\n[YOUR_PEM_HERE]\n-----END EC PRIVATE KEY-----"
  CERT_CERTIFICATE = "-----BEGIN CERTIFICATE-----\n[YOUR_PEM_HERE]\n-----END CERTIFICATE-----"
}
Make sure to replace domain-tld with your domain name, and insert the content of your key and of your self-signed certificate in place of the two [YOUR_PEM_HERE] placeholders. Replace all new lines with \n.
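Escaping the new lines by hand is error-prone. As a small sketch, the hypothetical pem_to_hcl_string helper below prints a PEM file on a single line, with each line break replaced by a literal \n. The demo file content is a placeholder, not a real certificate.

```shell
#!/bin/sh
# pem_to_hcl_string FILE (hypothetical helper): print FILE on a single line,
# joining its lines with a literal backslash followed by the letter n
pem_to_hcl_string() {
  awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { print "" }' "$1"
}

# Demo on a placeholder file; use your own key.pem or cert.pem instead
demo_pem=$(mktemp)
printf '%s\n' "-----BEGIN CERTIFICATE-----" "AAAA" "-----END CERTIFICATE-----" > "$demo_pem"
pem_to_hcl_string "$demo_pem"
```

The output already includes the BEGIN and END lines, so it replaces the whole quoted value, not only the placeholder.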
Finally, import it to Nomad with:
nomad var put -in hcl @certs-domain-tld.nv.hcl
Certbot is now ready!
⚠️ Please do not attempt to manually start the certbot job just yet: we first need to make some adjustments to our nginx job, so that we can pass the Let’s Encrypt certificate issuance process.
3. Configuring NGINX to use your SSL certificate
Let’s load our SSL key and certificate from our NGINX job. We will be doing this by rendering our nginx.conf configuration using a Nomad template in our existing nginx job:
job "nginx" {
  type = "service"
  group "nginx" {
    task "nginx" {
      # (...) The rest of your job config goes here
      template {
        data = <<EOH
http {
  # (...) Your existing nginx.conf configuration goes here
  ssl_certificate certs/domain.tld.crt;
  ssl_certificate_key certs/domain.tld.key;
}
EOH
        destination = "local/nginx/nginx.conf"
        change_mode = "signal"
        change_signal = "SIGHUP"
      }
      template {
        data = <<EOH
{{ with nomadVar "certs/domain-tld" }}{{ .CERT_KEY }}{{ end }}
EOH
        destination = "local/nginx/certs/domain.tld.key"
        change_mode = "signal"
        change_signal = "SIGHUP"
      }
      template {
        data = <<EOH
{{ with nomadVar "certs/domain-tld" }}{{ .CERT_CERTIFICATE }}{{ end }}
EOH
        destination = "local/nginx/certs/domain.tld.crt"
        change_mode = "signal"
        change_signal = "SIGHUP"
      }
      # (...) The rest of your job config goes here
    }
  }
}
Make sure to replace domain.tld and domain-tld with your domain name, so that it fits the name of the Nomad variable that we defined earlier.
We will also need to allow our nginx job to read the variables under the certs/ path, by deploying a Nomad policy.
Create a policy file named certs-user.policy.hcl:
namespace "default" {
  variables {
    path "certs/*" {
      capabilities = ["read"]
    }
  }
}
And then bind this policy to your nginx job:
nomad acl policy apply -namespace default -job nginx certs-user-nginx ./certs-user.policy.hcl
Finally, we need to serve the ACME challenge files that the certbot job will create on our data-web volume, right from our nginx job.
Add the following to your nginx job configuration:
job "nginx" {
  type = "service"
  group "nginx" {
    volume "nginx-web-data" {
      type = "host"
      source = "data-web"
      read_only = true
    }
    task "nginx" {
      # (...) The rest of your job config goes here
      template {
        data = <<EOH
location = /.well-known/acme-challenge/ {
  return 404;
}
location ^~ /.well-known/acme-challenge/ {
  default_type "text/plain";
  root /var/www/default;
  break;
}
EOH
        destination = "local/nginx/acme_challenge.conf"
      }
      volume_mount {
        volume = "nginx-web-data"
        destination = "/var/www/"
        read_only = true
      }
      # (...) The rest of your job config goes here
    }
  }
}
Now, you can simply include the acme_challenge.conf file in each server block (i.e. each Virtual Host) that should receive an SSL certificate:
server {
  listen [::]:443 ssl;
  server_name www.domain.tld;
  # (...) Your server block configuration goes here
  include acme_challenge.conf;
}
Don’t forget to submit your updated nginx job HCL definition to Nomad before we continue.
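Before issuing anything, it can be worth checking that NGINX really serves files from the challenge directory. Here is a minimal sketch: the hypothetical write_acme_probe helper drops a test file in a webroot and prints the URL path to request. I run it against a temporary directory below. On the host from this article, the webroot would be /srv/nomad/volumes/web/default, since data-web is mounted on /var/www/ and Certbot writes to /var/www/default/.

```shell
#!/bin/sh
set -e

# write_acme_probe WEBROOT (hypothetical helper): create a small probe file
# in the ACME challenge directory of WEBROOT, and print its URL path
write_acme_probe() {
  webroot="$1"
  name="probe-$$"
  mkdir -p "$webroot/.well-known/acme-challenge"
  echo "ok" > "$webroot/.well-known/acme-challenge/$name"
  echo "/.well-known/acme-challenge/$name"
}

# Demonstration against a throwaway directory (stand-in for the real webroot)
demo_webroot=$(mktemp -d)
probe_path=$(write_acme_probe "$demo_webroot")
echo "Now request: http://www.domain.tld$probe_path"
```

Requesting the printed URL with curl -ikL should return the ok body (-k is needed as long as the self-signed certificate is in place, in case your server redirects to HTTPS). Remember to delete the probe file afterwards.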
4. Issuing your first SSL certificate
We’re now ready to issue our first SSL certificate. We will be triggering a manual run of the certbot job, which will then run the certbot certonly command on its first run (since its database is empty, it will issue the first certificate). Later runs of the job will use the certbot renew command (since the database will have pre-existing certificates by then).
Go to your Nomad Web UI using an administrator token, open the certbot job, and click on the “Force Launch” button.
Hold on for a few seconds and you should see an allocation for your certbot job appear. Click on the logs viewer, and monitor the progress of your Let’s Encrypt certificate issuance.
When the certbot job run is complete, navigate to your nginx job in Nomad’s Web UI, and confirm that a template re-rendering event gets triggered. Once the template re-renders, your NGINX server should get either restarted or reloaded, depending on the change_mode you opted for.
You may now open your Web browser and navigate to your website URL. You should see a valid SSL certificate now.
5. Testing auto-renewal of your SSL certificate
You may force launch your certbot job once again to test the renewal process. Certificates that are not yet due for renewal will not be renewed, so no attempt will be made at this point.
To make sure that renewal works for the first time, I recommend that you check the expiration date of the first SSL certificate that was issued to you, take note of it, and come back in the future a few days before its expiration to check it again. The certificate should by then have a new expiration date, at a minimum of 1 month in the future.
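To check the remaining validity without a browser, openssl can help. The sketch below wraps openssl x509 -checkend in a hypothetical cert_valid_for_days helper, and tries it on a throwaway 90-day self-signed certificate, which is only a stand-in for your real one.

```shell
#!/bin/sh
set -e

# cert_valid_for_days FILE DAYS (hypothetical helper): succeeds if the PEM
# certificate in FILE is still valid DAYS days from now
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Throwaway 90-day certificate, only here to make the sketch runnable
demo_dir=$(mktemp -d)
openssl ecparam -name prime256v1 -genkey -noout -out "$demo_dir/key.pem"
openssl req -new -x509 -key "$demo_dir/key.pem" -days 90 \
  -subj "/CN=domain.tld" -out "$demo_dir/cert.pem"

if cert_valid_for_days "$demo_dir/cert.pem" 30; then
  echo "Still valid in 30 days"
fi
if ! cert_valid_for_days "$demo_dir/cert.pem" 120; then
  echo "Expires within 120 days"
fi

# For a live site, the certificate can be fetched first (needs network access):
#   openssl s_client -connect domain.tld:443 -servername domain.tld </dev/null 2>/dev/null \
#     | openssl x509 -noout -enddate
```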
If the expiration date is still the one from your first certificate, then something is wrong. If this happens, I recommend triggering a manual run of your certbot job and observing the logs. Do you see any error during the renewal?
If nothing goes wrong with the Certbot renewal, I would look at the nginx job to check whether it is a simple reloading issue. To confirm that, I’d manually stop and then start the nginx job, and check again which certificate is being served. If the stop and start operation fixed your issue, then something is wrong with the template re-rendering process in your nginx job, and not with the certificate renewal process in the certbot job.
We are all done at this point! Enjoy your Nomad-native Certbot setup 😃
6. Additional notes
Accessing Nomad API from a job
You may have noticed that the certbot job uses the host.docker.internal host to access Nomad’s HTTP API. This is the cleanest way to expose Nomad’s HTTP API to jobs running in network-isolated Docker containers, which fits my single Nomad client/server setup.
Since your Nomad deployment might be different from mine, make sure to adjust the host.docker.internal hostname to anything that fits your setup, which could be one of:
- Multiple Nomad servers/clients: an IP address pointing to one of your Nomad servers on your VLAN
- Single Nomad server/client + Docker host networking: an IP address such as ::1 pointing to your Nomad server over localhost
- Single Nomad server/client + Docker bridge networking: the host.docker.internal hostname pointing to your Nomad server over the Docker bridge network (in that case, make sure that your Nomad HTTP server listens on 172.17.0.1)
Certbot image Dockerfile
If you’re looking to build the Certbot Docker image yourself (valeriansaliou/certbot-with-nomad on Docker Hub), here’s the source Dockerfile:
# ------------------ #
# CERTBOT WITH NOMAD #
# ------------------ #
ARG CERTBOT_VERSION=2.11.0
FROM certbot/certbot:v${CERTBOT_VERSION}
ARG NOMAD_VERSION=1.5.8-r4
RUN apk add --no-cache nomad=${NOMAD_VERSION}
ENTRYPOINT ["certbot"]
Keep the Certbot Docker image cached locally
After running the certbot job for the first time, Nomad’s garbage collector will run and evict your local valeriansaliou/certbot-with-nomad Docker image after a few minutes/hours.
Since we will be running the same certbot batch job over and over again every 3 days, it is pointless to let Nomad’s GC remove your local valeriansaliou/certbot-with-nomad image.
Let’s save some network gigabytes every month and a lot of SSD writes by adjusting the Nomad docker driver configuration in your nomad.hcl to increase the GC cleanup delay:
plugin "docker" {
  config {
    # Increase Docker images cleanup timer to 10 days, so that weekly periodic
    # jobs get a better chance of re-using a cached image upon their next run
    gc {
      image = true
      image_delay = "240h"
    }
  }
}
I unfortunately did not find any other way to prevent the GC from evicting our Certbot Docker image than changing the global GC settings. There’s apparently no way to adjust its settings for a particular Nomad job or Docker image.
🇦🇷 Written from El Chalten, Argentina.