lecertvend - ACME client with HashiCorp Vault storage backend

Posted Feb 4, 2023

ACME certificate issuers have drastically lowered the barriers to having browser-trusted SSL certificates on HTTPS sites and services. However, there are still challenges in managing ACME-issued certs on internal-only HTTPS servers. I built the lecertvend tool to separate certificate issuance/renewal from emplacement, and I used HashiCorp Vault to tie the two workflows together. The result is an easy, centralized process for acquiring and renewing certificates for both internal-only and external web servers.

Most of the tooling I have found for ACME certificate management either uses the local filesystem for certificate storage (Certbot, acme.sh) or is a proxy server that expects to perform SSL termination itself (Caddy, Traefik). Certificates issued with acme.sh and Certbot can be copied into Vault, but those tools still rely on the certificates existing on a filesystem in order to manage renewal.

Desired Features

The solution I wanted for ACME certificate management needed to have the following features:

  • Store all certs, keys, and configuration in HashiCorp Vault. This way the tool could be run anywhere with Vault access (cron, CI jobs, ad-hoc) and operate on a consistent database of certificates.
  • Solve DNS-01 challenges automatically using a Cloudflare token in Vault.
  • Allow multiple tenants to leverage the same Vault by protecting certificate / Cloudflare secrets from each other with differing secret prefixes and Vault policy.
  • Issue or renew a single certificate, or renew all certificates present at a specified path in Vault.
  • Use a consistent ACME account for all certificate operations for a particular tenant. Store the account IDs and keys in Vault so they do not have to be distributed separately.

I was not able to find an existing solution that used Vault as its primary storage target. CertMagic has a modular storage interface, but it only covers certs and keys, and I wanted all of the data needed for certificate issuance and challenge solving to live in Vault. Parts of CertMagic did serve as helpful examples, so I am glad I dug into that option a little.

Ultimately I spent a couple days hammering out a custom tool, which is lecertvend.

GitHub project here: https://github.com/arcandspark/lecertvend

The lecertvend tool is a CLI program that meets the feature list above. It gets used in two main places:

Application Deploy Pipelines

When an application build/deploy pipeline runs, the Project’s CI job has a Vault token whose policy is assigned based on the GitLab Group that the Project belongs to. The job acquires this token by presenting a JWT that indicates the group ID, and Vault returns a token with permissions based on that group’s Vault policy. That policy allows access to a secrets path where that GitLab group’s ACME-issued certs are stored, along with a Cloudflare token that allows DNS updates to the domains appropriate for that group.

In short, when a CI job runs, it will have access to issue and renew certs for a particular GitLab group, but not others. This is how multi-tenancy support is achieved.
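The token exchange described above can be sketched as a single HTTP request. This is a minimal illustration, not how lecertvend or GitLab implements it: it assumes Vault's JWT auth method is mounted at the default `jwt` path, and the Vault address, role name `teamname`, and JWT placeholder are all hypothetical. The request is only built, not sent; a successful response from Vault would carry the token in `auth.client_token`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// newJWTLoginRequest builds (but does not send) a login request for Vault's
// JWT auth method. Assumption: the auth method is mounted at "jwt".
func newJWTLoginRequest(vaultAddr, role, jwt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"role": role, "jwt": jwt})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(vaultAddr, "/") + "/v1/auth/jwt/login"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical values: the address, role, and JWT are placeholders.
	req, err := newJWTLoginRequest("https://vault.example.com/", "teamname", "<job JWT>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

Because the role is bound to the group's policy, the token returned for one group's job cannot read another group's secrets path.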

From a developer experience perspective, this means lecertvend can be used as a one-liner in a CI job to ensure that a certificate will be issued or already exists for their project:

variables:
  PROJECT_SLUG: myapp
.....

infra:
  stage: infra
  script:
    - |
      ... terraform apply, other infra scripting ...
      lecertvend -vend -mount secret -prefix lecertvend/teamname/teamdomain.com -secret ${PROJECT_SLUG} -names ${PROJECT_SLUG}
.....

And that certificate can be referenced in a Nomad Job that gets deployed by that pipeline:

job "myapp" {
  type        = "service"
.....
  group "service" {
    count = 1
.....
    task "myapp-service" {
      driver = "docker"
.....
      vault {
        policies = ["nomad-job-teamname"]
        env      = true
      }

      template {
        change_mode          = "restart"
        error_on_missing_key = true
        uid                  = 0
        gid                  = 0
        perms                = "600"
        destination          = "secrets/cert.pem"
        data                 = <<-EOF
          {{with secret "secret/data/lecertvend/teamname/teamdomain.com/${var.project_slug}"}}{{.Data.data.cert}}{{end}}
        EOF
      }

      template {
        change_mode          = "restart"
        error_on_missing_key = true
        uid                  = 0
        gid                  = 0
        perms                = "600"
        destination          = "secrets/key.pem"
        data                 = <<-EOF
          {{with secret "secret/data/lecertvend/teamname/teamdomain.com/${var.project_slug}"}}{{.Data.data.key}}{{end}}
        EOF
      }
    }
  }
}
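The secret path in the Nomad templates lines up with the `-mount`, `-prefix`, and `-secret` flags from the CI job: reads against a KV version 2 mount insert a `data` segment after the mount name. A small sketch of that mapping follows; the function name `kvV2DataPath` is my own for illustration and is not part of lecertvend.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// kvV2DataPath maps mount, prefix, and secret name to the path used when
// reading from a Vault KV version 2 mount: <mount>/data/<prefix>/<secret>.
// The function name is hypothetical, chosen for this example.
func kvV2DataPath(mount, prefix, secret string) string {
	// path.Join collapses duplicate and trailing slashes; drop any leading one.
	return strings.TrimPrefix(path.Join(mount, "data", prefix, secret), "/")
}

func main() {
	// Same values as the CI one-liner, with PROJECT_SLUG set to "myapp".
	fmt.Println(kvV2DataPath("secret", "lecertvend/teamname/teamdomain.com", "myapp"))
}
```

With the values from the pipeline above, this yields the same path the templates read from, which is why the Nomad job needs no configuration beyond the project slug.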

Certificate Renewal Pipeline

Each GitLab Group also has a certificate renewal project, with a simple pipeline to call lecertvend daily to renew any certificates that are nearing expiration:

stages:
  - renew

renew:
  stage: renew
  script:
    - lecertvend -renew -mindays 28 -mount secret -prefix lecertvend/teamname

The output of that job looks like this:

$ lecertvend -renew -mindays 28 -mount secret -prefix lecertvend/teamname
Renewing certs in prefix lecertvend/teamname if less than 28 days validity remain.
lecertvend/teamname does not end in zone, looking for zones within...
ignoring secret lecertvend in non-zone prefix lecertvend/teamname
Renewing certs in prefix lecertvend/teamname/teamdomain.com if less than 28 days validity remain.
cert in secret lecertvend/teamname/teamdomain.com/pots has 42 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/desk has 69 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/door has 63 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/comb has 60 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/table has 63 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/chair has 76 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/music has 76 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/keys has 45 days of validity remaining, taking no action.
cert in secret lecertvend/teamname/teamdomain.com/www has 76 days of validity remaining, taking no action.
renewals started, waiting for completion...
renewals complete.
Cleaning up project directory and file based variables 00:00
Job succeeded
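The "days of validity remaining" decision in the output above can be approximated with the Go standard library. This is a sketch under my own assumptions, not lecertvend's actual code: the function names are hypothetical, and I assume days are counted as whole 24-hour periods, truncated, with renewal triggered when fewer than `-mindays` remain.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"time"
)

// certNotAfter returns the expiry time of the first certificate in a PEM bundle.
func certNotAfter(pemBytes []byte) (time.Time, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return time.Time{}, errors.New("no PEM certificate block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return time.Time{}, err
	}
	return cert.NotAfter, nil
}

// daysRemaining returns whole days of validity left, truncating partial days.
func daysRemaining(notAfter, now time.Time) int {
	return int(notAfter.Sub(now).Hours() / 24)
}

// needsRenewal reports whether fewer than minDays of validity remain
// (assumed to be the meaning of -mindays).
func needsRenewal(notAfter, now time.Time, minDays int) bool {
	return daysRemaining(notAfter, now) < minDays
}

// day builds a UTC midnight timestamp for the example dates below.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	now := day(2023, time.February, 4)
	expiries := []time.Time{day(2023, time.March, 18), day(2023, time.February, 20)}
	for _, notAfter := range expiries {
		fmt.Printf("%d days remaining, renew=%v\n",
			daysRemaining(notAfter, now), needsRenewal(notAfter, now, 28))
	}
	// Input that is not PEM is rejected rather than treated as expired.
	if _, err := certNotAfter([]byte("not a certificate")); err != nil {
		fmt.Println("parse error:", err)
	}
}
```

In a real run, the PEM bytes would come from the `cert` field of each secret under the prefix, and `now` would be `time.Now()`.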

Architecture Diagram

The following is an overall visualization of how lecertvend is used to issue and renew certificates in my environment: