cert-manager letsencrypt issuing invalid certs

Introduction

When cert-manager appears to issue an invalid certificate, the cause is rarely Let's Encrypt itself. It is usually a mismatch between the requested hostnames, the ACME challenge path, DNS, ingress routing, or the issuer configuration used during validation.

Start by identifying what "invalid" means

The word invalid can mean several different failure modes:

  • the certificate is self-signed instead of from Let's Encrypt
  • the certificate is for the wrong hostname
  • the certificate chain is incomplete
  • the certificate has already expired
  • issuance failed, so the ingress controller serves its default fallback certificate instead

Before changing manifests, inspect the actual certificate presented by the service:

```bash
# -ext requires OpenSSL 1.1.1 or later
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates -ext subjectAltName
```

That tells you whether the issued certificate itself is wrong or whether your ingress is serving a different certificate from the one cert-manager issued.
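If you check many hosts, a small helper can sort that output into the failure modes listed above. This is a sketch for a POSIX shell, and the function name `classify_served_cert` is hypothetical. The matching relies on two heuristics. First, Let's Encrypt staging issuer names contain `(STAGING)`. Second, ingress-nginx's default certificate uses the subject `Kubernetes Ingress Controller Fake Certificate`. Other ingress controllers use different fallback certificates, so treat `other` as "inspect manually".

```bash
# Hypothetical helper: classify the output of
#   openssl x509 -noout -subject -issuer
# read from stdin. Prints one of: staging, fallback, self-signed,
# letsencrypt, other. The string matches are heuristics, not a full check.
classify_served_cert() {
  subject="" issuer=""
  while IFS= read -r line; do
    case "$line" in
      subject=*) subject=${line#subject=} ;;
      issuer=*) issuer=${line#issuer=} ;;
    esac
  done

  # Let's Encrypt staging intermediates carry "(STAGING)" in their names.
  case "$issuer" in
    *"(STAGING)"*) echo staging; return 0 ;;
  esac

  # ingress-nginx's default certificate is also self-signed,
  # so it is checked before the generic self-signed test below.
  case "$subject" in
    *"Kubernetes Ingress Controller Fake Certificate"*) echo fallback; return 0 ;;
  esac

  # Subject identical to issuer usually means a self-signed certificate.
  if [ -n "$subject" ] && [ "$subject" = "$issuer" ]; then
    echo self-signed
    return 0
  fi

  case "$issuer" in
    *"Let's Encrypt"*) echo letsencrypt ;;
    *) echo other ;;
  esac
}
```

The function ignores every line except `subject=` and `issuer=`, so the openssl pipeline above can be piped straight into it. It does not check expiry, hostnames, or chain completeness. The `-dates` and `-ext subjectAltName` output still needs a human eye.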

Check cert-manager resources in order

Issuance passes through a chain of resources. You (or cert-manager's ingress-shim) create the Certificate, and cert-manager creates the rest from it. Reading them in this order makes debugging faster:

  1. Certificate
  2. CertificateRequest
  3. Order
  4. Challenge
  5. the validation resources: the temporary solver pod, service, and ingress for HTTP-01, or the TXT record for DNS-01

Useful commands:

```bash
kubectl get certificate,certificaterequest,order,challenge -A
kubectl describe certificate example-com -n web
kubectl describe certificaterequest -n web
kubectl describe order -n web
kubectl describe challenge -n web
```

The Events section and status conditions in the describe output often state exactly why the ACME flow failed.
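You may want to read the chain in the order listed above without scrolling through several describe outputs. A loop over the four kinds can print one summary line per object. This is a sketch: `show_issuance_chain` and the `KUBECTL` override are hypothetical conveniences. The status fields it reads are the `Ready` condition on Certificates and CertificateRequests, and `state` and `reason` on Orders and Challenges.

```bash
# Hypothetical helper: summarise each issuance resource in a namespace,
# in the order cert-manager works through them.
# KUBECTL may be overridden (e.g. a wrapper adding --context); defaults to kubectl.
KUBECTL=${KUBECTL:-kubectl}

show_issuance_chain() {
  ns=$1
  # Certificates and CertificateRequests report progress through a Ready condition.
  for kind in certificate certificaterequest; do
    echo "== $kind =="
    "$KUBECTL" get "$kind" -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].reason}{"\t"}{.status.conditions[?(@.type=="Ready")].message}{"\n"}{end}'
  done
  # ACME Orders and Challenges report a state plus a free-text reason.
  for kind in order challenge; do
    echo "== $kind =="
    "$KUBECTL" get "$kind" -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\t"}{.status.reason}{"\n"}{end}'
  done
}
```

Run `show_issuance_chain web`. The first object with a failing Ready reason, or an `invalid` or `errored` state, is usually the stage to investigate with `kubectl describe`.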

A correct HTTP-01 issuer example

A common configuration is an HTTP-01 challenge solved through an ingress controller:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: admin@example.com  # placeholder: use a real contact address
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: web
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com
```

If `class` does not match the controller that actually receives public traffic, that controller ignores the temporary solver ingress cert-manager creates. Let's Encrypt then never reaches the challenge, even though the YAML looks correct.
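One quick check is whether the class named in the solver exists as an IngressClass at all. The helper below is a hypothetical sketch. It only compares names, so it cannot prove that the matching controller is the one receiving public traffic. Newer cert-manager releases also offer an `ingressClassName` field in the HTTP-01 solver; check which fields your version supports.

```bash
# Hypothetical helper: check that the solver's ingress class exists in the cluster.
# $1    = class from the ClusterIssuer's http01 solver
# stdin = output of: kubectl get ingressclass -o name
#         (lines such as "ingressclass.networking.k8s.io/nginx")
check_solver_class() {
  want=$1
  found=""
  while IFS= read -r line; do
    name=${line##*/}   # drop the "ingressclass.networking.k8s.io/" prefix
    [ -n "$name" ] || continue
    found="$found $name"
    if [ "$name" = "$want" ]; then
      echo "ok: IngressClass '$want' exists"
      return 0
    fi
  done
  echo "mismatch: solver uses '$want', cluster has:${found:- none}"
  return 1
}
```

For the issuer above, you might run `kubectl get ingressclass -o name | check_solver_class "$(kubectl get clusterissuer letsencrypt-prod -o jsonpath='{.spec.acme.solvers[*].http01.ingress.class}')"`.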

DNS and ingress are the most common root causes

For HTTP-01:

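  • the domain must resolve to the public address of the ingress controller handling the challenge
  • the /.well-known/acme-challenge/ path must be reachable from the public internet over plain HTTP on port 80
  • no authentication, IP allowlist, or custom middleware may block that path; Let's Encrypt follows redirects, but only to http or https URLs on ports 80 and 443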

For DNS-01:

  • the DNS provider credentials must be correct
  • TXT records must be created in the correct zone
  • propagation must complete before validation
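For DNS-01, you can check propagation directly. Ask public resolvers for the TXT record and compare the answer with the value cert-manager expects, which is stored in the Challenge's `spec.key` field. The function `txt_has_value` is a hypothetical sketch that reads `dig +short TXT` output on stdin.

```bash
# Hypothetical helper: succeed if the expected DNS-01 TXT value is present.
# $1    = expected value (the Challenge's spec.key)
# stdin = output of: dig +short TXT _acme-challenge.<domain> @<resolver>
# dig prints TXT strings wrapped in double quotes, so those are stripped first.
txt_has_value() {
  want=$1
  while IFS= read -r line; do
    value=$(printf '%s' "$line" | tr -d '"')
    if [ "$value" = "$want" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, set `key=$(kubectl get challenge -n web -o jsonpath='{.items[0].spec.key}')`, which assumes a single pending challenge. Then run `dig +short TXT _acme-challenge.example.com @1.1.1.1 | txt_has_value "$key" && echo visible`. Repeating this against a few resolvers shows whether propagation has finished.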

Quick checks:

```bash
dig +short example.com
curl -I http://example.com/.well-known/acme-challenge/test
dig +short TXT _acme-challenge.example.com   # DNS-01 only
```

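If public DNS points somewhere else, such as an old load balancer, cert-manager can complete its internal steps while Let's Encrypt validates against the wrong endpoint. For the curl check, a 404 from your own ingress controller is expected when no challenge is active. A timeout or a response from an unexpected server points to a routing problem.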
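To turn the `dig` check into a pass/fail answer, compare the resolved addresses with the load balancer in front of your ingress controller. The function `resolves_only_to` is a hypothetical sketch. It skips CNAME targets, which `dig +short` prints with a trailing dot.

```bash
# Hypothetical helper: check that a name resolves only to expected addresses.
# $1    = space-separated list of expected IPs (the ingress load balancer)
# stdin = output of: dig +short <hostname>
resolves_only_to() {
  expected=" $1 "
  seen=0
  while IFS= read -r ip; do
    case "$ip" in
      ""|*.) continue ;;   # blank line or CNAME target (ends with a dot)
    esac
    seen=1
    case "$expected" in
      *" $ip "*) ;;
      *) echo "unexpected address: $ip"; return 1 ;;
    esac
  done
  [ "$seen" -eq 1 ] || { echo "no addresses returned"; return 1; }
  echo "ok"
}
```

The expected address typically comes from the controller's Service, for example `kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[*].ip}'`. That namespace and Service name are assumptions based on a default ingress-nginx install. Some cloud load balancers publish a hostname instead of an IP.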

Watch for staging versus production confusion

Let's Encrypt has separate staging and production endpoints. Staging certificates are intentionally untrusted by browsers.

A staging issuer usually looks like:

```yaml
server: https://acme-staging-v02.api.letsencrypt.org/directory
```

If you accidentally use the staging endpoint in production, cert-manager reports the Certificate as Ready and the certificate is well formed. Browsers still reject it because the staging roots are not in their trust stores. This is one of the easiest mistakes to miss when manifests are copied between environments.
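Because the endpoint is a single field, you can audit every ClusterIssuer at once. The sketch below is hypothetical. It reads `name<TAB>server` lines produced by kubectl's jsonpath output and labels each issuer as `staging`, `production`, `other`, or `not-acme` (for issuers with no ACME server, such as CA issuers).

```bash
# Hypothetical helper: label issuers by the ACME endpoint they use.
# stdin = "name<TAB>server" lines from kubectl jsonpath output.
label_acme_endpoints() {
  tab=$(printf '\t')
  while IFS="$tab" read -r name server; do
    case "$server" in
      https://acme-staging-v02.api.letsencrypt.org/*) env=staging ;;
      https://acme-v02.api.letsencrypt.org/*) env=production ;;
      "") env=not-acme ;;
      *) env=other ;;
    esac
    printf '%s\t%s\n' "$name" "$env"
  done
}
```

Pipe `kubectl get clusterissuer -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.acme.server}{"\n"}{end}'` into `label_acme_endpoints`. Namespaced `Issuer` objects need the same check in each namespace.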

Secret and ingress mismatches

Sometimes cert-manager successfully issues the right certificate, but the ingress still serves the wrong secret. Confirm that:

  • the Certificate.spec.secretName matches the secret used by the ingress
  • the ingress is in the same namespace as the secret, since an ingress can only reference secrets in its own namespace
  • no older ingress object for the same host references a different TLS secret

Example ingress TLS section:

```yaml
tls:
  - hosts:
      - example.com
    secretName: example-com-tls
```

If that secret name differs from the one cert-manager updates, clients will continue seeing an old or unrelated certificate.
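The same comparison can be scripted per namespace. `find_secret_mismatches` is a hypothetical sketch. It takes the Certificate's `spec.secretName` and reads one `ingress<TAB>secrets` line per Ingress. It reports every ingress that does not reference the secret, including ones for unrelated hosts, so treat the output as a list to inspect rather than a verdict.

```bash
# Hypothetical helper: list ingresses whose TLS section lacks the expected secret.
# $1    = the Certificate's spec.secretName
# stdin = "ingress<TAB>secret1 secret2 ..." lines from kubectl jsonpath output
find_secret_mismatches() {
  want=$1
  tab=$(printf '\t')
  rc=0
  while IFS="$tab" read -r ing secrets; do
    [ -n "$ing" ] || continue
    case " $secrets " in
      *" $want "*) ;;   # this ingress references the cert-manager secret
      *) echo "$ing does not use $want (has: ${secrets:-no TLS})"; rc=1 ;;
    esac
  done
  return $rc
}
```

Feed it from `kubectl get ingress -n web -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tls[*].secretName}{"\n"}{end}'`. Get the secret name from `kubectl get certificate example-com -n web -o jsonpath='{.spec.secretName}'`.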

Common Pitfalls

The most common mistake is debugging cert-manager first when the real problem is DNS pointing to the wrong load balancer. Another frequent issue is using the staging ACME server and then treating the resulting untrusted certificate as a signing failure. Teams also forget that ingress class names must match the actual controller, especially after cluster upgrades or chart changes. Secret name mismatches are another source of confusion because cert-manager can succeed while ingress still serves an old secret. Finally, people often test only inside the cluster, while ACME validation happens from the public internet and may see very different routing.

Summary

  • Define "invalid" first by inspecting the certificate actually served to clients.
  • Debug Certificate, CertificateRequest, Order, and Challenge resources in sequence.
  • Verify DNS, ingress routing, and challenge reachability from the public internet.
  • Make sure you are using the production ACME endpoint when you want browser-trusted certificates.
  • Confirm that ingress references the same TLS secret cert-manager updates.
  • Treat DNS and secret wiring issues as first-class suspects, not afterthoughts.
