Webhook Alerting Patterns for SSL Expiration and TLS Failures

Webhooks are the simplest way to turn "a JSON check" into "someone actually gets notified before prod burns."

But naive webhooks spam teams, flap during partial outages, and don't carry enough context to debug quickly.

Here are webhook patterns that work well for SSL expiration and TLS failures — especially when you're checking from multiple regions.

Pattern 1: Threshold alerts (30/14/7 days)

For expiry, use tiers:

30 days: planning
14 days: action
7 days: urgent

Trigger on:

daysUntilExpiry <= threshold

Avoid spamming:

fire once per threshold crossing
don't re-fire daily unless you want reminders

Implementation:

store last-fired threshold per host
only emit when crossing from >14 to <=14, etc.
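
The crossing logic above can be sketched roughly like this. It is a minimal illustration, not a reference implementation: the names `tier_for`, `threshold_to_fire`, and `check`, plus the in-memory `last_fired` dict, are all hypothetical stand-ins for whatever storage you actually use.

```python
THRESHOLDS = (30, 14, 7)  # days; the tiers from the list above

def tier_for(days_until_expiry):
    """Return the tightest threshold currently breached, or None."""
    breached = [t for t in THRESHOLDS if days_until_expiry <= t]
    return min(breached) if breached else None

def threshold_to_fire(last_fired_threshold, days_until_expiry):
    """Return a threshold to alert on, or None if no new tier was crossed."""
    current = tier_for(days_until_expiry)
    if current is None:
        return None
    # Fire only when moving into a tighter tier than the last one alerted on.
    if last_fired_threshold is None or current < last_fired_threshold:
        return current
    return None

# Assumption: an in-memory dict stands in for persistent per-host state.
last_fired = {}

def check(host, days_until_expiry):
    fire = threshold_to_fire(last_fired.get(host), days_until_expiry)
    if fire is not None:
        last_fired[host] = fire
    elif tier_for(days_until_expiry) is None:
        # Certificate renewed: reset so the next expiry cycle alerts again.
        last_fired.pop(host, None)
    return fire
```

Note that a host first seen at 5 days fires only the 7-day alert in this sketch; it does not replay the 30- and 14-day tiers it skipped.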

Pattern 2: Quorum-gated alerts (reduce noise)

When you check from 3 regions, alert on the quorum result rather than on any single region's failure.

Good default:

Hard alert if 2 or more of 3 regions fail (quorum not met)
Warning if 1 of 3 fails (partial rollout, edge issue)

Why this matters:

One region can be flaky, or one edge can be mid-deploy.
Quorum prevents your alert channel from turning into noise.
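
As a rough sketch of that default, assuming each region reports a simple pass/fail and that quorum means at least 2 healthy regions (the `classify` function and the severity labels are made up for illustration):

```python
def classify(region_ok, required=2):
    """Map per-region results to (severity, quorum_met).

    region_ok: dict of region name -> True if the TLS check passed there.
    required:  healthy regions needed for quorum (2 of 3 by default).
    """
    healthy = sum(1 for ok in region_ok.values() if ok)
    quorum_met = healthy >= required
    if healthy == len(region_ok):
        return "ok", quorum_met
    if quorum_met:
        return "warning", quorum_met   # a minority of regions failing
    return "critical", quorum_met      # quorum broken: hard alert
```

For example, `classify({"us": True, "eu": True, "apac": False})` yields a warning with quorum still met, while two failing regions yield a critical result.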

Pattern 3: Dedupe + cooldown (stop flapping storms)

TLS endpoints can flap during:

CDN deployments
load balancer changes
intermittent network issues

Add a cooldown window:

don't send the same alert more than once per X minutes
keep emitting state changes (RECOVERED) immediately

Suggested values:

10–15 minute cooldown for repeated failures
no cooldown for the recovered event
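
One way to sketch the cooldown, assuming alerts are identified by a string key and that state lives in memory (the `CooldownGate` class is hypothetical; the injectable `clock` is there only to make it testable):

```python
import time

class CooldownGate:
    """Suppress repeats of the same alert inside a cooldown window."""

    def __init__(self, cooldown_seconds=600, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._last_sent = {}  # alert key -> time of last send

    def should_send(self, key, recovered=False):
        if recovered:
            # Recovery bypasses the cooldown and resets it for this key.
            self._last_sent.pop(key, None)
            return True
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window
        self._last_sent[key] = now
        return True
```

Resetting the window on recovery means a fresh failure right after a recovery is sent immediately rather than being swallowed by the old cooldown. Whether you want that depends on how much flapping you see.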

Pattern 4: State machine events (OPENED / UPDATED / RESOLVED)

Instead of sending "ALERT ALERT" repeatedly, model incidents:

Event types:

incident.opened
incident.updated (optional)
incident.resolved

Rules:

OPEN when crossing into failing state (quorum broken or expiry threshold crossed)
RESOLVE when returning to healthy (quorum met or threshold no longer breached)

Benefits:

clean timelines
supports Slack threads / Opsgenie incidents / PagerDuty dedupe keys
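
A minimal sketch of those rules as a transition function. The `next_event` name is hypothetical, and the `details_changed` trigger for `incident.updated` is an assumption, since the rules above only define when to open and resolve:

```python
def next_event(was_failing, is_failing, details_changed=False):
    """Return the event to emit for this check result, or None.

    was_failing:     state recorded after the previous check
    is_failing:      state from the current check
    details_changed: assumption: something worth an update changed
                     (e.g. a different set of failing regions)
    """
    if is_failing and not was_failing:
        return "incident.opened"
    if was_failing and not is_failing:
        return "incident.resolved"
    if is_failing and details_changed:
        return "incident.updated"
    return None  # steady state: emit nothing
```

Steady-state checks, healthy or failing, emit nothing, which is what keeps the timeline clean.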

Pattern 5: Stable dedupe keys (so tools can group)

Always include:

dedupeKey (stable across retries)
host
checkType (tls)
reasonCode (TLS_EXPIRED, TLS_EXPIRY_WITHIN_14_DAYS, TLS_HANDSHAKE_FAILED, etc.)

Example dedupe key:

tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS

For region-specific warnings:

tls:example.com:TLS_HANDSHAKE_FAILED:apac
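
A small helper along these lines keeps the key format in one place. The `dedupe_key` function is illustrative, and lowercasing the host is an assumption made so that the same hostname always maps to the same key:

```python
def dedupe_key(host, reason_code, region=None, check_type="tls"):
    """Build a key that stays identical across retries of the same alert."""
    parts = [check_type, host.lower(), reason_code]
    if region:
        parts.append(region)  # only for region-specific warnings
    return ":".join(parts)
```

Keep anything that changes between retries (timestamps, run IDs, the current `daysUntilExpiry`) out of the key, or downstream tools will see every retry as a new alert.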

Pattern 6: Payload that is actually actionable

Minimum payload fields:

summary status + quorum
daysUntilExpiry + expiresAt
issuer + subject
per-region statuses (and error message if any)
"what changed" diff (if issuer/chain changed)

Example payload shape (conceptual):

{
  "event": "incident.opened",
  "dedupeKey": "tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS",
  "host": "example.com",
  "status": "Degraded",
  "quorum": { "required": 2, "total": 3, "met": true },
  "reason": { "code": "TLS_EXPIRY_WITHIN_14_DAYS", "severity": "warning" },
  "tls": {
    "expiresAt": "2026-01-25T00:00:00Z",
    "daysUntilExpiry": 12,
    "issuer": "R3",
    "protocol": "TLS1.3"
  },
  "regions": [
    { "region": "us", "status": "Healthy" },
    { "region": "eu", "status": "Healthy" },
    { "region": "apac", "status": "Healthy" }
  ],
  "links": { "detailsUrl": "https://netdiag.dev/runs/abc123" }
}

Keep it small; the goal is "triage in 10 seconds."
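
Here is a rough sketch of assembling a subset of that payload. The `build_payload` and `days_until` helpers are hypothetical, `now` is passed in explicitly so the result is reproducible, and quorum is assumed to mean at least `required` regions reporting "Healthy":

```python
import json
from datetime import datetime, timezone

def days_until(expires_at, now):
    """Whole days from `now` until an ISO-8601 UTC timestamp (rounded down)."""
    expires = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (expires - now).days

def build_payload(event, host, reason_code, severity, expires_at,
                  regions, now, required=2):
    healthy = sum(1 for r in regions if r["status"] == "Healthy")
    return {
        "event": event,
        "dedupeKey": f"tls:{host}:{reason_code}",
        "host": host,
        "quorum": {"required": required, "total": len(regions),
                   "met": healthy >= required},
        "reason": {"code": reason_code, "severity": severity},
        "tls": {"expiresAt": expires_at,
                "daysUntilExpiry": days_until(expires_at, now)},
        "regions": regions,
    }

payload = build_payload(
    "incident.opened", "example.com", "TLS_EXPIRY_WITHIN_14_DAYS", "warning",
    "2026-01-25T00:00:00Z",
    [{"region": "us", "status": "Healthy"},
     {"region": "eu", "status": "Healthy"},
     {"region": "apac", "status": "Healthy"}],
    now=datetime(2026, 1, 13, tzinfo=timezone.utc),
)
body = json.dumps(payload)  # the string you would POST to the webhook URL
```

Deriving `daysUntilExpiry` and `quorum.met` from the raw data, instead of passing them in separately, avoids payloads that contradict themselves.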

Pattern 7: Escalation ladder (don't page people for 30-day warnings)

Suggested routing:

30-day expiry → email / low-noise channel
14-day expiry → #ops channel + ticket
7-day expiry → paging (if you're serious)
handshake failure (quorum broken) → paging

This prevents alert fatigue.
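
The ladder can be expressed as a small routing table. This is only a sketch: the destination names (`email`, `ops-channel`, `ticket`, `pager`) are placeholders for whatever integrations you have, and `route` is a hypothetical function name.

```python
# Tightest tier first so the first match wins.
EXPIRY_ROUTES = [
    (7, ["pager"]),
    (14, ["ops-channel", "ticket"]),
    (30, ["email"]),
]

def route(days_until_expiry=None, quorum_broken=False):
    """Return the destinations for an alert; an empty list means stay quiet."""
    if quorum_broken:
        return ["pager"]  # handshake failure with quorum broken
    if days_until_expiry is not None:
        for threshold, destinations in EXPIRY_ROUTES:
            if days_until_expiry <= threshold:
                return list(destinations)
    return []
```

Under these assumptions, `route(25)` goes to email, `route(12)` to the ops channel plus a ticket, and `route(5)` pages.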

Pattern 8: Recovery message (teams love closure)

Always send a RESOLVED message:

"TLS check recovered (quorum met)."
"Renewal deployed; expiry now 89 days."

It builds trust in alerts and helps postmortems.

Pattern 9: Include evidence only when it helps

Evidence is useful for:

region mismatch
chain change
redirect chain to a different hostname

Evidence is noise for:

steady-state healthy checks

So: collect evidence in auto mode (only when something is off).
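
A sketch of that decision, under the assumption that each region reports a certificate fingerprint and that you keep the previously seen issuer around for comparison (`should_collect_evidence` and its parameters are all hypothetical):

```python
from urllib.parse import urlsplit

def should_collect_evidence(host, region_fingerprints,
                            previous_issuer, current_issuer, final_url):
    """Return True only when one of the 'evidence helps' cases applies."""
    # Region mismatch: regions are not all serving the same certificate.
    region_mismatch = len(set(region_fingerprints.values())) > 1
    # Chain change: issuer differs from the last one recorded.
    chain_changed = (previous_issuer is not None
                     and previous_issuer != current_issuer)
    # Redirect chain ended on a different hostname than the one checked.
    final_host = urlsplit(final_url).hostname or host
    redirected_elsewhere = final_host.lower() != host.lower()
    return region_mismatch or chain_changed or redirected_elsewhere
```

Steady-state healthy checks fall through all three conditions and attach nothing.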

Common webhook mistakes (avoid these)

Sending daily "still expiring" spam
No dedupe key (alerts never group)
No region breakdown (can't debug)
No "resolved" event (no closure)
Alerting on 1-region blips as critical

What else to monitor beyond expiry

If you're setting up TLS monitoring, expiry is just the starting point. For a comprehensive checklist of what else can break, see What to Monitor Besides SSL Certificate Expiry.

What to Monitor Besides SSL Certificate Expiry — issuer, chain, SANs, protocol, and regional mismatches
Certificate Chain Changes: Why Renewals Sometimes Break Clients — what to detect, why some clients fail, and how to alert safely