
Webhook Alerting Patterns for SSL Expiration and TLS Failures

Practical webhook patterns for SSL/TLS monitoring: dedupe, thresholds, quorum, escalation, and payload design.


Webhooks are the simplest way to turn "a JSON check" into "someone actually gets notified before prod burns."

But naive webhooks spam teams, flap during partial outages, and don't carry enough context to debug quickly.

Here are webhook patterns that work well for SSL expiration and TLS failures — especially when you're checking from multiple regions.


Pattern 1: Threshold alerts (30/14/7 days)

For expiry, use tiers:

  • 30 days: planning
  • 14 days: action
  • 7 days: urgent

Trigger on:

  • daysUntilExpiry <= threshold

Avoid spamming:

  • fire once per threshold crossing
  • don't re-fire daily unless you want reminders

Implementation:

  • store last-fired threshold per host
  • only emit when crossing from >14 to <=14, etc.
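
The crossing logic above can be sketched as follows. This is a minimal in-memory illustration, not a reference implementation: `ThresholdTracker` and `current_tier` are hypothetical names, and a real setup would presumably persist the last-fired threshold somewhere durable.

```python
from typing import Dict, Optional

THRESHOLDS = (30, 14, 7)  # days; the tiers from the list above


def current_tier(days_until_expiry: int) -> Optional[int]:
    """Return the tightest breached threshold, or None if none is breached."""
    breached = [t for t in THRESHOLDS if days_until_expiry <= t]
    return min(breached) if breached else None


class ThresholdTracker:
    """Remembers the last-fired threshold per host (in memory only)."""

    def __init__(self) -> None:
        self._last_fired: Dict[str, int] = {}

    def check(self, host: str, days_until_expiry: int) -> Optional[int]:
        """Return a threshold to alert on, or None if nothing new was crossed."""
        tier = current_tier(days_until_expiry)
        if tier is None:
            # Healthy again (for example after a renewal): forget old state
            # so the next expiry cycle starts from the 30-day tier.
            self._last_fired.pop(host, None)
            return None
        last = self._last_fired.get(host)
        if last is None or tier < last:
            self._last_fired[host] = tier
            return tier
        return None  # same tier as before: stay quiet


tracker = ThresholdTracker()
for days in (45, 30, 29, 14, 12, 7):
    print(days, tracker.check("example.com", days))
```

Only the checks at 30, 14, and 7 days return a threshold here; the checks at 29 and 12 days stay silent because no new tier was crossed.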

Pattern 2: Quorum-gated alerts (reduce noise)

When you check from 3 regions, gate alerts on quorum: how many regions agree decides the severity, not any single region's result.

Good default:

  • Hard alert if at least 2 of 3 regions fail (quorum not met)
  • Warning if 1 of 3 regions fails (partial rollout, edge issue)

Why this matters:

  • One region can be flaky, or one edge is mid-deploy.
  • Quorum prevents your alert channel from turning into noise.
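
A small sketch of that default, assuming each region reports a simple pass/fail result. The function name `classify` and the severity strings are illustrative, not part of any particular tool.

```python
from typing import Dict


def classify(region_ok: Dict[str, bool], required: int = 2) -> str:
    """Map per-region results to a severity using a quorum of healthy regions."""
    healthy = sum(1 for ok in region_ok.values() if ok)
    failed = len(region_ok) - healthy
    if healthy < required:
        return "critical"  # quorum not met: hard alert
    if failed > 0:
        return "warning"   # quorum met, but at least one region disagrees
    return "ok"


print(classify({"us": True, "eu": True, "apac": False}))   # warning
print(classify({"us": False, "eu": True, "apac": False}))  # critical
```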

Pattern 3: Dedupe + cooldown (stop flapping storms)

TLS endpoints can flap during:

  • CDN deployments
  • load balancer changes
  • intermittent network issues

Add a cooldown window:

  • don't send the same alert more than once per X minutes
  • still emit state changes (such as RESOLVED) immediately

Suggested values:

  • 10–15 minutes cooldown for repeating failures
  • no cooldown for recovery events
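
One way to sketch the cooldown, assuming alerts are identified by a dedupe key and events use the `incident.*` names from this article. `CooldownGate` is a hypothetical helper, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Suppress repeats of the same alert inside a cooldown window."""

    def __init__(self, cooldown_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_send(self, dedupe_key: str, event: str) -> bool:
        if event == "incident.resolved":
            # Recovery is a state change: always send, and clear the window
            # so a fresh failure afterwards is not suppressed.
            self._last_sent.pop(dedupe_key, None)
            return True
        now = self._clock()
        last = self._last_sent.get(dedupe_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window
        self._last_sent[dedupe_key] = now
        return True


gate = CooldownGate(cooldown_seconds=600)
key = "tls:example.com:TLS_HANDSHAKE_FAILED"
print(gate.should_send(key, "incident.opened"))    # True
print(gate.should_send(key, "incident.opened"))    # False (repeat within 10 minutes)
print(gate.should_send(key, "incident.resolved"))  # True
```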

Pattern 4: State machine events (OPENED / UPDATED / RESOLVED)

Instead of sending "ALERT ALERT" repeatedly, model incidents:

Event types:

  • incident.opened
  • incident.updated (optional)
  • incident.resolved

Rules:

  • OPEN when crossing into failing state (quorum broken or expiry threshold crossed)
  • RESOLVE when returning to healthy (quorum met or threshold no longer breached)

Benefits:

  • clean timelines
  • supports Slack threads / Opsgenie incidents / PagerDuty dedupe keys
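
The open/resolve rules can be sketched as a tiny state machine keyed by dedupe key. This is an in-memory illustration with a hypothetical `IncidentTracker` class; the optional `incident.updated` event is left out to keep it short.

```python
from typing import Optional, Set


class IncidentTracker:
    """Emit an event only on transitions between healthy and failing."""

    def __init__(self) -> None:
        self._open: Set[str] = set()

    def observe(self, dedupe_key: str, failing: bool) -> Optional[str]:
        is_open = dedupe_key in self._open
        if failing and not is_open:
            self._open.add(dedupe_key)
            return "incident.opened"
        if not failing and is_open:
            self._open.discard(dedupe_key)
            return "incident.resolved"
        return None  # no state change, nothing to send


tracker = IncidentTracker()
key = "tls:example.com:TLS_HANDSHAKE_FAILED"
for failing in (False, True, True, False):
    print(tracker.observe(key, failing))
```

Four checks produce two events: one `incident.opened` on the first failure and one `incident.resolved` on recovery.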

Pattern 5: Stable dedupe keys (so tools can group)

Always include:

  • dedupeKey (stable across retries)
  • host
  • checkType (tls)
  • reasonCode (TLS_EXPIRED, TLS_EXPIRY_WITHIN_14_DAYS, TLS_HANDSHAKE_FAILED, etc.)

Example dedupe key:

  • tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS

For region-specific warnings:

  • tls:example.com:TLS_HANDSHAKE_FAILED:apac
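
A sketch of a key builder that produces the two shapes above. The normalization step (lowercasing the host and dropping a trailing dot) is an assumption added here so that the same host always maps to the same key; `dedupe_key` is a hypothetical helper name.

```python
from typing import Optional


def dedupe_key(check_type: str, host: str, reason_code: str,
               region: Optional[str] = None) -> str:
    """Build a key that stays identical across retries of the same alert."""
    # Assumption: hostnames are case-insensitive, so normalize them to avoid
    # "Example.com" and "example.com." producing different keys.
    parts = [check_type, host.lower().rstrip("."), reason_code]
    if region:
        parts.append(region)  # only for region-specific warnings
    return ":".join(parts)


print(dedupe_key("tls", "example.com", "TLS_EXPIRY_WITHIN_14_DAYS"))
print(dedupe_key("tls", "example.com", "TLS_HANDSHAKE_FAILED", region="apac"))
```

Nothing time-dependent (timestamps, run IDs, day counts) goes into the key, which is what keeps it stable.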

Pattern 6: Payload that is actually actionable

Minimum payload fields:

  • summary status + quorum
  • daysUntilExpiry + expiresAt
  • issuer + subject
  • per-region statuses (and error message if any)
  • "what changed" diff (if issuer/chain changed)

Example payload shape (conceptual):

{
  "event": "incident.opened",
  "dedupeKey": "tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS",
  "host": "example.com",
  "status": "Degraded",
  "quorum": { "required": 2, "total": 3, "met": true },
  "reason": { "code": "TLS_EXPIRY_WITHIN_14_DAYS", "severity": "warning" },
  "tls": {
    "expiresAt": "2026-01-25T00:00:00Z",
    "daysUntilExpiry": 12,
    "issuer": "R3",
    "subject": "example.com",
    "protocol": "TLS1.3"
  },
  "regions": [
    { "region": "us", "status": "Healthy" },
    { "region": "eu", "status": "Healthy" },
    { "region": "apac", "status": "Healthy" }
  ],
  "links": { "detailsUrl": "https://netdiag.dev/runs/abc123" }
}

Keep it small; the goal is "triage in 10 seconds."
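
As a rough sketch, a payload of this shape could be assembled so that the derived fields (`daysUntilExpiry`, `quorum.met`) are computed from the raw data rather than typed by hand. `build_payload` and its parameters are hypothetical, only a subset of the fields is shown, and `expires_at` and `now` are assumed to be timezone-aware UTC datetimes.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_payload(event: str, host: str, reason_code: str, severity: str,
                  expires_at: datetime, issuer: str,
                  regions: List[Dict[str, str]], now: datetime,
                  required: int = 2) -> Dict[str, Any]:
    """Assemble a compact alert payload; derived fields are computed here."""
    healthy = sum(1 for r in regions if r["status"] == "Healthy")
    return {
        "event": event,
        "dedupeKey": f"tls:{host}:{reason_code}",
        "host": host,
        "quorum": {"required": required, "total": len(regions),
                   "met": healthy >= required},
        "reason": {"code": reason_code, "severity": severity},
        "tls": {
            "expiresAt": expires_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
            # Whole days remaining, rounded down (negative once expired).
            "daysUntilExpiry": (expires_at - now).days,
            "issuer": issuer,
        },
        "regions": regions,
    }


payload = build_payload(
    event="incident.opened",
    host="example.com",
    reason_code="TLS_EXPIRY_WITHIN_14_DAYS",
    severity="warning",
    expires_at=datetime(2026, 1, 25, tzinfo=timezone.utc),
    issuer="R3",
    regions=[{"region": r, "status": "Healthy"} for r in ("us", "eu", "apac")],
    now=datetime(2026, 1, 12, 8, 0, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```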


Pattern 7: Escalation ladder (don't page people for 30-day warnings)

Suggested routing:

  • 30-day expiry → email / low-noise channel
  • 14-day expiry → #ops channel + ticket
  • 7-day expiry → paging (if you're serious)
  • handshake failure (quorum broken) → paging

This prevents alert fatigue.
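
The ladder can be expressed as a small routing table. In this sketch, `TLS_EXPIRY_WITHIN_30_DAYS` and `TLS_EXPIRY_WITHIN_7_DAYS` are assumed reason codes modeled on the 14-day one, the destination names are placeholders, and sending a quorum-met handshake failure to the ops channel is an assumption rather than something prescribed above.

```python
from typing import Dict, List

# Hypothetical routing table: reason code -> destinations.
ROUTES: Dict[str, List[str]] = {
    "TLS_EXPIRY_WITHIN_30_DAYS": ["email"],
    "TLS_EXPIRY_WITHIN_14_DAYS": ["ops_channel", "ticket"],
    "TLS_EXPIRY_WITHIN_7_DAYS": ["page"],
}


def route(reason_code: str, quorum_met: bool) -> List[str]:
    """Pick destinations for an alert; only urgent conditions page someone."""
    if reason_code == "TLS_HANDSHAKE_FAILED":
        # Page only when quorum is broken; a single-region failure is a warning.
        return ["page"] if not quorum_met else ["ops_channel"]
    # Unknown reason codes fall back to the ops channel instead of paging.
    return list(ROUTES.get(reason_code, ["ops_channel"]))


print(route("TLS_EXPIRY_WITHIN_30_DAYS", quorum_met=True))  # ['email']
print(route("TLS_HANDSHAKE_FAILED", quorum_met=False))      # ['page']
```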


Pattern 8: Recovery message (teams love closure)

Always send a RESOLVED message:

  • "TLS check recovered (quorum met)."
  • "Renewal deployed; expiry now 89 days."

It builds trust in alerts and helps postmortems.


Pattern 9: Include evidence only when it helps

Evidence is useful for:

  • region mismatch
  • chain change
  • redirect chain to a different hostname

Evidence is noise for:

  • steady-state healthy checks

So: collect evidence in auto mode (only when something is off).


Common webhook mistakes (avoid these)

  • Sending daily "still expiring" spam
  • No dedupe key (alerts never group)
  • No region breakdown (can't debug)
  • No "resolved" event (no closure)
  • Alerting on 1-region blips as critical

What else to monitor beyond expiry

If you're setting up TLS monitoring, expiry is just the starting point. For a comprehensive checklist of what else can break, see "What to Monitor Besides SSL Certificate Expiry."

