Webhook Alerting Patterns for SSL Expiration and TLS Failures
Practical webhook patterns for SSL/TLS monitoring: dedupe, thresholds, quorum, escalation, and payload design.
Webhooks are the simplest way to turn "a JSON check" into "someone actually gets notified before prod burns."
But naive webhooks spam teams, flap during partial outages, and don't carry enough context to debug quickly.
Here are webhook patterns that work well for SSL expiration and TLS failures — especially when you're checking from multiple regions.
Pattern 1: Threshold alerts (30/14/7 days)
For expiry, use tiers:
- 30 days: planning
- 14 days: action
- 7 days: urgent
Trigger on:
daysUntilExpiry <= threshold
Avoid spamming:
- fire once per threshold crossing
- don't re-fire daily unless you want reminders
Implementation:
- store last-fired threshold per host
- only emit when crossing from >14 to <=14, etc.
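The crossing rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and the in-memory `last_fired` dict are assumptions (a real setup would persist that state), and the reset on renewal is one reasonable choice among several.

```python
from typing import Dict, Optional

THRESHOLDS = (30, 14, 7)  # days, matching the tiers above


def tightest_threshold(days_until_expiry: int) -> Optional[int]:
    """Return the smallest breached threshold, or None if none is breached."""
    breached = [t for t in THRESHOLDS if days_until_expiry <= t]
    return min(breached) if breached else None


def threshold_to_fire(host: str, days_until_expiry: int,
                      last_fired: Dict[str, int]) -> Optional[int]:
    """Return a threshold only when the host crosses into a tighter tier."""
    current = tightest_threshold(days_until_expiry)
    previous = last_fired.get(host)
    if current is None:
        # Healthy again (e.g. renewed): forget state so later crossings fire.
        last_fired.pop(host, None)
        return None
    if previous is not None and current >= previous:
        # Same tier already alerted, or the tier loosened: record, stay quiet.
        last_fired[host] = current
        return None
    last_fired[host] = current
    return current
```

Calling this once per check run emits at most one alert per tier; a daily reminder would be a separate, opt-in path.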
Pattern 2: Quorum-gated alerts (reduce noise)
When you check from 3 regions, gate alerts on the same quorum logic the checks themselves use.
Good default:
- Hard alert if 2/3 regions fail (quorum not met)
- Warning if 1/3 fails (partial rollout, edge issue)
Why this matters:
- One region can be flaky, or one edge is mid-deploy.
- Quorum prevents your alert channel from turning into noise.
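As a rough sketch of that default, the function below maps per-region results to a severity. The name `classify_quorum`, the boolean-per-region input, and the severity strings are illustrative assumptions, not part of any particular tool.

```python
from typing import Mapping


def classify_quorum(region_ok: Mapping[str, bool], required: int = 2) -> str:
    """Map per-region check results to 'ok', 'warning', or 'critical'."""
    healthy = sum(1 for ok in region_ok.values() if ok)
    failed = len(region_ok) - healthy
    if healthy < required:
        return "critical"  # quorum not met, e.g. 2/3 regions fail
    if failed > 0:
        return "warning"   # quorum met, but at least one region is failing
    return "ok"
```

With three regions and `required=2`, one failing region yields a warning and two failing regions yield a hard alert.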
Pattern 3: Dedupe + cooldown (stop flapping storms)
TLS endpoints can flap during:
- CDN deployments
- load balancer changes
- intermittent network issues
Add a cooldown window:
- don't send the same alert more than once per X minutes
- keep emitting state changes (RECOVERED) immediately
Suggested values:
- 10–15 minutes cooldown for repeating failures
- no cooldown for recovery events
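One way to sketch the cooldown: keep the last send time per alert key, suppress repeats inside the window, and always let recovery events through. The class name `CooldownGate`, the injectable `clock`, and the `incident.resolved` event string are assumptions for illustration; state here is in memory only.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Suppress repeated alerts per key; never delay recovery events."""

    def __init__(self, cooldown_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str, event: str) -> bool:
        if event == "incident.resolved":
            # Recovery goes out immediately and clears the cooldown for this key.
            self._last_sent.pop(key, None)
            return True
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._cooldown:
            return False  # still inside the cooldown window
        self._last_sent[key] = now
        return True
```

Passing the clock in makes the window easy to test without sleeping.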
Pattern 4: State machine events (OPEN / UPDATE / RESOLVED)
Instead of sending "ALERT ALERT" repeatedly, model incidents:
Event types:
- incident.opened
- incident.updated (optional)
- incident.resolved
Rules:
- OPEN when crossing into failing state (quorum broken or expiry threshold crossed)
- RESOLVE when returning to healthy (quorum met or threshold no longer breached)
Benefits:
- clean timelines
- supports Slack threads / Opsgenie incidents / PagerDuty dedupe keys
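The open/resolve rules can be sketched as a tiny tracker keyed by incident. `IncidentTracker` and its `observe` method are hypothetical names, and the set of open incidents lives in memory here; a real system would persist it.

```python
from typing import Optional, Set


class IncidentTracker:
    """Turn a stream of failing/healthy observations into incident events."""

    def __init__(self) -> None:
        self._open: Set[str] = set()

    def observe(self, key: str, failing: bool,
                details_changed: bool = False) -> Optional[str]:
        is_open = key in self._open
        if failing and not is_open:
            self._open.add(key)
            return "incident.opened"
        if not failing and is_open:
            self._open.discard(key)
            return "incident.resolved"
        if failing and details_changed:
            return "incident.updated"  # optional: only when something changed
        return None  # steady state, nothing to send
```

Repeated failing observations produce one `incident.opened` and then silence until something changes or the check recovers.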
Pattern 5: Stable dedupe keys (so tools can group)
Always include:
- dedupeKey (stable across retries)
- host
- checkType (tls)
- reasonCode (TLS_EXPIRED, TLS_EXPIRY_WITHIN_14_DAYS, TLS_HANDSHAKE_FAILED, etc.)
Example dedupe key:
tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS
For region-specific warnings:
tls:example.com:TLS_HANDSHAKE_FAILED:apac
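A small helper keeps these keys consistent. The function name `build_dedupe_key` is hypothetical, and lowercasing the host plus stripping a trailing dot is an assumption added so that `Example.com.` and `example.com` group together.

```python
from typing import Optional


def build_dedupe_key(host: str, reason_code: str,
                     region: Optional[str] = None,
                     check_type: str = "tls") -> str:
    """Build a key that stays identical across retries for the same problem."""
    # Assumption: normalize the host so case and trailing dots do not split groups.
    parts = [check_type, host.lower().rstrip("."), reason_code]
    if region:
        parts.append(region)  # only for region-specific warnings
    return ":".join(parts)
```

The key deliberately excludes timestamps, run IDs, and `daysUntilExpiry`, since any value that changes between runs would break grouping.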
Pattern 6: Payload that is actually actionable
Minimum payload fields:
- summary status + quorum
- daysUntilExpiry + expiresAt
- issuer + subject
- per-region statuses (and error message if any)
- "what changed" diff (if issuer/chain changed)
Example payload shape (conceptual):
{
  "event": "incident.opened",
  "dedupeKey": "tls:example.com:TLS_EXPIRY_WITHIN_14_DAYS",
  "host": "example.com",
  "status": "Degraded",
  "quorum": { "required": 2, "total": 3, "met": true },
  "reason": { "code": "TLS_EXPIRY_WITHIN_14_DAYS", "severity": "warning" },
  "tls": {
    "expiresAt": "2026-01-25T00:00:00Z",
    "daysUntilExpiry": 12,
    "issuer": "R3",
    "protocol": "TLS1.3"
  },
  "regions": [
    { "region": "us", "status": "Healthy" },
    { "region": "eu", "status": "Healthy" },
    { "region": "apac", "status": "Healthy" }
  ],
  "links": { "detailsUrl": "https://netdiag.dev/runs/abc123" }
}
Keep it small; the goal is "triage in 10 seconds."
Pattern 7: Escalation ladder (don't page people for 30-day warnings)
Suggested routing:
- 30-day expiry → email / low-noise channel
- 14-day expiry → #ops channel + ticket
- 7-day expiry → paging (if you're serious)
- handshake failure (quorum broken) → paging
This prevents alert fatigue.
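The ladder can be expressed as two small routing functions. The destination names (`email`, `ops-channel`, `ticket`, `pager`) are placeholders, and sending a single-region handshake failure to the ops channel is an assumption that follows the warning tier from Pattern 2 rather than something the ladder states.

```python
from typing import List


def route_expiry(days_until_expiry: int) -> List[str]:
    """Pick destinations for an expiry alert based on the tier breached."""
    if days_until_expiry <= 7:
        return ["pager"]
    if days_until_expiry <= 14:
        return ["ops-channel", "ticket"]
    if days_until_expiry <= 30:
        return ["email"]
    return []  # nothing breached, nothing to route


def route_handshake_failure(quorum_met: bool) -> List[str]:
    """Page only when quorum is broken; otherwise treat it as a warning."""
    # Assumption: a failure with quorum still met goes to the ops channel.
    return ["ops-channel"] if quorum_met else ["pager"]
```

Keeping routing separate from detection means the tiers can change without touching the checks.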
Pattern 8: Recovery message (teams love closure)
Always send a RESOLVED message:
- "TLS check recovered (quorum met)."
- "Renewal deployed; expiry now 89 days."
It builds trust in alerts and helps postmortems.
Pattern 9: Include evidence only when it helps
Evidence is useful for:
- region mismatch
- chain change
- redirect chain to a different hostname
Evidence is noise for:
- steady-state healthy checks
So: collect evidence in auto mode (only when something is off).
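A minimal sketch of that "auto" decision, assuming the checker already knows the per-region statuses, whether the chain changed since the last run, and the hostname at the end of any redirect chain. The function and parameter names are hypothetical.

```python
from typing import Mapping, Optional


def should_collect_evidence(region_statuses: Mapping[str, str],
                            chain_changed: bool,
                            checked_host: str,
                            final_host: Optional[str] = None) -> bool:
    """Collect evidence only when something looks off."""
    # Region mismatch: regions report different statuses.
    regions_disagree = len(set(region_statuses.values())) > 1
    # Redirect chain ended on a different hostname than the one checked.
    redirected_elsewhere = (
        final_host is not None and final_host.lower() != checked_host.lower()
    )
    return regions_disagree or chain_changed or redirected_elsewhere
```

A steady-state healthy check returns `False`, so the payload stays small.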
Common webhook mistakes (avoid these)
- Sending daily "still expiring" spam
- No dedupe key (alerts never group)
- No region breakdown (can't debug)
- No "resolved" event (no closure)
- Alerting on 1-region blips as critical
What else to monitor beyond expiry
If you're setting up TLS monitoring, expiry is just the starting point. For a comprehensive checklist of what else can break, see What to Monitor Besides SSL Certificate Expiry.
Related articles
- What to Monitor Besides SSL Certificate Expiry — issuer, chain, SANs, protocol, and regional mismatches
- Certificate Chain Changes: Why Renewals Sometimes Break Clients — what to detect, why some clients fail, and how to alert safely