Alert Thresholds

Configure when alerts trigger based on monitor performance.

Monitor Thresholds

Failure Threshold

Number of consecutive failed checks before an alert fires:

Default: 3 consecutive failures
Recommended: 
  - Critical services: 2
  - Normal services: 3
  - Low-priority: 5

Example: Monitor checked every 60s, threshold = 3

  • Failure at 10:00 (1/3)
  • Failure at 10:01 (2/3)
  • Failure at 10:02 (3/3) → ALERT TRIGGERED

Recovery Threshold

Number of consecutive successful checks before the alert resolves:

Default: 2 consecutive successes
Recommended:
  - Quick recovery: 1
  - Stable recovery: 2
  - Confirmed recovery: 3

Example: Alert active, threshold = 2

  • Success at 10:05 (1/2)
  • Success at 10:06 (2/2) → ALERT RESOLVED
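Both counters behave like a small state machine: a failure resets the success streak and vice versa. A minimal sketch in Python (class and method names are illustrative, not the StatusRadar API):

```python
class AlertState:
    """Tracks consecutive failures/successes against the two thresholds.

    Illustrative sketch only -- names do not come from the product.
    """

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.alerting = False

    def record(self, check_ok):
        """Record one check result; return True if the alert state changed."""
        if check_ok:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if self.alerting and self.consecutive_successes >= self.recovery_threshold:
                self.alerting = False   # ALERT RESOLVED
                return True
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            if not self.alerting and self.consecutive_failures >= self.failure_threshold:
                self.alerting = True    # ALERT TRIGGERED
                return True
        return False

# Replays the two examples above: three failures trigger, two successes resolve.
state = AlertState(failure_threshold=3, recovery_threshold=2)
events = [state.record(ok) for ok in (False, False, False, True, True)]
print(events)   # state changes on the 3rd failure and the 2nd success
```

Note that a single success mid-outage resets the failure count to zero, which is why intermittent flapping can delay an alert.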

Response Time Thresholds

Static Thresholds

Fixed millisecond values:

// Dashboard → Monitors → [Monitor] → Alert Settings
{
  "response_time_warning": 1000,   // 1 second
  "response_time_critical": 3000   // 3 seconds
}
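The two values divide response times into three severity bands. A sketch of that mapping in Python (the field names follow the settings above; the function itself is hypothetical):

```python
def classify_response_time(ms, warning=1000, critical=3000):
    """Map a response time in milliseconds to a severity level.

    Defaults mirror the static-threshold example; not a product API.
    """
    if ms >= critical:
        return "critical"
    if ms >= warning:
        return "warning"
    return "ok"

print(classify_response_time(850))    # ok
print(classify_response_time(2100))   # warning
print(classify_response_time(4500))   # critical
```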

Dynamic Thresholds (ML-based)

Auto-adjust based on historical patterns:

Enable: Dashboard → Monitors → [Monitor] → Dynamic Thresholds

Learning period: 7 days
Threshold: Baseline + 2 standard deviations

Benefits:

  • Adapts to traffic patterns
  • Fewer false positives
  • Detects gradual degradation
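The "baseline + 2 standard deviations" rule can be computed directly from historical samples. A minimal sketch, assuming the baseline is the mean of recent response times (the exact statistics StatusRadar uses internally are not documented here):

```python
import statistics

def dynamic_threshold(samples, num_stdevs=2):
    """Baseline (mean) of historical response times plus N standard deviations."""
    baseline = statistics.mean(samples)
    return baseline + num_stdevs * statistics.pstdev(samples)

# A simplified slice of a week's hourly response times, in ms:
history = [220, 240, 210, 260, 230, 250, 240, 225]
limit = dynamic_threshold(history)
print(round(limit, 1))
```

Because the threshold moves with the data, a service that is always slow at peak hours will not page anyone for being its usual slow self.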

HTTP Status Thresholds

Expected Status Codes

Monitor Settings → Expected Status → 200

Alert if: 
  - 4xx errors (client errors)
  - 5xx errors (server errors)
  - Unexpected redirects

Error Rate Threshold

Alert if error rate > X% over Y minutes

Example:
  - Error rate > 5% over 5 minutes → Warning
  - Error rate > 10% over 5 minutes → Critical
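The error-rate check reduces to counting failures in the window and comparing against the two cutoffs. A hypothetical sketch (function names are illustrative):

```python
def error_rate(results):
    """Percentage of failed checks in a window of booleans (True = success)."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for ok in results if not ok) / len(results)

def severity(rate, warning=5.0, critical=10.0):
    """Classify an error-rate percentage against the warning/critical cutoffs."""
    if rate >= critical:
        return "critical"
    if rate >= warning:
        return "warning"
    return "ok"

# 100 requests in a 5-minute window, 7 of them failed -> 7% -> warning
window = [False] * 7 + [True] * 93
rate = error_rate(window)
print(rate, severity(rate))   # 7.0 warning
```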

SSL Certificate Thresholds

Days before expiry to alert:

Default: 30 days

Recommended:
  - 1st alert: 30 days
  - 2nd alert: 14 days  
  - 3rd alert: 7 days
  - Critical: 3 days
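The recommended ladder can be expressed as a lookup over days remaining. A sketch, assuming each tier fires once its day count is reached (tier labels mirror the list above; the code is illustrative):

```python
import datetime

# (days-before-expiry, label) -- mirrors the recommended ladder above.
ALERT_TIERS = [(30, "1st alert"), (14, "2nd alert"), (7, "3rd alert"), (3, "critical")]

def expiry_alerts(not_after, today):
    """Return the alert tiers already reached for a certificate expiry date."""
    days_left = (not_after - today).days
    return [label for days, label in ALERT_TIERS if days_left <= days]

today = datetime.date(2024, 6, 1)
# Certificate expiring in 10 days: the 30-day and 14-day tiers have fired.
print(expiry_alerts(datetime.date(2024, 6, 11), today))
```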

Custom Metric Thresholds

For custom metrics (CPU, memory, disk):

Dashboard → Servers → [Server] → Thresholds

CPU Usage:
  Warning: 70%
  Critical: 85%

Memory Usage:
  Warning: 80%
  Critical: 90%

Disk Usage:
  Warning: 75%
  Critical: 85%

Threshold Strategies

Conservative (Few Alerts)

Failure threshold: 5
Recovery threshold: 3
Response time: 3000ms
Error rate: 10%

Use for: Non-critical services, development environments

Balanced (Recommended)

Failure threshold: 3
Recovery threshold: 2
Response time: 1000ms (or dynamic)
Error rate: 5%

Use for: Production services, customer-facing apps

Aggressive (Maximum Uptime)

Failure threshold: 1
Recovery threshold: 1
Response time: 500ms (or dynamic)
Error rate: 1%

Use for: Mission-critical services, SLA-backed APIs

Alert Grouping

Prevent alert spam:

Dashboard → Alert Settings → Grouping

Group by:
  - Monitor group
  - Time window (5 minutes)

Result:
  10 monitors down → 1 grouped alert instead of 10
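Grouping amounts to bucketing alerts by (monitor group, time window) and sending one notification per bucket. A hypothetical sketch using integer timestamps:

```python
from collections import defaultdict

def group_alerts(alerts, window_seconds=300):
    """Collapse per-monitor alerts into one bucket per (group, 5-min window).

    `alerts` is a list of (monitor_group, unix_timestamp) pairs; illustrative only.
    """
    grouped = defaultdict(list)
    for monitor_group, timestamp in alerts:
        bucket = timestamp // window_seconds
        grouped[(monitor_group, bucket)].append(timestamp)
    return grouped

# 10 monitors in the "api" group failing within the same 5-minute window:
alerts = [("api", 600 + i) for i in range(10)]
grouped = group_alerts(alerts)
print(len(grouped))   # 1 grouped alert instead of 10
```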

Escalation Policies

Auto-escalate unacknowledged alerts:

Dashboard → Alert Settings → Escalation

1. Alert → Slack #monitoring (instant)
2. After 10 min → Email team lead
3. After 30 min → SMS on-call engineer
4. After 1 hour → Voice call manager
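The escalation chain above is just a list of (delay, target) pairs checked against how long the alert has gone unacknowledged. A minimal sketch (targets copied from the policy above; the function is illustrative):

```python
# (minutes unacknowledged, notification target) -- mirrors the policy above.
ESCALATION_STEPS = [
    (0, "Slack #monitoring"),
    (10, "Email team lead"),
    (30, "SMS on-call engineer"),
    (60, "Voice call manager"),
]

def steps_due(minutes_unacknowledged):
    """Escalation steps that should have fired after N unacknowledged minutes."""
    return [target for after, target in ESCALATION_STEPS
            if minutes_unacknowledged >= after]

print(steps_due(0))    # Slack only
print(steps_due(35))   # Slack, email, and SMS
```

Acknowledging the alert at any point stops the remaining steps from firing.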

Testing Thresholds

Simulate a failure to verify your configuration:

Dashboard → Monitors → [Monitor] → Test Alert

What happens:
  - An immediate failure is simulated
  - The normal threshold countdown runs
  - Configured alert channels fire
  - Real monitoring data is unaffected

Best Practices

DO:

  • Use dynamic thresholds for variable traffic
  • Set escalation policies
  • Test threshold configuration
  • Review and adjust based on false positives

DON'T:

  • Set thresholds too low (alert fatigue)
  • Ignore threshold tuning
  • Use same thresholds for all monitors
  • Skip testing

API Configuration

# Update monitor thresholds
curl -X PUT \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"failure_threshold":3,"recovery_threshold":2}' \
  https://statusradar.dev/api/monitors/{id}/thresholds

Next Steps