Alert Thresholds

Configure when alerts trigger based on monitor performance.

Monitor Thresholds

Failure Threshold

Number of consecutive failed checks before an alert fires:

Default: 3 consecutive failures
Recommended: 
  - Critical services: 2
  - Normal services: 3
  - Low-priority: 5

Example: Monitor checked every 60s, threshold = 3

  • Failure at 10:00 (1/3)
  • Failure at 10:01 (2/3)
  • Failure at 10:02 (3/3) → ALERT TRIGGERED

Recovery Threshold

Number of consecutive successful checks before the alert resolves:

Default: 2 consecutive successes
Recommended:
  - Quick recovery: 1
  - Stable recovery: 2
  - Confirmed recovery: 3

Example: Alert active, threshold = 2

  • Success at 10:05 (1/2)
  • Success at 10:06 (2/2) → ALERT RESOLVED
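Both counters behave like a small state machine: a failure resets the success streak and vice versa. A minimal sketch in Python (class and method names are illustrative, not the StatusRadar API):

```python
class AlertState:
    """Tracks consecutive failures/successes against the two thresholds.

    Illustrative sketch only -- names do not come from the product.
    """

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.alerting = False

    def record(self, check_ok):
        """Record one check result; return True if the alert state changed."""
        if check_ok:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if self.alerting and self.consecutive_successes >= self.recovery_threshold:
                self.alerting = False   # ALERT RESOLVED
                return True
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            if not self.alerting and self.consecutive_failures >= self.failure_threshold:
                self.alerting = True    # ALERT TRIGGERED
                return True
        return False

# Replays the two examples above: three failures trigger, two successes resolve.
state = AlertState(failure_threshold=3, recovery_threshold=2)
events = [state.record(ok) for ok in (False, False, False, True, True)]
print(events)   # state changes on the 3rd failure and the 2nd success
```

Note that a single success mid-outage resets the failure count to zero, which is why intermittent flapping can delay an alert.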

Response Time Thresholds

Static Thresholds

Fixed millisecond values:

// Dashboard → Monitors → [Monitor] → Alert Settings
{
  "response_time_warning": 1000,   // 1 second
  "response_time_critical": 3000   // 3 seconds
}
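The two values divide response times into three severity bands. A sketch of that mapping in Python (the field names follow the settings above; the function itself is hypothetical):

```python
def classify_response_time(ms, warning=1000, critical=3000):
    """Map a response time in milliseconds to a severity level.

    Defaults mirror the static-threshold example; not a product API.
    """
    if ms >= critical:
        return "critical"
    if ms >= warning:
        return "warning"
    return "ok"

print(classify_response_time(850))    # ok
print(classify_response_time(2100))   # warning
print(classify_response_time(4500))   # critical
```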

Dynamic Thresholds (ML-based)

Auto-adjust based on historical patterns:

Enable: Dashboard → Monitors → [Monitor] → Dynamic Thresholds

Learning period: 7 days
Threshold: Baseline + 2 standard deviations

Benefits:

  • Adapts to traffic patterns
  • Fewer false positives
  • Detects gradual degradation
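The "baseline + 2 standard deviations" rule can be computed directly from historical samples. A minimal sketch, assuming the baseline is the mean of recent response times (the exact statistics StatusRadar uses internally are not documented here):

```python
import statistics

def dynamic_threshold(samples, num_stdevs=2):
    """Baseline (mean) of historical response times plus N standard deviations."""
    baseline = statistics.mean(samples)
    return baseline + num_stdevs * statistics.pstdev(samples)

# A simplified slice of a week's hourly response times, in ms:
history = [220, 240, 210, 260, 230, 250, 240, 225]
limit = dynamic_threshold(history)
print(round(limit, 1))
```

Because the threshold moves with the data, a service that is always slow at peak hours will not page anyone for being its usual slow self.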

HTTP Status Thresholds

Expected Status Codes

Monitor Settings → Expected Status → 200

Alert if: 
  - 4xx errors (client errors)
  - 5xx errors (server errors)
  - Unexpected redirects

Error Rate Threshold

Alert if error rate > X% over Y minutes

Example:
  - Error rate > 5% over 5 minutes → Warning
  - Error rate > 10% over 5 minutes → Critical
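The error-rate check reduces to counting failures in the window and comparing against the two cutoffs. A hypothetical sketch (function names are illustrative):

```python
def error_rate(results):
    """Percentage of failed checks in a window of booleans (True = success)."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for ok in results if not ok) / len(results)

def severity(rate, warning=5.0, critical=10.0):
    """Classify an error-rate percentage against the warning/critical cutoffs."""
    if rate >= critical:
        return "critical"
    if rate >= warning:
        return "warning"
    return "ok"

# 100 requests in a 5-minute window, 7 of them failed -> 7% -> warning
window = [False] * 7 + [True] * 93
rate = error_rate(window)
print(rate, severity(rate))   # 7.0 warning
```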

SSL Certificate Thresholds

Days before expiry to alert:

Default: 30 days

Recommended:
  - 1st alert: 30 days
  - 2nd alert: 14 days  
  - 3rd alert: 7 days
  - Critical: 3 days
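The recommended ladder can be expressed as a lookup over days remaining. A sketch, assuming each tier fires once its day count is reached (tier labels mirror the list above; the code is illustrative):

```python
import datetime

# (days-before-expiry, label) -- mirrors the recommended ladder above.
ALERT_TIERS = [(30, "1st alert"), (14, "2nd alert"), (7, "3rd alert"), (3, "critical")]

def expiry_alerts(not_after, today):
    """Return the alert tiers already reached for a certificate expiry date."""
    days_left = (not_after - today).days
    return [label for days, label in ALERT_TIERS if days_left <= days]

today = datetime.date(2024, 6, 1)
# Certificate expiring in 10 days: the 30-day and 14-day tiers have fired.
print(expiry_alerts(datetime.date(2024, 6, 11), today))
```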

Custom Metric Thresholds

For custom metrics (CPU, memory, disk):

Dashboard → Servers → [Server] → Thresholds

CPU Usage:
  Warning: 70%
  Critical: 85%

Memory Usage:
  Warning: 80%
  Critical: 90%

Disk Usage:
  Warning: 75%
  Critical: 85%

Threshold Strategies

Conservative (Few Alerts)

Failure threshold: 5
Recovery threshold: 3
Response time: 3000ms
Error rate: 10%

Use for: Non-critical services, development environments

Balanced (Recommended)

Failure threshold: 3
Recovery threshold: 2
Response time: 1000ms (or dynamic)
Error rate: 5%

Use for: Production services, customer-facing apps

Aggressive (Maximum Uptime)

Failure threshold: 1
Recovery threshold: 1
Response time: 500ms (or dynamic)
Error rate: 1%

Use for: Mission-critical services, SLA-backed APIs

Alert Grouping

Prevent alert spam:

Dashboard → Alert Settings → Grouping

Group by:
  - Monitor group
  - Time window (5 minutes)

Result:
  10 monitors down → 1 grouped alert instead of 10
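Grouping amounts to bucketing alerts by (monitor group, time window) and sending one notification per bucket. A hypothetical sketch using integer timestamps:

```python
from collections import defaultdict

def group_alerts(alerts, window_seconds=300):
    """Collapse per-monitor alerts into one bucket per (group, 5-min window).

    `alerts` is a list of (monitor_group, unix_timestamp) pairs; illustrative only.
    """
    grouped = defaultdict(list)
    for monitor_group, timestamp in alerts:
        bucket = timestamp // window_seconds
        grouped[(monitor_group, bucket)].append(timestamp)
    return grouped

# 10 monitors in the "api" group failing within the same 5-minute window:
alerts = [("api", 600 + i) for i in range(10)]
grouped = group_alerts(alerts)
print(len(grouped))   # 1 grouped alert instead of 10
```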

Escalation Policies

Auto-escalate unacknowledged alerts:

Dashboard → Alert Settings → Escalation

1. Alert → Slack #monitoring (instant)
2. After 10 min → Email team lead
3. After 30 min → SMS on-call engineer
4. After 1 hour → Voice call manager
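The escalation chain above is just a list of (delay, target) pairs checked against how long the alert has gone unacknowledged. A minimal sketch (targets copied from the policy above; the function is illustrative):

```python
# (minutes unacknowledged, notification target) -- mirrors the policy above.
ESCALATION_STEPS = [
    (0, "Slack #monitoring"),
    (10, "Email team lead"),
    (30, "SMS on-call engineer"),
    (60, "Voice call manager"),
]

def steps_due(minutes_unacknowledged):
    """Escalation steps that should have fired after N unacknowledged minutes."""
    return [target for after, target in ESCALATION_STEPS
            if minutes_unacknowledged >= after]

print(steps_due(0))    # Slack only
print(steps_due(35))   # Slack, email, and SMS
```

Acknowledging the alert at any point stops the remaining steps from firing.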

Testing Thresholds

Simulate a failure to verify your configuration:

Dashboard → Monitors → [Monitor] → Test Alert

What happens:
  - An immediate failure is simulated
  - The normal threshold countdown runs
  - Configured alert channels fire
  - Real monitoring data is unaffected

Best Practices

DO:

  • Use dynamic thresholds for variable traffic
  • Set escalation policies
  • Test threshold configuration
  • Review and adjust based on false positives

DON'T:

  • Set thresholds too low (alert fatigue)
  • Ignore threshold tuning
  • Use same thresholds for all monitors
  • Skip testing

API Configuration

# Update monitor thresholds
curl -X PUT \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"failure_threshold":3,"recovery_threshold":2}' \
  https://statusradar.dev/api/monitors/{id}/thresholds

Next Steps