Alert Thresholds
Configure when alerts trigger based on monitor performance.
Monitor Thresholds
Failure Threshold
Number of consecutive failed checks before an alert triggers:
Default: 3 consecutive failures
Recommended:
- Critical services: 2
- Normal services: 3
- Low-priority: 5
Example: Monitor checked every 60s, threshold = 3
- Failure at 10:00 (1/3)
- Failure at 10:01 (2/3)
- Failure at 10:02 (3/3) → ALERT TRIGGERED
Recovery Threshold
Number of consecutive successful checks before an active alert is resolved:
Default: 2 consecutive successes
Recommended:
- Quick recovery: 1
- Stable recovery: 2
- Confirmed recovery: 3
Example: Alert active, threshold = 2
- Success at 10:05 (1/2)
- Success at 10:06 (2/2) → ALERT RESOLVED
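The two examples above can be read as one small state machine. The following is a minimal sketch of that logic, not StatusRadar's actual implementation; the `AlertState` class and its `record` method are hypothetical names, and it assumes that a single success resets the failure streak (and vice versa).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks consecutive check results for one monitor (hypothetical helper)."""
    failure_threshold: int = 3   # default from this page
    recovery_threshold: int = 2  # default from this page
    alerting: bool = False
    failures: int = 0
    successes: int = 0

    def record(self, check_ok: bool) -> Optional[str]:
        """Feed one check result; return "triggered", "resolved", or None."""
        if check_ok:
            self.failures = 0  # assumption: any success breaks the failure streak
            if self.alerting:
                self.successes += 1
                if self.successes >= self.recovery_threshold:
                    self.alerting = False
                    self.successes = 0
                    return "resolved"
        else:
            self.successes = 0  # assumption: any failure breaks the recovery streak
            if not self.alerting:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.alerting = True
                    self.failures = 0
                    return "triggered"
        return None


state = AlertState()
events = [state.record(ok) for ok in (False, False, False, True, True)]
print(events)  # [None, None, 'triggered', None, 'resolved']
```

Three failures followed by two successes reproduce the timeline above: the alert triggers on the third failure and resolves on the second success.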
Response Time Thresholds
Static Thresholds
Fixed millisecond values:
// Dashboard → Monitors → [Monitor] → Alert Settings
{
response_time_warning: 1000, // 1 second
response_time_critical: 3000 // 3 seconds
}
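To illustrate how the two static values might be applied to a single measurement, here is a small sketch. The function name `classify_response_time` is hypothetical, and treating a value exactly equal to a threshold as a breach is an assumption; the documentation does not say whether the comparison is inclusive.

```python
def classify_response_time(ms: float,
                           warning_ms: float = 1000,
                           critical_ms: float = 3000) -> str:
    """Return "ok", "warning", or "critical" for one response time in ms."""
    # Assumption: a value equal to the threshold counts as a breach.
    if ms >= critical_ms:
        return "critical"
    if ms >= warning_ms:
        return "warning"
    return "ok"


for sample in (450, 1500, 3200):
    print(sample, classify_response_time(sample))
```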
Dynamic Thresholds (ML-based)
Thresholds adjust automatically based on historical patterns:
Enable: Dashboard → Monitors → [Monitor] → Dynamic Thresholds
Learning period: 7 days
Threshold: Baseline + 2 standard deviations
Benefits:
- Adapts to traffic patterns
- Fewer false positives
- Detects gradual degradation
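The "baseline + 2 standard deviations" rule can be sketched as follows. This is only an illustration under stated assumptions: it takes the baseline to be the arithmetic mean of the learning-period samples and uses the sample standard deviation; the actual model behind dynamic thresholds is not described on this page. The function name `dynamic_threshold` is hypothetical.

```python
import statistics


def dynamic_threshold(samples_ms, k: float = 2.0) -> float:
    """Baseline (assumed: mean) plus k sample standard deviations."""
    baseline = statistics.fmean(samples_ms)
    spread = statistics.stdev(samples_ms)  # needs at least two samples
    return baseline + k * spread


# Hypothetical response times (ms) collected during the learning period.
history = [200, 220, 180, 210, 190]
print(round(dynamic_threshold(history), 1))  # 231.6
```

With these samples the mean is 200 ms and the sample standard deviation is about 15.8 ms, so a response slower than roughly 231.6 ms would exceed the threshold.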
HTTP Status Thresholds
Expected Status Codes
Monitor Settings → Expected Status → 200
Alert if:
- 4xx errors (client errors)
- 5xx errors (server errors)
- Unexpected redirects
Error Rate Threshold
Alert if error rate > X% over Y minutes
Example:
- Error rate > 5% over 5 minutes → Warning
- Error rate > 10% over 5 minutes → Critical
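A windowed error-rate check like the one above could be sketched as shown below. The function name `error_rate_severity` and the `(timestamp, is_error)` input shape are assumptions made for the example; the comparisons are strict because the rule is written as "error rate > X%".

```python
from datetime import datetime, timedelta


def error_rate_severity(checks, now,
                        window=timedelta(minutes=5),
                        warning_pct=5.0, critical_pct=10.0) -> str:
    """checks: iterable of (timestamp, is_error) pairs for one monitor."""
    # Keep only results that fall inside the look-back window.
    recent = [is_error for ts, is_error in checks if now - window <= ts <= now]
    if not recent:
        return "ok"  # assumption: no data in the window means no alert
    rate = 100.0 * sum(recent) / len(recent)
    if rate > critical_pct:
        return "critical"
    if rate > warning_pct:
        return "warning"
    return "ok"


now = datetime(2024, 1, 1, 10, 5)
# 20 checks, one every 15 seconds; the 2 most recent ones are errors (10%).
checks = [(now - timedelta(seconds=15 * i), i < 2) for i in range(20)]
print(error_rate_severity(checks, now))  # warning
```

An error rate of exactly 10% stays at warning in this sketch, since only values above 10% are treated as critical.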
SSL Certificate Thresholds
Number of days before certificate expiry at which to alert:
Default: 30 days
Recommended:
- 1st alert: 30 days
- 2nd alert: 14 days
- 3rd alert: 7 days
- Critical: 3 days
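The recommended schedule can be expressed as a lookup from days remaining to alert tier. This is a sketch only; `ssl_alert_level` and `TIERS` are hypothetical names, and returning the critical tier for an already expired certificate is an assumption.

```python
from datetime import date

# (days remaining, tier) pairs from the recommended schedule, tightest first.
TIERS = [(3, "critical"), (7, "3rd alert"), (14, "2nd alert"), (30, "1st alert")]


def ssl_alert_level(expires_on: date, today: date):
    """Return the most urgent tier reached, or None if expiry is far off."""
    days_left = (expires_on - today).days
    for limit, label in TIERS:
        if days_left <= limit:
            return label
    return None


today = date(2024, 1, 1)
print(ssl_alert_level(date(2024, 1, 11), today))  # 2nd alert (10 days left)
```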
Custom Metric Thresholds
For custom metrics (CPU, memory, disk):
Dashboard → Servers → [Server] → Thresholds
CPU Usage:
Warning: 70%
Critical: 85%
Memory Usage:
Warning: 80%
Critical: 90%
Disk Usage:
Warning: 75%
Critical: 85%
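As a rough illustration, the values above can be held in one table and applied to a server's current usage. The names `THRESHOLDS` and `evaluate_server` are hypothetical, and the inclusive comparison is an assumption.

```python
# (warning %, critical %) pairs taken from the values above.
THRESHOLDS = {"cpu": (70, 85), "memory": (80, 90), "disk": (75, 85)}


def evaluate_server(usage):
    """Map each reported metric to "ok", "warning", or "critical"."""
    report = {}
    for name, percent in usage.items():
        warning, critical = THRESHOLDS[name]
        # Assumption: a value equal to the threshold counts as a breach.
        if percent >= critical:
            report[name] = "critical"
        elif percent >= warning:
            report[name] = "warning"
        else:
            report[name] = "ok"
    return report


print(evaluate_server({"cpu": 72.5, "memory": 64.0, "disk": 91.0}))
# {'cpu': 'warning', 'memory': 'ok', 'disk': 'critical'}
```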
Threshold Strategies
Conservative (Few Alerts)
Failure threshold: 5
Recovery threshold: 3
Response time: 3000ms
Error rate: 10%
Use for: Non-critical services, development environments
Balanced (Recommended)
Failure threshold: 3
Recovery threshold: 2
Response time: 1000ms (or dynamic)
Error rate: 5%
Use for: Production services, customer-facing apps
Aggressive (Maximum Uptime)
Failure threshold: 1
Recovery threshold: 1
Response time: 500ms (or dynamic)
Error rate: 1%
Use for: Mission-critical services, SLA-backed APIs
Alert Grouping
Prevent alert spam:
Dashboard → Alert Settings → Grouping
Group by:
- Monitor group
- Time window (5 minutes)
Result:
10 monitors down → 1 grouped alert instead of 10
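One way to picture grouping by monitor group and time window is the sketch below. It is not the product's algorithm: `group_alerts` is a hypothetical function, and it assumes a window opens at the first alert of a monitor group and absorbs every alert from that group for the next 5 minutes.

```python
from datetime import datetime, timedelta


def group_alerts(alerts, window=timedelta(minutes=5)):
    """alerts: iterable of (timestamp, monitor_group, monitor) tuples."""
    groups = []         # grouped alerts in the order they were opened
    open_by_group = {}  # monitor_group -> currently open grouped alert
    for ts, monitor_group, monitor in sorted(alerts):
        current = open_by_group.get(monitor_group)
        # Open a new grouped alert when none exists or the window has passed.
        if current is None or ts - current["start"] > window:
            current = {"group": monitor_group, "start": ts, "monitors": []}
            open_by_group[monitor_group] = current
            groups.append(current)
        current["monitors"].append(monitor)
    return groups


start = datetime(2024, 1, 1, 10, 0)
# Ten monitors in the same group fail within about two minutes.
alerts = [(start + timedelta(seconds=12 * i), "api", f"monitor-{i}") for i in range(10)]
grouped = group_alerts(alerts)
print(len(grouped), len(grouped[0]["monitors"]))  # 1 10
```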
Escalation Policies
Auto-escalate unacknowledged alerts:
Dashboard → Alert Settings → Escalation
1. Alert → Slack #monitoring (instant)
2. After 10 min → Email team lead
3. After 30 min → SMS on-call engineer
4. After 1 hour → Voice call manager
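The policy above is essentially an ordered list of delays and targets. The sketch below models it that way; `ESCALATION_POLICY` and `due_steps` are hypothetical names, and it assumes each delay is measured from the moment the alert first fired.

```python
from datetime import timedelta

# (delay since the alert fired, notification target), in escalation order.
ESCALATION_POLICY = [
    (timedelta(0), "Slack #monitoring"),
    (timedelta(minutes=10), "Email team lead"),
    (timedelta(minutes=30), "SMS on-call engineer"),
    (timedelta(hours=1), "Voice call manager"),
]


def due_steps(unacknowledged_for: timedelta):
    """Return every target that should have been notified by now."""
    return [target for delay, target in ESCALATION_POLICY
            if unacknowledged_for >= delay]


print(due_steps(timedelta(minutes=15)))
# ['Slack #monitoring', 'Email team lead']
```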
Testing Thresholds
Simulate a failure to test your threshold and alert configuration:
Dashboard → Monitors → [Monitor] → Test Alert
The test triggers:
- An immediate simulated failure
- The threshold countdown
- Activation of the configured alert channels
Actual monitoring is not affected.
Best Practices
✅ DO:
- Use dynamic thresholds for variable traffic
- Set escalation policies
- Test threshold configuration
- Review and adjust based on false positives
❌ DON'T:
- Set thresholds too low (causes alert fatigue)
- Ignore threshold tuning
- Use the same thresholds for all monitors
- Skip testing
API Configuration
# Update monitor thresholds (replace {id} with the monitor's ID)
curl -X PUT \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"failure_threshold":3,"recovery_threshold":2}' \
https://statusradar.dev/api/monitors/{id}/thresholds
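For scripts, the same request can be built with Python's standard library. This is a sketch that mirrors the curl call above; the helper name `build_threshold_request` is hypothetical, the `Content-Type` header is an assumption about what the API expects, and the response format is not documented here, so the request is only constructed, not sent.

```python
import json
import urllib.request


def build_threshold_request(monitor_id, api_key,
                            failure_threshold=3, recovery_threshold=2):
    """Build (but do not send) the PUT request shown in the curl example."""
    url = f"https://statusradar.dev/api/monitors/{monitor_id}/thresholds"
    body = json.dumps({
        "failure_threshold": failure_threshold,
        "recovery_threshold": recovery_threshold,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON body expected
        },
    )


req = build_threshold_request("123", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```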
Next Steps
- Anomaly Detection - ML-based alerts
- Channels - Configure notifications
- Overview - Alert system basics