Varnish Plugin
Monitor the Varnish HTTP cache accelerator with comprehensive metrics covering cache performance, backend health, client connections, and resource usage.
Overview
The Varnish plugin collects detailed metrics from Varnish using varnishstat, including:
- Cache Performance - Hit rate, miss rate, hit-for-pass, hit-for-miss
- Cache Objects - Cached objects, expired objects, LRU nuked objects
- Backend Health - Backend connections, failures, busy count
- Client Metrics - Client connections, requests, dropped connections
- Thread Statistics - Thread count, queue length, failed creations
- Memory Usage - Storage allocated, storage available
Requirements
Varnish Version
- Minimum: Varnish 6.0
- Recommended: Varnish 7.0 or later
- Tested with: Varnish 6.0, 6.6, 7.0, 7.3
Varnish Tools
Required: varnishstat command-line tool (included with Varnish)
Python Dependencies
No additional Python packages required - uses standard library.
System Access
The agent needs permission to run the varnishstat command.
Configuration
Basic Configuration
plugins:
  varnish:
    enabled: true
With Custom Instance Name
plugins:
  varnish:
    enabled: true
    instance_name: my_varnish
All Configuration Options
plugins:
  varnish:
    enabled: true                           # Enable/disable plugin
    instance_name: ""                       # Varnish instance name (empty = default)
    varnishstat_path: /usr/bin/varnishstat  # Path to varnishstat
    timeout: 10                             # Command timeout (seconds)
Environment Variables
Configuration can be overridden with environment variables:
export VARNISH_INSTANCE_NAME="cache1"
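How the options and the environment override combine into a varnishstat invocation is not spelled out here. The sketch below is one plausible reading, not the plugin's actual code: `build_command`, the `DEFAULTS` table, and the precedence (environment over YAML) are assumptions for illustration. The `-1`, `-j`, and `-n` flags are standard varnishstat options.

```python
import os

# Defaults mirror the "All Configuration Options" listing above.
DEFAULTS = {
    "instance_name": "",
    "varnishstat_path": "/usr/bin/varnishstat",
    "timeout": 10,
}

def build_command(config, env=None):
    """Return (argv, timeout) for one varnishstat run. Hypothetical helper."""
    env = os.environ if env is None else env
    merged = {**DEFAULTS, **config}
    # Assumption: the environment variable wins over the YAML value.
    instance = env.get("VARNISH_INSTANCE_NAME", merged["instance_name"])
    argv = [merged["varnishstat_path"], "-1", "-j"]
    if instance:
        argv += ["-n", instance]  # -n selects a named varnishd instance
    return argv, merged["timeout"]

argv, timeout = build_command(
    {"instance_name": "my_varnish"},
    env={"VARNISH_INSTANCE_NAME": "cache1"},
)
print(argv, timeout)
```

The resulting list could then be handed to `subprocess.run(argv, capture_output=True, text=True, timeout=timeout)` so that a hung varnishstat does not block collection.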
Varnish Setup
Installation
Ubuntu/Debian:
# Install Varnish
curl -s https://packagecloud.io/install/repositories/varnishcache/varnish70/script.deb.sh | sudo bash
sudo apt-get install varnish
# Start service
sudo systemctl start varnish
sudo systemctl enable varnish
CentOS/RHEL:
# Install Varnish
curl -s https://packagecloud.io/install/repositories/varnishcache/varnish70/script.rpm.sh | sudo bash
sudo yum install varnish
# Start service
sudo systemctl start varnish
sudo systemctl enable varnish
Basic Configuration
Edit Varnish configuration:
sudo nano /etc/varnish/default.vcl
Simple backend:
vcl 4.0;
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
Restart Varnish:
sudo systemctl restart varnish
Grant varnishstat Access
The agent runs as root by default and therefore has access to varnishstat.
For non-root agent:
# Add user to varnish group
sudo usermod -aG varnish statusradar
Collected Metrics
System Metrics
Metric | Description | Unit | Type |
---|---|---|---|
uptime_seconds | Varnish uptime | Seconds | Gauge |
Cache Performance Metrics
Metric | Description | Unit | Type |
---|---|---|---|
cache_hit | Cache hits | Count | Counter |
cache_miss | Cache misses | Count | Counter |
cache_hitpass | Hit for pass (uncacheable) | Count | Counter |
cache_hit_grace | Hits served from grace | Count | Counter |
cache_hitmiss | Hit for miss | Count | Counter |
cache_hit_rate_percent | Cache hit rate percentage | Percent | Gauge |
Hit rate calculation:
cache_hit_rate_percent = (cache_hit / (cache_hit + cache_miss)) × 100
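As a minimal sketch of the formula above: the function name is illustrative, and returning 0.0 when no lookups have happened yet is an assumption, since the formula itself is undefined for a zero denominator.

```python
def cache_hit_rate_percent(cache_hit, cache_miss):
    """Hit rate as a percentage of cacheable lookups (hits + misses)."""
    total = cache_hit + cache_miss
    if total == 0:
        # Assumption: report 0.0 instead of failing on a freshly started cache.
        return 0.0
    return cache_hit / total * 100.0

# Values taken from the sample plugin output in the Testing section.
print(round(cache_hit_rate_percent(123456, 5678), 1))  # 95.6
```

Hit-for-pass and hit-for-miss are not part of the denominator here because the documented formula only uses `cache_hit` and `cache_miss`.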
Cache Object Metrics
Metric | Description | Unit | Type |
---|---|---|---|
n_object | Cached objects | Count | Gauge |
n_objectcore | Object cores | Count | Gauge |
n_objecthead | Object heads | Count | Gauge |
n_expired | Expired objects | Count | Counter |
n_lru_nuked | LRU nuked objects | Count | Counter |
n_lru_moved | LRU moved objects | Count | Counter |
Backend Metrics
Metric | Description | Unit | Type |
---|---|---|---|
n_backend | Number of backends | Count | Gauge |
backend_conn | Backend connections | Count | Counter |
backend_fail | Backend connection failures | Count | Counter |
backend_busy | Backend busy count | Count | Counter |
backend_unhealthy | Backend unhealthy count | Count | Counter |
backend_reuse | Backend connection reuses | Count | Counter |
backend_recycle | Backend connection recycles | Count | Counter |
backend_retry | Backend retries | Count | Counter |
backend_req | Backend requests | Count | Counter |
Client Metrics
Metric | Description | Unit | Type |
---|---|---|---|
client_req | Client requests | Count | Counter |
client_req_400 | Client 400 errors | Count | Counter |
client_req_417 | Client 417 errors | Count | Counter |
sess_conn | Session connections | Count | Counter |
sess_fail | Session failures | Count | Counter |
sess_dropped | Dropped sessions | Count | Counter |
Thread Metrics
Metric | Description | Unit | Type |
---|---|---|---|
threads | Total threads | Count | Gauge |
threads_created | Threads created | Count | Counter |
threads_destroyed | Threads destroyed | Count | Counter |
threads_failed | Thread creation failures | Count | Counter |
threads_limited | Thread creations limited | Count | Counter |
thread_queue_len | Thread queue length | Count | Gauge |
Traffic Metrics
Metric | Description | Unit | Type |
---|---|---|---|
req_hdrbytes | Request header bytes | Bytes | Counter |
req_bodybytes | Request body bytes | Bytes | Counter |
resp_hdrbytes | Response header bytes | Bytes | Counter |
resp_bodybytes | Response body bytes | Bytes | Counter |
req_total_mb | Total request bytes | MB | Counter |
resp_total_mb | Total response bytes | MB | Counter |
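The table does not say how the two `*_total_mb` values are derived. A reasonable guess, shown here purely as an assumption, is header bytes plus body bytes converted to megabytes. Both the helper name and the choice of binary megabytes (1024 × 1024 bytes) are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes

def total_mb(hdr_bytes, body_bytes):
    """Combine header and body byte counters into a single MB figure."""
    return (hdr_bytes + body_bytes) / BYTES_PER_MB

# 1 MiB of headers plus 3 MiB of bodies
print(total_mb(1_048_576, 3_145_728))  # 4.0
```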
Fetch Metrics
Metric | Description | Unit | Type |
---|---|---|---|
s_fetch | Total fetch operations | Count | Counter |
fetch_failed | Failed fetch operations | Count | Counter |
VCL Metrics
Metric | Description | Unit | Type |
---|---|---|---|
n_vcl | Number of VCL configurations | Count | Gauge |
vcl_fail | VCL failures | Count | Counter |
Dashboard Metrics
The StatusRadar dashboard displays:
Overview Card
- Hit Rate - Cache efficiency percentage
- Cached Objects - Current object count
- Backend Health - Connection success rate
- Client Connections - Active clients
Cache Performance Chart
- Hit rate over time
- Miss rate
- Hit-for-pass ratio
Cache Objects Chart
- Cached objects count
- Expired objects
- LRU nuked objects
Backend Health Chart
- Backend connections
- Backend failures
- Backend busy count
Client Traffic Chart
- Client connections
- Client requests
- Dropped connections
Thread Pool Chart
- Active threads
- Thread queue length
- Thread creation failures
Installation
Quick Install
PLUGINS='varnish' \
TOKEN='your-agent-token' \
bash -c "$(curl -sL https://statusradar.dev/install-agent.sh)"
Install on Existing Agent
1. Verify Varnish is installed:
varnishstat -V
2. Enable the plugin in the config:
sudo nano /opt/statusradar/config/agent.yaml
Add:
plugins:
  varnish:
    enabled: true
3. Restart the agent:
sudo systemctl restart statusradar-agent
4. Verify:
sudo journalctl -u statusradar-agent -n 50 --no-pager | grep varnish
Expected:
INFO: Plugin varnish: Metrics collected successfully
INFO: Plugin varnish: Hit rate 95.5%, 1234 cached objects
Testing
Manual Plugin Test
cd /opt/statusradar
python3 plugins/varnish_plugin.py
Expected Output:
Plugin: varnish
Enabled: True
Available: True
Collecting metrics...
{
"cache_hit": 123456,
"cache_miss": 5678,
"cache_hitrate": 95.6,
"n_object": 1234,
"n_expired": 567,
"n_lru_nuked": 12,
"backend_conn": 89012,
"backend_fail": 23,
"client_conn": 123456,
"client_req": 234567,
"threads": 200
}
Test Varnish Statistics
# View live stats
varnishstat
# View specific counters
varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
# JSON output
varnishstat -1 -j
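For readers who want to consume the JSON output themselves, the following sketch pulls the `MAIN.*` counters into a flat dictionary. It is an illustration rather than the plugin's parser: `extract_main_counters` is a hypothetical name, the sample string is hand-written rather than captured output, and the handling of two layouts (counters nested under a `"counters"` key in newer releases, at the top level in older ones) is an assumption that should be verified against your Varnish version.

```python
import json

def extract_main_counters(raw_json):
    """Map 'MAIN.cache_hit'-style entries to {'cache_hit': value}."""
    data = json.loads(raw_json)
    # Assumption: newer releases nest counters under "counters";
    # older ones keep them next to "timestamp" at the top level.
    counters = data.get("counters", data)
    result = {}
    for name, entry in counters.items():
        # Skip non-counter keys such as "timestamp" or "version".
        if name.startswith("MAIN.") and isinstance(entry, dict) and "value" in entry:
            result[name[len("MAIN."):]] = entry["value"]
    return result

# Hand-written sample, not real varnishstat output.
sample = (
    '{"version": 1, "timestamp": "2024-01-01T00:00:00", "counters": {'
    '"MAIN.cache_hit": {"value": 123456}, '
    '"MAIN.cache_miss": {"value": 5678}}}'
)
print(extract_main_counters(sample))
```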
Troubleshooting
Plugin Not Collecting Metrics
Check 1: Is Varnish running?
sudo systemctl status varnish
Check 2: Can varnishstat be executed?
varnishstat -1
Check 3: Check agent logs
sudo journalctl -u statusradar-agent -n 100 --no-pager | grep varnish
Common Errors
"varnishstat command not found"
Error:
ERROR: Plugin varnish: varnishstat command not found
Solution:
# Install Varnish
sudo apt-get install varnish
# Or specify full path in config:
varnishstat_path: /usr/bin/varnishstat
"Cannot connect to varnishd"
Error:
ERROR: Plugin varnish: Cannot connect to varnishd
Causes:
- Varnish not running
- Wrong instance name
- Permission issues
Solution:
# Check Varnish is running
sudo systemctl status varnish
# Test varnishstat
sudo varnishstat -1
# Check instance name
sudo varnishadm status
"Permission denied"
Error:
ERROR: Plugin varnish: Permission denied accessing varnish stats
Solution:
Run the agent as root, or add the agent user to the varnish group:
sudo usermod -aG varnish statusradar
sudo systemctl restart statusradar-agent
Performance Impact
On Varnish
Minimal impact:
- varnishstat reads from shared memory
- No cache invalidation
- No request processing impact
Benchmark:
- Overhead: < 0.01% CPU
- No measurable performance degradation
On Agent
Resource usage:
- Memory: +10 MB
- CPU: +2% during collection
- Network: +1 KB per collection
Collection time: < 0.2 seconds
Use Cases
1. Cache Hit Rate Monitoring
Monitor:
- Hit rate percentage
- Miss rate trends
- Hit-for-pass ratio
Alert on:
- Hit rate < 80% (poor caching)
- Sudden hit rate drop
- High hit-for-pass rate
2. Backend Health
Monitor:
- Backend connection success rate
- Backend failures
- Backend busy count
Alert on:
- Backend failure rate > 1%
- Backend busy > 0
- Backend unhealthy
3. Cache Efficiency
Monitor:
- Cached objects count
- LRU nuked objects
- Expired objects
Optimize:
- Cache size (malloc storage)
- TTL settings
- VCL logic
4. Client Load
Monitor:
- Client connections
- Request rate
- Dropped connections
Alert on:
- Dropped connections > 0
- Connection rate spike
- Request queue growing
Best Practices
1. Tune Cache Size
Set appropriate cache size:
# /etc/varnish/varnish.params
VARNISH_STORAGE="malloc,4G"
Calculate cache size:
Cache size = (Average object size) × (Number of objects) × 1.2
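The sizing rule above can be applied as follows. This is a small sketch with an illustrative function name; the 32 KiB average object size and 100,000 objects are made-up inputs, and the 1.2 factor is the headroom from the formula.

```python
def recommended_cache_bytes(avg_object_bytes, object_count, headroom=1.2):
    """Cache size = average object size x number of objects x headroom."""
    return round(avg_object_bytes * object_count * headroom)

# Hypothetical workload: 100,000 objects averaging 32 KiB each.
size = recommended_cache_bytes(32 * 1024, 100_000)
print(f"{size} bytes (about {size / 1024**3:.2f} GiB)")
```

The result would be rounded up to a convenient value for the `malloc` storage argument, for example `malloc,4G` for the workload above.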
2. Monitor Hit Rate
Healthy hit rate: > 85%
If hit rate < 80%:
- Increase cache size
- Review TTL settings
- Check VCL logic
- Analyze cache-busting headers
3. Avoid LRU Nuking
LRU nuking indicates:
- Cache size too small
- TTL too long
- Object churn
Solution:
- Increase cache size
- Reduce TTL for large objects
- Implement object size limits
4. Configure Thread Pools
# /etc/varnish/varnish.params
VARNISH_THREAD_POOLS=2
VARNISH_THREAD_POOL_MIN=100
VARNISH_THREAD_POOL_MAX=1000
Formula:
thread_pool_min = Number of CPUs × 50
thread_pool_max = Number of CPUs × 500
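The rule of thumb above is easy to compute per host. In this sketch the function name is illustrative, and falling back to `os.cpu_count()` when no CPU count is passed is an assumption about how one might apply it.

```python
import os

def thread_pool_limits(cpu_count=None):
    """Apply the 50x / 500x per-CPU rule of thumb from the formula above."""
    cpus = cpu_count or os.cpu_count() or 1
    return {"thread_pool_min": cpus * 50, "thread_pool_max": cpus * 500}

# With 2 CPUs this reproduces the example values shown earlier.
print(thread_pool_limits(2))  # {'thread_pool_min': 100, 'thread_pool_max': 1000}
```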
5. Enable Logging for Debugging
# View real-time requests
varnishlog
# View backend transactions only
varnishlog -b
# View client transactions only
varnishlog -c
Varnish Performance Tuning
Cache Configuration
malloc storage (recommended):
VARNISH_STORAGE="malloc,8G"
file storage:
VARNISH_STORAGE="file,/var/lib/varnish/varnish_storage.bin,10G"
VCL Optimization
Set caching headers:
sub vcl_backend_response {
    # Cache for 1 hour
    set beresp.ttl = 1h;
    # Remove cookies for static files
    if (bereq.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
        unset beresp.http.set-cookie;
    }
}
Normalize Host header:
sub vcl_recv {
    # Normalize host header
    set req.http.host = regsub(req.http.host, ":[0-9]+", "");
}
Backend Health Checks
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/health";
        .timeout = 1s;
        .interval = 5s;
        .window = 5;
        .threshold = 3;
    }
}
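To make the interplay of `.window` and `.threshold` concrete, the sketch below simulates a simplified version of the rule: a backend counts as healthy when at least `threshold` of the last `window` probes succeeded. It is a model for intuition only. The function name is hypothetical, and the real probe logic has details this ignores (for instance the `.initial` setting that pre-seeds results at startup).

```python
from collections import deque

def health_after_probes(results, window=5, threshold=3):
    """Return the healthy/unhealthy state after each probe result."""
    recent = deque(maxlen=window)  # keeps only the last `window` results
    states = []
    for ok in results:
        recent.append(ok)
        states.append(sum(recent) >= threshold)
    return states

# Three good probes, three failures, then one recovery probe.
probes = [True, True, True, False, False, False, True]
print(health_after_probes(probes))
# [False, False, True, True, True, False, False]
```

With the settings from the probe above, a backend that starts failing stays marked healthy for two more probes (10 seconds at `.interval = 5s`) before being taken out of rotation.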
Advanced Configuration
Multiple Varnish Instances
plugins:
  varnish_cache1:
    enabled: true
    instance_name: cache1
  varnish_cache2:
    enabled: true
    instance_name: cache2
Docker Container
Monitor Varnish in Docker:
plugins:
  varnish:
    enabled: true
Docker run:
docker run -d --name varnish \
-v /var/run:/var/run \
varnish:latest
Example Configurations
Basic Production
plugins:
  varnish:
    enabled: true
Multi-Instance
plugins:
  varnish_web:
    enabled: true
    instance_name: web_cache
  varnish_api:
    enabled: true
    instance_name: api_cache
Limitations
Current Limitations
- No per-URL metrics - Only aggregate statistics
- No ban list metrics - Ban/purge statistics not detailed
- No ESI metrics - Edge Side Includes stats not collected
Scalability
Tested with:
- 10,000+ req/sec
- 100,000+ cached objects
- Multi-GB cache sizes
Performance:
- varnishstat overhead constant regardless of cache size
Monitoring Checklist
Critical:
- Hit rate < 80%
- Backend failures > 0
- Dropped connections > 0
- LRU nuking frequent
Important:
- Thread creation failures
- Backend busy > 0
- Thread queue length > 0
Alert Thresholds
cache_hitrate: < 85
backend_fail_rate: > 0.01
client_drop: > 0
n_lru_nuked_rate: > 100/min
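Most of the underlying metrics are cumulative counters, so thresholds like these are usually checked against the change between two samples. The sketch below shows one way to do that. It is not StatusRadar's alerting logic: the mapping of threshold names to counters (`client_drop` to `sess_dropped`, `backend_fail_rate` to `backend_fail / backend_conn`) and the per-interval hit rate are assumptions made for illustration.

```python
def delta(prev, curr, key):
    """Counter increase between samples; a restart (counter reset) yields 0."""
    return max(curr[key] - prev[key], 0)

def evaluate_alerts(prev, curr, interval_seconds):
    """Return the names of thresholds breached between two samples."""
    alerts = []
    hits = delta(prev, curr, "cache_hit")
    lookups = hits + delta(prev, curr, "cache_miss")
    if lookups and hits / lookups * 100 < 85:
        alerts.append("cache_hitrate")
    conns = delta(prev, curr, "backend_conn")
    if conns and delta(prev, curr, "backend_fail") / conns > 0.01:
        alerts.append("backend_fail_rate")
    if delta(prev, curr, "sess_dropped") > 0:
        alerts.append("client_drop")
    nuked_per_min = delta(prev, curr, "n_lru_nuked") * 60 / interval_seconds
    if nuked_per_min > 100:
        alerts.append("n_lru_nuked_rate")
    return alerts

# Made-up samples taken 60 seconds apart.
prev = {"cache_hit": 1000, "cache_miss": 100, "backend_conn": 500,
        "backend_fail": 0, "sess_dropped": 0, "n_lru_nuked": 10}
curr = {"cache_hit": 1700, "cache_miss": 400, "backend_conn": 700,
        "backend_fail": 4, "sess_dropped": 0, "n_lru_nuked": 40}
print(evaluate_alerts(prev, curr, 60))  # ['cache_hitrate', 'backend_fail_rate']
```

Here the interval hit rate is 70% and the backend failure rate is 2%, so both fire, while 30 nuked objects per minute stays under the limit.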
Next Steps
- View all plugins
- Nginx Plugin - Often used together
- HAProxy Plugin - Load balancer alternative
- System Requirements