Varnish Plugin

Monitor Varnish HTTP cache accelerator with comprehensive metrics covering cache performance, backend health, client connections, and resource usage.

Overview

The Varnish plugin collects detailed metrics from Varnish via varnishstat, including:

  • Cache Performance - Hit rate, miss rate, hit-for-pass, hit-for-miss
  • Cache Objects - Cached objects, expired objects, LRU nuked objects
  • Backend Health - Backend connections, failures, busy count
  • Client Metrics - Client connections, requests, dropped connections
  • Thread Statistics - Thread count, queue length, failed creations
  • Memory Usage - Storage allocated, storage available

Requirements

Varnish Version

  • Minimum: Varnish 6.0
  • Recommended: Varnish 7.0 or later
  • Tested with: Varnish 6.0, 6.6, 7.0, 7.3

Varnish Tools

Required: varnishstat command-line tool (included with Varnish)

Python Dependencies

No additional Python packages are required; the plugin uses only the standard library.

System Access

The agent needs permission to run the varnishstat command.

Configuration

Basic Configuration

plugins:
  varnish:
    enabled: true

With Custom Instance Name

plugins:
  varnish:
    enabled: true
    instance_name: my_varnish

All Configuration Options

plugins:
  varnish:
    enabled: true                # Enable/disable plugin
    instance_name: ""            # Varnish instance name (empty = default)
    varnishstat_path: /usr/bin/varnishstat  # Path to varnishstat
    timeout: 10                  # Command timeout (seconds)

Environment Variables

Configuration can be overridden with environment variables:

export VARNISH_INSTANCE_NAME="cache1"
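Only VARNISH_INSTANCE_NAME is documented here. The sketch below assumes that a non-empty value simply replaces instance_name from agent.yaml; apply_env_overrides is a hypothetical helper, not the agent's API.

```python
import os
from typing import Any, Dict, Mapping, Optional


def apply_env_overrides(config: Dict[str, Any],
                        environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    # Assumption: a set, non-empty VARNISH_INSTANCE_NAME wins over the YAML value.
    env = os.environ if environ is None else environ
    merged = dict(config)  # copy so the loaded YAML config is not mutated
    value = env.get("VARNISH_INSTANCE_NAME")
    if value:
        merged["instance_name"] = value
    return merged


cfg = {"enabled": True, "instance_name": ""}
print(apply_env_overrides(cfg, {"VARNISH_INSTANCE_NAME": "cache1"})["instance_name"])  # cache1
```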

Varnish Setup

Installation

Ubuntu/Debian:

# Install Varnish
curl -s https://packagecloud.io/install/repositories/varnishcache/varnish70/script.deb.sh | sudo bash
sudo apt-get install varnish

# Start service
sudo systemctl start varnish
sudo systemctl enable varnish

CentOS/RHEL:

# Install Varnish
curl -s https://packagecloud.io/install/repositories/varnishcache/varnish70/script.rpm.sh | sudo bash
sudo yum install varnish

# Start service
sudo systemctl start varnish
sudo systemctl enable varnish

Basic VCL Configuration

Edit the Varnish VCL file:

sudo nano /etc/varnish/default.vcl

Simple backend:

vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

Restart Varnish:

sudo systemctl restart varnish

Grant varnishstat Access

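The agent runs as root by default, so it can already run varnishstat.

For a non-root agent:

# Add the agent user to the varnish group
sudo usermod -aG varnish statusradar

# Restart the agent so the new group membership takes effect
sudo systemctl restart statusradar-agent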

Collected Metrics

System Metrics

Metric | Description | Unit | Type
uptime_seconds | Varnish uptime | Seconds | Gauge

Cache Performance Metrics

Metric | Description | Unit | Type
cache_hit | Cache hits | Count | Counter
cache_miss | Cache misses | Count | Counter
cache_hitpass | Hit-for-pass lookups (uncacheable) | Count | Counter
cache_hit_grace | Grace hits: expired objects served within their grace period (also counted in cache_hit) | Count | Counter
cache_hitmiss | Hit-for-miss lookups | Count | Counter
cache_hit_rate_percent | Cache hit rate | Percent | Gauge

Hit rate calculation:

cache_hit_rate_percent = (cache_hit / (cache_hit + cache_miss)) × 100
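The same calculation in Python, as a minimal sketch. Returning 0.0 when there have been no lookups yet is an assumption about how an empty cache is reported, not documented plugin behavior.

```python
def cache_hit_rate_percent(cache_hit: int, cache_miss: int) -> float:
    # cache_hit / (cache_hit + cache_miss) x 100
    lookups = cache_hit + cache_miss
    if lookups == 0:
        return 0.0  # assumption: no lookups yet is reported as 0%
    return cache_hit / lookups * 100.0


# Counter values from the sample output under "Manual Plugin Test":
print(round(cache_hit_rate_percent(123456, 5678), 1))  # 95.6
```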

Cache Object Metrics

Metric | Description | Unit | Type
n_object | Cached objects | Count | Gauge
n_objectcore | Object cores | Count | Gauge
n_objecthead | Object heads | Count | Gauge
n_expired | Expired objects | Count | Counter
n_lru_nuked | LRU nuked objects | Count | Counter
n_lru_moved | LRU moved objects | Count | Counter

Backend Metrics

Metric | Description | Unit | Type
n_backend | Number of backends | Count | Gauge
backend_conn | Successful backend connections | Count | Counter
backend_fail | Backend connection failures | Count | Counter
backend_busy | Connections not made because a backend's max_connections limit was reached | Count | Counter
backend_unhealthy | Connections not attempted because the backend was unhealthy | Count | Counter
backend_reuse | Backend connection reuses | Count | Counter
backend_recycle | Backend connection recycles | Count | Counter
backend_retry | Backend connection retries | Count | Counter
backend_req | Backend requests | Count | Counter

Client Metrics

Metric | Description | Unit | Type
client_req | Client requests | Count | Counter
client_req_400 | Client 400 errors | Count | Counter
client_req_417 | Client 417 errors | Count | Counter
sess_conn | Session connections | Count | Counter
sess_fail | Session failures | Count | Counter
sess_dropped | Dropped sessions | Count | Counter

Thread Metrics

Metric | Description | Unit | Type
threads | Total threads | Count | Gauge
threads_created | Threads created | Count | Counter
threads_destroyed | Threads destroyed | Count | Counter
threads_failed | Thread creation failures | Count | Counter
threads_limited | Thread creations limited | Count | Counter
thread_queue_len | Thread queue length | Count | Gauge

Traffic Metrics

Metric | Description | Unit | Type
req_hdrbytes | Request header bytes | Bytes | Counter
req_bodybytes | Request body bytes | Bytes | Counter
resp_hdrbytes | Response header bytes | Bytes | Counter
resp_bodybytes | Response body bytes | Bytes | Counter
req_total_mb | Total request bytes | MB | Counter
resp_total_mb | Total response bytes | MB | Counter

Fetch Metrics

Metric | Description | Unit | Type
s_fetch | Total fetch operations | Count | Counter
fetch_failed | Failed fetch operations | Count | Counter

VCL Metrics

Metric | Description | Unit | Type
n_vcl | Number of VCL configurations | Count | Gauge
vcl_fail | VCL failures | Count | Counter

Dashboard Metrics

The StatusRadar dashboard displays:

Overview Card

  • Hit Rate - Cache efficiency percentage
  • Cached Objects - Current object count
  • Backend Health - Connection success rate
  • Client Connections - Active clients

Cache Performance Chart

  • Hit rate over time
  • Miss rate
  • Hit-for-pass ratio

Cache Objects Chart

  • Cached objects count
  • Expired objects
  • LRU nuked objects

Backend Health Chart

  • Backend connections
  • Backend failures
  • Backend busy count

Client Traffic Chart

  • Client connections
  • Client requests
  • Dropped connections

Thread Pool Chart

  • Active threads
  • Thread queue length
  • Thread creation failures

Installation

Quick Install

PLUGINS='varnish' \
TOKEN='your-agent-token' \
bash -c "$(curl -sL https://statusradar.dev/install-agent.sh)"

Install on Existing Agent

  1. Verify Varnish is installed:

    varnishstat -V
  2. Enable plugin in config:

    sudo nano /opt/statusradar/config/agent.yaml

    Add:

    plugins:
      varnish:
        enabled: true
  3. Restart agent:

    sudo systemctl restart statusradar-agent
  4. Verify:

    sudo journalctl -u statusradar-agent -n 50 --no-pager | grep varnish

    Expected:

    INFO: Plugin varnish: Metrics collected successfully
    INFO: Plugin varnish: Hit rate 95.5%, 1234 cached objects

Testing

Manual Plugin Test

cd /opt/statusradar
python3 plugins/varnish_plugin.py

Expected Output:

Plugin: varnish
Enabled: True
Available: True

Collecting metrics...
{
  "cache_hit": 123456,
  "cache_miss": 5678,
  "cache_hitrate": 95.6,
  "n_object": 1234,
  "n_expired": 567,
  "n_lru_nuked": 12,
  "backend_conn": 89012,
  "backend_fail": 23,
  "client_conn": 123456,
  "client_req": 234567,
  "threads": 200
}

Test Varnish Statistics

# View live stats
varnishstat

# View specific counters
varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss

# JSON output
varnishstat -1 -j
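A minimal Python sketch for turning that JSON into a flat dictionary like the plugin's output. It assumes the two layouts we are aware of: newer releases (6.5 onward, to our knowledge) nest counters under a "counters" key next to "version", while older ones place them at the top level beside "timestamp". parse_varnishstat_json is a hypothetical helper.

```python
import json
from typing import Dict


def parse_varnishstat_json(text: str, prefix: str = "MAIN.") -> Dict[str, int]:
    """Flatten varnishstat JSON into {counter_name: value}.

    Keeps counters whose name starts with `prefix` and strips it,
    e.g. "MAIN.cache_hit" -> "cache_hit".
    """
    data = json.loads(text)
    # Newer layout: {"version": 1, "timestamp": ..., "counters": {...}}
    # Older layout: {"timestamp": ..., "MAIN.cache_hit": {...}, ...}
    counters = data.get("counters", data)
    result: Dict[str, int] = {}
    for name, entry in counters.items():
        # Skip non-counter keys such as "timestamp" or "version".
        if not isinstance(entry, dict) or "value" not in entry:
            continue
        if name.startswith(prefix):
            result[name[len(prefix):]] = entry["value"]
    return result
```

Feeding it the stdout of varnishstat -1 -j yields names such as cache_hit and cache_miss, ready for the hit rate formula above.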

Troubleshooting

Plugin Not Collecting Metrics

Check 1: Is Varnish running?

sudo systemctl status varnish

Check 2: Can varnishstat be executed?

varnishstat -1

Check 3: Check agent logs

sudo journalctl -u statusradar-agent -n 100 --no-pager | grep varnish

Common Errors

"varnishstat command not found"

Error:

ERROR: Plugin varnish: varnishstat command not found

Solution:

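# Install Varnish (varnishstat is included)
sudo apt-get install varnish

# Or, if varnishstat is installed outside the agent's PATH, locate it
command -v varnishstat

# and set that full path in agent.yaml:
varnishstat_path: /usr/bin/varnishstat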

"Cannot connect to varnishd"

Error:

ERROR: Plugin varnish: Cannot connect to varnishd

Causes:

  1. Varnish not running
  2. Wrong instance name
  3. Permission issues

Solution:

# Check Varnish is running
sudo systemctl status varnish

# Test varnishstat
sudo varnishstat -1

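# Check the instance name (set by a -n argument on the varnishd command line, if any)
ps -o args= -C varnishd

# Confirm the configured instance responds (replace cache1 with your instance_name)
sudo varnishadm -n cache1 status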

"Permission denied"

Error:

ERROR: Plugin varnish: Permission denied accessing varnish stats

Solution:

Run the agent as root, or add the agent user to the varnish group and restart the agent:

sudo usermod -aG varnish statusradar
sudo systemctl restart statusradar-agent

Performance Impact

On Varnish

Minimal impact:

  • varnishstat reads from shared memory
  • No cache invalidation
  • No request processing impact

Benchmark:

  • Overhead: < 0.01% CPU
  • No measurable performance degradation

On Agent

Resource usage:

  • Memory: +10 MB
  • CPU: +2% during collection
  • Network: +1 KB per collection

Collection time: < 0.2 seconds

Use Cases

1. Cache Hit Rate Monitoring

Monitor:

  • Hit rate percentage
  • Miss rate trends
  • Hit-for-pass ratio

Alert on:

  • Hit rate < 80% (poor caching)
  • Sudden hit rate drop
  • High hit-for-pass rate
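Lifetime counters react slowly after a long uptime, so a sudden drop is easier to catch by computing the hit rate over the collection interval. A minimal sketch, assuming two snapshots of the cache_hit and cache_miss counters taken one interval apart; interval_hit_rate is a hypothetical helper, and treating a decrease as a varnishd restart is an assumption.

```python
from typing import Mapping, Optional


def interval_hit_rate(prev: Mapping[str, int], curr: Mapping[str, int]) -> Optional[float]:
    """Hit rate (%) between two counter snapshots, or None if not computable."""
    d_hit = curr["cache_hit"] - prev["cache_hit"]
    d_miss = curr["cache_miss"] - prev["cache_miss"]
    if d_hit < 0 or d_miss < 0:
        return None  # counters went backwards (e.g. restart): skip interval
    lookups = d_hit + d_miss
    if lookups == 0:
        return None  # no traffic in this interval
    return d_hit / lookups * 100.0


prev = {"cache_hit": 123456, "cache_miss": 5678}
curr = {"cache_hit": 124056, "cache_miss": 6078}
print(round(interval_hit_rate(prev, curr), 1))   # 60.0 (this interval)
print(round(124056 / (124056 + 6078) * 100, 1))  # 95.3 (lifetime)
```

In this example the lifetime figure still looks healthy while the last interval has fallen well below the 80% alert level.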

2. Backend Health

Monitor:

  • Backend connection success rate
  • Backend failures
  • Backend busy count

Alert on:

  • Backend failure rate > 1%
  • Backend busy > 0
  • Backend unhealthy

3. Cache Efficiency

Monitor:

  • Cached objects count
  • LRU nuked objects
  • Expired objects

Optimize:

  • Cache size (malloc storage)
  • TTL settings
  • VCL logic

4. Client Load

Monitor:

  • Client connections
  • Request rate
  • Dropped connections

Alert on:

  • Dropped connections > 0
  • Connection rate spike
  • Request queue growing

Best Practices

1. Tune Cache Size

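Set an appropriate cache size. The example below uses /etc/varnish/varnish.params; on installs without that file, pass the same setting to varnishd as -s malloc,4G (for example through a systemd override of the varnish service):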

# /etc/varnish/varnish.params
VARNISH_STORAGE="malloc,4G"

Calculate cache size:

Cache size = (Average object size) × (Number of objects) × 1.2
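Worked through in Python under illustrative assumptions (a 50 KiB average object and 100,000 objects). varnishd size suffixes such as G are powers of 1024, so the sketch rounds up to whole GiB for a VARNISH_STORAGE value; the helper names are hypothetical.

```python
import math

GIB = 1024 ** 3


def estimate_cache_size_bytes(avg_object_bytes: int, num_objects: int,
                              headroom: float = 1.2) -> int:
    # Cache size = (average object size) x (number of objects) x 1.2
    return math.ceil(avg_object_bytes * num_objects * headroom)


def malloc_storage_spec(size_bytes: int) -> str:
    # Round up to whole GiB, e.g. "malloc,6G" for VARNISH_STORAGE.
    return f"malloc,{math.ceil(size_bytes / GIB)}G"


size = estimate_cache_size_bytes(50 * 1024, 100_000)
print(size)                       # 6144000000
print(malloc_storage_spec(size))  # malloc,6G
```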

2. Monitor Hit Rate

Healthy hit rate: > 85%

If hit rate < 80%:

  • Increase cache size
  • Review TTL settings
  • Check VCL logic
  • Analyze cache-busting headers

3. Avoid LRU Nuking

LRU nuking indicates:

  • Cache size too small
  • TTL too long
  • Object churn

Solution:

  • Increase cache size
  • Reduce TTL for large objects
  • Implement object size limits

4. Configure Thread Pools

# /etc/varnish/varnish.params
VARNISH_THREAD_POOLS=2
VARNISH_THREAD_POOL_MIN=100
VARNISH_THREAD_POOL_MAX=1000

Formula:

thread_pool_min = Number of CPUs × 50
thread_pool_max = Number of CPUs × 500
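The same formula as a small Python helper (hypothetical, for illustration). With 2 CPUs it reproduces the 100 and 1000 used in the example above. Note that Varnish applies thread_pool_min and thread_pool_max to each pool, not to the total.

```python
import os
from typing import Tuple


def thread_pool_sizes(cpus: int) -> Tuple[int, int]:
    # thread_pool_min = CPUs x 50, thread_pool_max = CPUs x 500
    return cpus * 50, cpus * 500


cpus = os.cpu_count() or 1  # os.cpu_count() may return None
pool_min, pool_max = thread_pool_sizes(cpus)
print(f"VARNISH_THREAD_POOL_MIN={pool_min}")
print(f"VARNISH_THREAD_POOL_MAX={pool_max}")
```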

5. Use varnishlog for Debugging

# View requests in real time
varnishlog

# Show only backend transactions
varnishlog -b

# Show only client transactions
varnishlog -c

Varnish Performance Tuning

Cache Configuration

malloc storage (recommended):

VARNISH_STORAGE="malloc,8G"

file storage:

VARNISH_STORAGE="file,/var/lib/varnish/varnish_storage.bin,10G"

VCL Optimization

Set caching headers:

sub vcl_backend_response {
    # Cache for 1 hour
    set beresp.ttl = 1h;

    # Remove cookies for static files
    if (bereq.url ~ "\.(jpg|jpeg|gif|png|css|js)$") {
        unset beresp.http.set-cookie;
    }
}

Normalize Host header:

sub vcl_recv {
    # Normalize host header
    set req.http.host = regsub(req.http.host, ":[0-9]+", "");
}
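To see what these two regular expressions do, here is a Python sketch that applies the same patterns; for expressions this simple, Python's re module and Varnish's PCRE-based matching should agree. The VCL ~ operator matches anywhere in the string (like re.search), and regsub replaces only the first match. The function names are illustrative. The sketch also shows a consequence of the $ anchor: a static URL with a query string, such as /app.js?v=2, does not match, so its Set-Cookie header is kept.

```python
import re

# Same patterns as the VCL above.
STATIC_FILE_RE = re.compile(r"\.(jpg|jpeg|gif|png|css|js)$")
PORT_RE = re.compile(r":[0-9]+")


def is_static_file(url: str) -> bool:
    # VCL's ~ operator matches anywhere in the string, like re.search.
    return STATIC_FILE_RE.search(url) is not None


def normalize_host(host: str) -> str:
    # regsub() replaces only the first match, hence count=1.
    return PORT_RE.sub("", host, count=1)


print(is_static_file("/img/logo.png"))     # True
print(is_static_file("/app.js?v=2"))       # False: query string defeats the $ anchor
print(normalize_host("example.com:8080"))  # example.com
```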

Backend Health Checks

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/health";
        .timeout = 1s;
        .interval = 5s;
        .window = 5;
        .threshold = 3;
    }
}
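The .window and .threshold settings can be read as a sliding window: the backend is healthy while at least .threshold of the last .window probes succeeded. A simplified Python model, assuming .initial keeps its default of threshold - 1 (Varnish pre-counts that many probes as good at startup); probe_health is a hypothetical helper, not Varnish code.

```python
from collections import deque
from typing import Iterable, List


def probe_health(results: Iterable[bool], window: int = 5, threshold: int = 3,
                 initial: int = 2) -> List[bool]:
    """Simplified model of backend health after each probe result.

    Healthy means at least `threshold` of the last `window` probes were good.
    `initial` probes are pre-counted as good, like .initial (default
    threshold - 1, i.e. 2 for the probe above).
    """
    recent = deque([True] * initial, maxlen=window)
    states = []
    for ok in results:
        recent.append(ok)
        states.append(sum(recent) >= threshold)
    return states


# One good probe, then three failures in a row:
print(probe_health([True, False, False, False]))  # [True, True, True, False]
```

With .window = 5 and .threshold = 3, this model stays healthy through two consecutive failures after a good probe and marks the backend sick on the third.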

Advanced Configuration

Multiple Varnish Instances

plugins:
  varnish_cache1:
    enabled: true
    instance_name: cache1

  varnish_cache2:
    enabled: true
    instance_name: cache2

Docker Container

Monitor Varnish in Docker:

plugins:
  varnish:
    enabled: true

Docker run:

docker run -d --name varnish \
  -v /var/run:/var/run \
  varnish:latest

Example Configurations

Basic Production

plugins:
  varnish:
    enabled: true

Multi-Instance

plugins:
  varnish_web:
    enabled: true
    instance_name: web_cache

  varnish_api:
    enabled: true
    instance_name: api_cache

Limitations

Current Limitations

  1. No per-URL metrics - Only aggregate statistics
  2. No ban list metrics - Ban/purge statistics not detailed
  3. No ESI metrics - Edge Side Includes stats not collected

Scalability

Tested with:

  • 10,000+ req/sec
  • 100,000+ cached objects
  • Multi-GB cache sizes

Performance:

  • varnishstat overhead is constant regardless of cache size

Monitoring Checklist

Critical:

  1. Hit rate < 80%
  2. Backend failures > 0
  3. Dropped connections > 0
  4. Frequent LRU nuking

Important:

  5. Thread creation failures
  6. Backend busy > 0
  7. Thread queue length > 0

Alert Thresholds

cache_hitrate: < 85
backend_fail_rate: > 0.01
client_drop: > 0
n_lru_nuked_rate: > 100/min
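Most of these thresholds apply to rates, so they need two snapshots of the counters. A minimal sketch of one way to evaluate them; the formulas (failure rate as backend_fail over backend_conn + backend_fail, client_drop from sess_dropped, hit rate from interval deltas) and the evaluate_alerts name are assumptions, not the agent's documented logic.

```python
from typing import List, Mapping

# Counters needed by the thresholds above (gauges are excluded on purpose,
# since a falling gauge would look like a counter reset).
COUNTERS = ("cache_hit", "cache_miss", "backend_conn", "backend_fail",
            "sess_dropped", "n_lru_nuked")


def evaluate_alerts(prev: Mapping[str, int], curr: Mapping[str, int],
                    interval_seconds: float) -> List[str]:
    """Return the names of thresholds breached between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    d = {name: curr[name] - prev[name] for name in COUNTERS}
    if any(v < 0 for v in d.values()):
        return []  # a counter went backwards (varnishd restart): skip interval

    alerts = []
    lookups = d["cache_hit"] + d["cache_miss"]
    if lookups and d["cache_hit"] / lookups * 100 < 85:
        alerts.append("cache_hitrate")
    attempts = d["backend_conn"] + d["backend_fail"]
    if attempts and d["backend_fail"] / attempts > 0.01:
        alerts.append("backend_fail_rate")
    if d["sess_dropped"] > 0:
        alerts.append("client_drop")
    if d["n_lru_nuked"] / (interval_seconds / 60) > 100:
        alerts.append("n_lru_nuked_rate")
    return alerts
```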

Next Steps