Docker Plugin

Monitor Docker containers and host resources with per-container CPU, memory, network, and disk I/O metrics.

Overview

The Docker plugin collects detailed metrics from the Docker Engine, including:

  • Container Statistics - CPU usage, memory usage, memory limits per container
  • Container Count - Running, stopped, paused containers
  • Network I/O - Bytes sent/received per container
  • Block I/O - Disk read/write operations per container
  • Container States - Status, health, restart count
  • Image Information - Image sizes, container images

Requirements

Docker Version

  • Minimum: Docker 20.10
  • Recommended: Docker 23.0 or later
  • Tested with: Docker 20.10, 23.0, 24.0, 25.0

Python Dependencies

pip install 'docker>=6.1.0'

The package is installed automatically when PLUGINS=docker is set during agent installation.

Docker Socket Access

The agent must have access to the Docker socket:

# Check socket exists
ls -la /var/run/docker.sock

# Test Docker access
docker ps
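The same checks can be scripted with the Python standard library, which is handy when the agent runs under a dedicated account. This is a minimal sketch: check_docker_socket is a hypothetical helper, not part of the agent, and it only inspects the socket file rather than calling the Docker API.

```python
import os
import stat


def check_docker_socket(path: str = "/var/run/docker.sock") -> list[str]:
    """Return a list of problems with the Docker socket; an empty list means OK."""
    try:
        st = os.stat(path)  # follows symlinks such as /var/run -> /run
    except FileNotFoundError:
        return [f"{path} not found (is Docker installed and running?)"]
    except PermissionError:
        return [f"cannot stat {path}: permission denied on a parent directory"]

    problems = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path} exists but is not a Unix socket")
    # Talking to the Docker API requires both read and write access to the socket.
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append(f"uid {os.getuid()} cannot read/write {path}")
    return problems


if __name__ == "__main__":
    issues = check_docker_socket()
    print("Docker socket OK" if not issues else "\n".join(issues))
```

Running it as the agent's user (for example via sudo -u statusradar python3) reproduces the permissions the agent will see.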

Configuration

Basic Configuration

plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock

With Container Limit

plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 50  # Limit monitored containers

Remote Docker Host

plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2376
    tls: true
    tls_verify: true
    tls_ca: /path/to/ca.pem
    tls_cert: /path/to/cert.pem
    tls_key: /path/to/key.pem
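The plugin depends on the Docker SDK for Python (the docker package above). As an illustration, assuming these options map directly onto SDK arguments, a configuration like the one above could be turned into a docker.DockerClient as follows. docker_client_kwargs and make_client are hypothetical helpers, not the agent's code; DockerClient and docker.tls.TLSConfig are the SDK's own classes.

```python
def docker_client_kwargs(cfg: dict) -> dict:
    """Translate a plugins.docker config dict into DockerClient keyword arguments.

    The mapping is an assumption for illustration; the agent's real logic may differ.
    """
    # An explicit base_url wins; otherwise build a unix:// URL from the socket path.
    base_url = cfg.get("base_url") or "unix://" + cfg.get("socket", "/var/run/docker.sock")
    kwargs = {"base_url": base_url, "timeout": cfg.get("timeout", 10)}
    if cfg.get("tls"):
        tls = {"ca_cert": cfg.get("tls_ca"), "verify": bool(cfg.get("tls_verify", True))}
        if cfg.get("tls_cert") and cfg.get("tls_key"):
            tls["client_cert"] = (cfg["tls_cert"], cfg["tls_key"])
        kwargs["tls"] = tls
    return kwargs


def make_client(cfg: dict):
    """Build a docker.DockerClient (requires pip install 'docker>=6.1.0')."""
    import docker
    from docker.tls import TLSConfig

    kwargs = docker_client_kwargs(cfg)
    if "tls" in kwargs:
        kwargs["tls"] = TLSConfig(**kwargs["tls"])
    return docker.DockerClient(**kwargs)
```

With a reachable daemon, make_client(cfg).ping() returns True. Keeping the mapping in a pure function lets it be checked without a running daemon or the docker package installed.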

All Configuration Options

plugins:
  docker:
    enabled: true                           # Enable/disable plugin
    socket: /var/run/docker.sock            # Docker socket path
    base_url: unix:///var/run/docker.sock   # Docker API URL
    max_containers: 100                     # Max containers to monitor
    collect_images: true                    # Collect image metrics
    collect_networks: true                  # Collect network metrics
    collect_volumes: true                   # Collect volume metrics
    timeout: 10                             # API request timeout (seconds)

Environment Variables

Configuration can be overridden with environment variables:

export DOCKER_SOCKET="/var/run/docker.sock"
export DOCKER_MAX_CONTAINERS="100"
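The override can be pictured as a small merge step. This sketch assumes environment variables take precedence over agent.yaml and that DOCKER_MAX_CONTAINERS is parsed as an integer; apply_env_overrides and the ENV_OVERRIDES table are hypothetical names.

```python
import os

# Hypothetical mapping: environment variable -> (plugins.docker option, type).
ENV_OVERRIDES = {
    "DOCKER_SOCKET": ("socket", str),
    "DOCKER_MAX_CONTAINERS": ("max_containers", int),
}


def apply_env_overrides(cfg: dict, environ=None) -> dict:
    """Return a copy of cfg in which any non-empty environment variable wins."""
    environ = os.environ if environ is None else environ
    merged = dict(cfg)  # leave the caller's config untouched
    for var, (key, cast) in ENV_OVERRIDES.items():
        value = environ.get(var)
        if value not in (None, ""):
            merged[key] = cast(value)
    return merged
```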

Docker Setup

Grant Socket Access

The agent runs as root by default, so it can access the Docker socket without additional setup.

If you run the agent as a non-root user (not recommended):

# Add user to docker group
sudo usermod -aG docker statusradar

# Restart agent
sudo systemctl restart statusradar-agent

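Note: Running the agent as a non-root user reduces some monitoring capabilities. Security note: members of the docker group can control the Docker daemon and therefore have root-equivalent access to the host, so add only trusted service accounts to it.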

Docker Socket Permissions

Verify socket permissions:

ls -la /var/run/docker.sock

Expected:

srw-rw---- 1 root docker 0 Oct 15 10:00 /var/run/docker.sock

Test Docker API Access

# Test socket access
curl --unix-socket /var/run/docker.sock http://localhost/version

# Using docker CLI
docker info
docker ps
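The curl --unix-socket test can also be reproduced from Python with only the standard library, by pointing http.client at the Unix socket instead of a TCP host. A minimal sketch; UnixHTTPConnection and docker_api_get are illustrative names, not part of the agent or the docker package.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 10):
        # "localhost" is only used for the Host header, as in the curl example.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_api_get(path: str, socket_path: str = "/var/run/docker.sock"):
    """GET a Docker Engine API path (e.g. '/version') and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"Docker API {path} returned HTTP {resp.status}: {body[:200]!r}")
        return json.loads(body)
    finally:
        conn.close()
```

For example, docker_api_get("/version")["ApiVersion"] reports the Engine API version, and docker_api_get("/containers/json") returns the same running-container list as the curl command under Testing.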

Collected Metrics

Container Count Metrics

Metric              Description                    Unit   Type
containers_total    Total containers (all states)  Count  Gauge
containers_running  Currently running containers   Count  Gauge
containers_stopped  Stopped containers             Count  Gauge
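A sketch of deriving the three counts from container states, assuming "stopped" means any state other than running (so the total equals running plus stopped, as in the sample output under Testing). container_counts is a hypothetical helper.

```python
def container_counts(statuses: list[str]) -> dict[str, int]:
    """Turn a list of container states into the three count metrics.

    Assumption: "stopped" covers every state other than "running"
    (exited, created, paused, ...), so total == running + stopped.
    """
    running = sum(1 for s in statuses if s == "running")
    return {
        "containers_total": len(statuses),
        "containers_running": running,
        "containers_stopped": len(statuses) - running,
    }
```

With the Docker SDK, the input could come from [c.status for c in docker.from_env().containers.list(all=True)]; all=True matters because list() returns only running containers by default.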

Per-Container Metrics

Each running container in the containers array includes:

Metric           Description                                   Unit     Type
name             Container name                                String   Label
id               Container short ID                            String   Label
status           Container status (running/stopped/etc.)       String   Label
cpu_percent      CPU usage percentage                          Percent  Gauge
memory_mb        Current memory usage                          MB       Gauge
memory_percent   Memory usage percentage                       Percent  Gauge
memory_limit_mb  Memory limit (cgroup limit)                   MB       Gauge
network_rx_mb    Total data received across all interfaces     MB       Counter
network_tx_mb    Total data transmitted across all interfaces  MB       Counter
restarts         Container restart count                       Count    Counter
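The network counters come from the networks section of a Docker stats sample, keyed by interface name. A minimal sketch of the "across all interfaces" sum, assuming MB here means MiB (2^20 bytes); network_totals_mb is a hypothetical helper. The networks key may be absent for some network modes, so the sketch treats a missing section as zero.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB means MiB


def network_totals_mb(stats: dict) -> tuple[float, float]:
    """Sum rx/tx bytes over all interfaces in one Docker stats sample, in MB."""
    networks = stats.get("networks") or {}  # missing for some network modes
    rx = sum(iface.get("rx_bytes", 0) for iface in networks.values())
    tx = sum(iface.get("tx_bytes", 0) for iface in networks.values())
    return rx / BYTES_PER_MB, tx / BYTES_PER_MB
```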

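CPU percentage calculation:

cpu_percent = (cpu_delta / system_cpu_delta) * num_cpus * 100

Here cpu_delta is the change in the container's total CPU usage between two consecutive samples, system_cpu_delta is the change in total host CPU time over the same interval, and num_cpus is the number of online CPUs. The result is scaled to a single core: a container saturating one core reports about 100%, and one saturating two cores reports about 200%.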

Memory percentage:

memory_percent = (memory_usage / memory_limit) * 100
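Both formulas can be evaluated on a single stats sample from the Docker Engine stats endpoint (for example container.stats(stream=False) in the docker SDK), which carries the current cpu_stats alongside the previous precpu_stats. This is a sketch of the formulas above, not the plugin's code. Note that the docker stats CLI additionally subtracts inactive page cache from usage, so its memory figures can be lower than this raw ratio.

```python
def cpu_percent(stats: dict) -> float:
    """cpu_percent = (cpu_delta / system_cpu_delta) * num_cpus * 100."""
    cur, pre = stats["cpu_stats"], stats.get("precpu_stats") or {}
    if not pre.get("system_cpu_usage"):
        return 0.0  # no previous sample to compare against yet
    cpu_delta = cur["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cur["system_cpu_usage"] - pre["system_cpu_usage"]
    # Older engines omit online_cpus; fall back to the length of percpu_usage.
    num_cpus = cur.get("online_cpus") or len(cur["cpu_usage"].get("percpu_usage") or []) or 1
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # counters did not advance or were reset
    return cpu_delta / system_delta * num_cpus * 100.0


def memory_percent(stats: dict) -> float:
    """memory_percent = (memory_usage / memory_limit) * 100."""
    mem = stats.get("memory_stats") or {}
    limit = mem.get("limit", 0)
    return mem.get("usage", 0) / limit * 100.0 if limit else 0.0
```

For instance, 100 MiB of usage against a 512 MiB limit gives a memory_percent of about 19.5.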

Dashboard Metrics

The StatusRadar dashboard displays:

Overview Card

  • Total Containers - Running/stopped/paused counts
  • Resource Usage - Aggregated CPU and memory
  • Top Containers - Highest resource consumers

Container List

  • Container name and ID
  • Status and uptime
  • CPU and memory usage
  • Network and disk I/O

CPU Usage Chart

  • Total container CPU usage
  • Per-container CPU usage (top 10)
  • CPU limits and throttling

Memory Usage Chart

  • Total container memory usage
  • Per-container memory usage (top 10)
  • Memory limits

Network I/O Chart

  • Total bytes sent/received
  • Per-container network traffic
  • Network errors

Block I/O Chart

  • Total disk read/write
  • Per-container disk I/O
  • IOPS

Installation

Quick Install

PLUGINS='docker' \
TOKEN='your-agent-token' \
bash -c "$(curl -sL https://statusradar.dev/install-agent.sh)"

Install on Existing Agent

  1. Verify Docker is installed:

    docker --version
  2. Test Docker access:

    docker ps
  3. Install Python dependency:

    cd /opt/statusradar
    source venv/bin/activate  # If using venv
    pip install 'docker>=6.1.0'
  4. Enable plugin in config:

    sudo nano /opt/statusradar/config/agent.yaml

    Add:

    plugins:
      docker:
        enabled: true
        socket: /var/run/docker.sock
  5. Restart agent:

    sudo systemctl restart statusradar-agent
  6. Verify:

    sudo journalctl -u statusradar-agent -n 50 --no-pager | grep docker

    Expected:

    INFO: Plugin docker: Metrics collected successfully
    INFO: Plugin docker: Monitoring 12 containers

Testing

Manual Plugin Test

cd /opt/statusradar
python3 plugins/docker_plugin.py

Expected Output:

Plugin name: docker
Enabled: True
Available: True

Metrics Summary:
  Total containers: 15
  Running: 12
  Stopped: 3

Running containers:
  - nginx-web: CPU 2.5%, Memory 100.0 MB (19.5%), Restarts: 0
  - redis-cache: CPU 1.2%, Memory 50.5 MB (9.4%), Restarts: 0
  - postgres-db: CPU 3.1%, Memory 256.0 MB (47.6%), Restarts: 0

Test Docker API Access

# List containers
docker ps

# Get container stats
docker stats --no-stream

# Check Docker info
docker info

# Test API via socket
curl --unix-socket /var/run/docker.sock http://localhost/containers/json

Troubleshooting

Plugin Not Collecting Metrics

Check 1: Is Docker running?

sudo systemctl status docker

Check 2: Can the agent access the Docker socket?

ls -la /var/run/docker.sock
docker ps

Check 3: Is Python package installed?

python3 -c "import docker; print(docker.__version__)"

Check 4: Review the agent logs

sudo journalctl -u statusradar-agent -n 100 --no-pager | grep docker

Common Errors

"Permission denied accessing Docker socket"

Error:

ERROR: Plugin docker: Permission denied: '/var/run/docker.sock'

Cause: The agent's user account does not have access to the Docker socket.

Solution:

The agent runs as root by default and should have access. If it runs as a different user:

# Add user to docker group
sudo usermod -aG docker statusradar

# Verify
groups statusradar

# Restart agent
sudo systemctl restart statusradar-agent

"Docker socket not found"

Error:

ERROR: Plugin docker: FileNotFoundError: /var/run/docker.sock

Causes:

  1. Docker not installed
  2. Docker not running
  3. Socket in different location

Solution:

# Install Docker
curl -fsSL https://get.docker.com | sh

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker

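# Check socket location (/var/run is usually a symlink to /run,
# and find does not descend into a symlinked start path without a trailing slash)
sudo find /run /var/run/ -name "docker.sock" -type s 2>/dev/null

If the socket lives somewhere else, set the socket option in the plugin configuration to that path.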

"No module named 'docker'"

Error:

ERROR: No module named 'docker'

Solution:

pip install 'docker>=6.1.0'
# Or if using venv:
cd /opt/statusradar && source venv/bin/activate && pip install 'docker>=6.1.0'

"Too many containers"

Warning:

WARNING: Plugin docker: Monitoring 150 containers, performance may be affected

Cause: High container count increases collection time and memory usage

Solution:

Limit monitored containers:

plugins:
  docker:
    enabled: true
    max_containers: 50  # Monitor top 50 by CPU usage
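If the cap keeps the busiest containers, as the comment above suggests, the selection is a top-N by cpu_percent. A sketch under that assumption; select_top_containers is a hypothetical helper. Ranking this way still needs a CPU sample for every container, so in this sketch the cap trims what is reported more than what is queried.

```python
import heapq


def select_top_containers(containers: list[dict], max_containers: int) -> list[dict]:
    """Keep the max_containers entries with the highest cpu_percent.

    Assumes each entry carries the cpu_percent field from Per-Container Metrics.
    """
    if len(containers) <= max_containers:
        return list(containers)
    # nlargest returns the selected entries sorted from highest to lowest CPU.
    return heapq.nlargest(max_containers, containers, key=lambda c: c.get("cpu_percent", 0.0))
```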

High Resource Usage

Symptom: Agent using excessive CPU/memory

Solutions:

  1. Reduce collection frequency:

    agent:
      interval: 600  # 10 minutes instead of 5
  2. Limit monitored containers:

    plugins:
      docker:
        max_containers: 25
  3. Disable detailed metrics:

    plugins:
      docker:
        collect_networks: false
        collect_volumes: false

Performance Impact

On Docker

Minimal impact:

  • The plugin uses the Docker stats API (the same data source as docker stats)
  • Read-only operations
  • No interference with container operations
  • Lightweight metrics collection

Benchmark:

  • Docker overhead: < 1% CPU per 100 containers
  • No measurable performance degradation

On Agent

Resource usage:

  • Base (5 containers): +25 MB, +2% CPU
  • Medium (25 containers): +50 MB, +5% CPU
  • High (100 containers): +150 MB, +15% CPU
  • Network: +5 KB per collection

Collection time:

  • 10 containers: 0.5 seconds
  • 50 containers: 2 seconds
  • 100 containers: 4 seconds

Use Cases

1. Container Resource Monitoring

Monitor:

  • CPU usage per container
  • Memory usage per container
  • Resource limits and constraints

Alert on:

  • Container using > 80% CPU
  • Container approaching memory limit
  • CPU throttling
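The first two alerts translate into threshold checks on the per-container fields. In this sketch the 80% CPU threshold comes from the text, while treating 90% of the memory limit as "approaching" is an assumed value; resource_alerts is a hypothetical helper. Because cpu_percent is multiplied by num_cpus, the CPU threshold is relative to a single core rather than the whole host.

```python
def resource_alerts(container: dict, cpu_threshold: float = 80.0,
                    memory_threshold: float = 90.0) -> list[str]:
    """Return alert messages for one container's metrics.

    cpu_threshold mirrors the 80% example; memory_threshold is an assumed value.
    """
    alerts = []
    name = container.get("name", "?")
    if container.get("cpu_percent", 0.0) > cpu_threshold:
        alerts.append(f"{name}: CPU {container['cpu_percent']:.1f}% > {cpu_threshold:.0f}%")
    if container.get("memory_percent", 0.0) >= memory_threshold:
        alerts.append(f"{name}: memory at {container['memory_percent']:.1f}% of limit")
    return alerts
```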

2. Container Sprawl Detection

Monitor:

  • Total container count
  • Stopped/zombie containers
  • Container creation rate

Alert on:

  • Too many stopped containers (> 50)
  • Container count growing rapidly

3. Network Traffic Analysis

Monitor:

  • Network I/O per container
  • Network errors
  • Bandwidth usage

Alert on:

  • High network errors
  • Unexpected traffic patterns
  • Bandwidth saturation

4. Disk I/O Monitoring

Monitor:

  • Disk read/write per container
  • IOPS per container
  • Block I/O errors

Alert on:

  • High disk I/O (I/O bottleneck)
  • Excessive disk writes (logging issues)
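Per-container read and write totals can be derived from the blkio_stats section of a Docker stats sample, where io_service_bytes_recursive lists byte counts per device and operation. A sketch, assuming op names are "Read"/"Write" on cgroup v1 hosts and lowercase on cgroup v2 (hence the case-insensitive match); block_io_bytes is a hypothetical helper. The list can be null on some hosts, which the sketch treats as zero.

```python
def block_io_bytes(stats: dict) -> tuple[int, int]:
    """Total bytes read and written across all block devices in one stats sample."""
    entries = (stats.get("blkio_stats") or {}).get("io_service_bytes_recursive") or []
    read = write = 0
    for entry in entries:
        op = str(entry.get("op", "")).lower()  # "Read" (cgroup v1) or "read" (cgroup v2)
        if op == "read":
            read += entry.get("value", 0)
        elif op == "write":
            write += entry.get("value", 0)
        # Other ops such as "Total", "Sync" or "Async" are ignored to avoid double counting.
    return read, write
```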

5. Container Health Tracking

Monitor:

  • Container status (running/stopped/failed)
  • Restart counts
  • Container uptime

Alert on:

  • Container restart loops
  • Container crashes
  • Health check failures
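A restart loop shows up as the restarts counter climbing between collections. A minimal sketch that flags containers whose count grew by at least an assumed threshold of 3 since the previous collection; detect_restart_loops is hypothetical, and restarts is assumed to come from the RestartCount field of the container inspect data.

```python
def detect_restart_loops(previous: dict[str, int], current: dict[str, int],
                         threshold: int = 3) -> list[str]:
    """Names of containers whose restart count grew by >= threshold since the last collection.

    previous/current map container name -> restarts; threshold is an assumed value.
    """
    looping = []
    for name, restarts in current.items():
        before = previous.get(name)
        # Containers seen for the first time have no baseline, so they are skipped.
        if before is not None and restarts - before >= threshold:
            looping.append(name)
    return sorted(looping)
```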

Best Practices

1. Limit Monitored Containers

For hosts with many containers:

plugins:
  docker:
    max_containers: 50  # Monitor top 50 by resource usage

Benefits:

  • Lower agent resource usage
  • Faster collection
  • Focus on important containers

2. Use Container Labels

Label containers for better organization:

docker run -d \
  --label app=web \
  --label environment=production \
  --label monitor=true \
  nginx

3. Set Resource Limits

Always set container resource limits:

# docker-compose.yml
services:
  app:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 1G
        reservations:
          memory: 512M

4. Monitor Container Density

Guideline: Don't overload the host

Max containers = (Host RAM - 2GB) / Average container RAM

Example:

  • Host: 16GB RAM
  • Average container: 256MB
  • Max: (16 - 2) / 0.25 = 56 containers
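The density guideline is simple arithmetic. This sketch rounds down so a partial container is never counted and clamps the result at zero; container_capacity is a hypothetical helper, and the 2 GB host reserve is the value from the guideline above.

```python
import math


def container_capacity(host_ram_gb: float, avg_container_gb: float,
                       reserved_gb: float = 2.0) -> int:
    """Max containers = (Host RAM - reserved) / average container RAM, rounded down."""
    if avg_container_gb <= 0:
        raise ValueError("average container RAM must be positive")
    return max(0, math.floor((host_ram_gb - reserved_gb) / avg_container_gb))
```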

5. Regular Cleanup

Remove unused containers and images:

# Remove stopped containers
docker container prune -f

# Remove unused images
docker image prune -a -f

# Remove unused volumes
docker volume prune -f

6. Use Health Checks

Define health checks in Dockerfiles:

HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

Advanced Configuration

Docker over TCP

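Monitor a remote Docker host over plain TCP. By convention, the Docker daemon uses port 2375 for unencrypted connections and 2376 for TLS. An unencrypted endpoint lets anyone who can reach it control the daemon, so use it only on a trusted network:

plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2375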
    timeout: 15

Docker with TLS

Secure connection to Docker API:

plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2376
    tls: true
    tls_verify: true
    tls_ca: /etc/docker/certs/ca.pem
    tls_cert: /etc/docker/certs/cert.pem
    tls_key: /etc/docker/certs/key.pem

Container Filtering (Future Feature)

The plugin currently monitors all containers. Future versions will support:

plugins:
  docker:
    filters:
      labels:
        - monitor=true
      names:
        - "prod-*"

Example Configurations

Development Host

plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 20

Production Container Host

plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 100
    collect_images: true
    collect_networks: true

Docker Swarm Node

plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 200  # Higher limit for swarm

Remote Docker Host

plugins:
  docker:
    enabled: true
    base_url: tcp://10.0.0.5:2376
    tls: true
    tls_verify: true
    tls_ca: /path/to/ca.pem
    tls_cert: /path/to/cert.pem
    tls_key: /path/to/key.pem
    timeout: 20

Limitations

Current Limitations

  1. No container filtering - Monitors all containers (capped by the max_containers limit)
  2. No Kubernetes support - Use Kubernetes-specific monitoring for K8s
  3. No Swarm-specific metrics - Planned for a future version
  4. Noticeable agent overhead with 100+ containers

Scalability

Recommended limits:

  • Small hosts (< 4GB RAM): Max 25 containers monitored
  • Medium hosts (4-16GB RAM): Max 50 containers monitored
  • Large hosts (> 16GB RAM): Max 100 containers monitored

For larger deployments:

  • Deploy one agent per Docker host
  • Use container orchestration monitoring (Prometheus, etc.)

Troubleshooting Performance

Slow Collection

Symptom: Collection takes > 5 seconds

Solutions:

  1. Reduce max_containers
  2. Increase collection interval
  3. Check Docker daemon performance

Debug:

# Time collection manually
time python3 /opt/statusradar/plugins/docker_plugin.py

# Check Docker API response time
time curl --unix-socket /var/run/docker.sock http://localhost/containers/json

High Memory Usage

Symptom: Agent using > 500MB RAM

Solutions:

  1. Reduce max_containers
  2. Disable image/network collection
  3. Increase swap space

Next Steps