Docker Plugin
Monitor Docker containers and host resources with per-container CPU, memory, network, and disk I/O metrics.
Overview
The Docker plugin collects detailed metrics from Docker Engine including:
- Container Statistics - CPU usage, memory usage, memory limits per container
- Container Count - Running, stopped, paused containers
- Network I/O - Bytes sent/received per container
- Block I/O - Disk read/write operations per container
- Container States - Status, health, restart count
- Image Information - Image sizes, container images
Requirements
Docker Version
- Minimum: Docker 20.10
- Recommended: Docker 23.0 or later
- Tested with: Docker 20.10, 23.0, 24.0, 25.0
Python Dependencies
pip install "docker>=6.1.0"
Auto-installed when using PLUGINS=docker during agent installation.
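You can confirm that the installed SDK meets the minimum version with a small check run under the agent's Python interpreter. This is a minimal sketch and not part of the agent. The helper names (parse_release, meets_minimum, check_docker_sdk) are hypothetical. The parser only reads the leading numeric release, so a pre-release such as 6.1.0rc1 is treated as 6.1.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (6, 1, 0)  # from the documented requirement docker>=6.1.0


def parse_release(text):
    """Turn '7.1.0' into (7, 1, 0); missing minor/patch parts count as 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(group or 0) for group in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    """True if the version string is at least the minimum release tuple."""
    return parse_release(text) >= minimum


def check_docker_sdk():
    """Return (ok, message) for the 'docker' distribution installed in this environment."""
    try:
        installed = version("docker")
    except PackageNotFoundError:
        return False, 'docker SDK not installed (pip install "docker>=6.1.0")'
    if meets_minimum(installed):
        return True, f"docker SDK {installed} OK"
    return False, f"docker SDK {installed} is older than 6.1.0"


if __name__ == "__main__":
    ok, message = check_docker_sdk()
    print(message)
```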
Docker Socket Access
The agent must have access to the Docker socket:
# Check socket exists
ls -la /var/run/docker.sock
# Test Docker access
docker ps
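The same checks can be done from Python, which is useful when you want to test exactly what the agent's user sees. This is a minimal sketch with a hypothetical helper name (check_socket). It relies on the fact that connecting to a Unix socket requires write permission on the socket file. Note that os.access always reports success for root.

```python
import os
import stat


def check_socket(path="/var/run/docker.sock"):
    """Return (ok, message) describing whether the current user can use the socket."""
    try:
        st = os.stat(path)  # follows symlinks, as the Docker client does
    except FileNotFoundError:
        return False, f"{path} does not exist (is Docker installed and running?)"
    except PermissionError:
        return False, f"cannot stat {path}: permission denied on a parent directory"
    if not stat.S_ISSOCK(st.st_mode):
        return False, f"{path} exists but is not a Unix socket"
    # Connecting to a Unix socket needs write permission on the socket file.
    if not os.access(path, os.R_OK | os.W_OK):
        return False, f"{path} is not readable/writable by uid {os.getuid()}"
    return True, f"{path} is accessible"


if __name__ == "__main__":
    print(check_socket()[1])
```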
Configuration
Basic Configuration
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
With Container Limit
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 50  # Limit monitored containers
Remote Docker Host
plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2376
    tls: true
    tls_verify: true
    tls_ca: /path/to/ca.pem
    tls_cert: /path/to/cert.pem
    tls_key: /path/to/key.pem
All Configuration Options
plugins:
  docker:
    enabled: true                          # Enable/disable plugin
    socket: /var/run/docker.sock           # Docker socket path
    base_url: unix://var/run/docker.sock   # Docker API URL
    max_containers: 100                    # Max containers to monitor
    collect_images: true                   # Collect image metrics
    collect_networks: true                 # Collect network metrics
    collect_volumes: true                  # Collect volume metrics
    timeout: 10                            # API request timeout
Environment Variables
Configuration can be overridden with environment variables:
export DOCKER_SOCKET="/var/run/docker.sock"
export DOCKER_MAX_CONTAINERS="100"
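The override order is not spelled out above. The sketch below assumes the common precedence of built-in defaults, then the plugins.docker section of agent.yaml, then environment variables. The function name and the variable-to-option mapping are hypothetical, and the defaults are copied from the option list above.

```python
import os

# Defaults copied from the "All Configuration Options" list.
DEFAULTS = {
    "enabled": True,
    "socket": "/var/run/docker.sock",
    "max_containers": 100,
    "collect_images": True,
    "collect_networks": True,
    "collect_volumes": True,
    "timeout": 10,
}

# Assumed mapping: environment variable -> (option name, type conversion).
ENV_OVERRIDES = {
    "DOCKER_SOCKET": ("socket", str),
    "DOCKER_MAX_CONTAINERS": ("max_containers", int),
}


def resolve_docker_config(file_config, environ=None):
    """Merge defaults, the agent.yaml section, then non-empty environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    config.update(file_config or {})
    for env_name, (key, cast) in ENV_OVERRIDES.items():
        raw = environ.get(env_name)
        if raw:  # ignore unset or empty variables
            config[key] = cast(raw)
    return config
```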
Docker Setup
Grant Socket Access
The agent runs as root by default and therefore has access to the Docker socket. No additional setup is needed.
If the agent runs as a non-root user (not recommended):
# Add user to docker group
sudo usermod -aG docker statusradar
# Restart agent
sudo systemctl restart statusradar-agent
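Security note: Running the agent as non-root reduces some monitoring capabilities. Membership in the docker group grants root-equivalent privileges on the host, so it does not meaningfully restrict the agent.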
Docker Socket Permissions
Verify socket permissions:
ls -la /var/run/docker.sock
Expected:
srw-rw---- 1 root docker 0 Oct 15 10:00 /var/run/docker.sock
Test Docker API Access
# Test socket access
curl --unix-socket /var/run/docker.sock http://localhost/version
# Using docker CLI
docker info
docker ps
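The curl call above can also be reproduced with only the Python standard library. This helps when curl is unavailable or you want to time the API from the agent's interpreter. This is a minimal sketch: UnixHTTPConnection and docker_get are hypothetical names. /version and /containers/json are real Docker Engine API endpoints.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path, socket_path="/var/run/docker.sock", timeout=10):
    """GET an Engine API path (e.g. '/version') and return the decoded JSON."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"GET {path} -> HTTP {resp.status}: {body[:200]!r}")
        return json.loads(body)
    finally:
        conn.close()


if __name__ == "__main__":
    print(docker_get("/version")["Version"])
    print(len(docker_get("/containers/json")), "running containers")
```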
Collected Metrics
Container Count Metrics
Metric | Description | Unit | Type |
---|---|---|---|
containers_total | Total containers (all states) | Count | Gauge |
containers_running | Currently running containers | Count | Gauge |
containers_stopped | Stopped containers | Count | Gauge |
Per-Container Metrics
Each running container in the containers array includes:
Metric | Description | Unit | Type |
---|---|---|---|
name | Container name | String | Label |
id | Container short ID | String | Label |
status | Container status (running/stopped/etc.) | String | Label |
cpu_percent | CPU usage percentage | Percent | Gauge |
memory_mb | Current memory usage | MB | Gauge |
memory_percent | Memory usage percentage | Percent | Gauge |
memory_limit_mb | Memory limit (cgroup limit) | MB | Gauge |
network_rx_mb | Total data received across all interfaces | MB | Counter |
network_tx_mb | Total data transmitted across all interfaces | MB | Counter |
restarts | Container restart count | Count | Counter |
CPU percentage calculation:
cpu_percent = (cpu_delta / system_cpu_delta) * num_cpus * 100
Memory percentage:
memory_percent = (memory_usage / memory_limit) * 100
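Both formulas can be applied to the raw JSON returned by the Engine API stats endpoint (cpu_stats, precpu_stats, memory_stats). This is a sketch of the documented formulas, not the plugin's own code. It assumes MB means 1024 × 1024 bytes. Because the result is multiplied by num_cpus, 100% equals one fully busy core, so multi-core containers can exceed 100%. The docker stats CLI also subtracts page cache from memory usage, so its memory figures can be lower than this formula gives.

```python
def cpu_percent(stats):
    """cpu_percent = (cpu_delta / system_cpu_delta) * num_cpus * 100."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    # online_cpus is preferred; fall back to the per-CPU list, then to 1.
    num_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0  # no previous sample yet, or counters did not advance
    return cpu_delta / system_delta * num_cpus * 100.0


def memory_metrics(stats):
    """memory_percent = (memory_usage / memory_limit) * 100, plus MB values."""
    mem = stats["memory_stats"]
    usage = mem.get("usage", 0)
    limit = mem.get("limit", 0)
    return {
        "memory_mb": usage / (1024 * 1024),
        "memory_limit_mb": limit / (1024 * 1024),
        "memory_percent": usage / limit * 100.0 if limit else 0.0,
    }
```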
Dashboard Metrics
The StatusRadar dashboard displays:
Overview Card
- Total Containers - Running/stopped/paused counts
- Resource Usage - Aggregated CPU and memory
- Top Containers - Highest resource consumers
Container List
- Container name and ID
- Status and uptime
- CPU and memory usage
- Network and disk I/O
CPU Usage Chart
- Total container CPU usage
- Per-container CPU usage (top 10)
- CPU limits and throttling
Memory Usage Chart
- Total container memory usage
- Per-container memory usage (top 10)
- Memory limits
Network I/O Chart
- Total bytes sent/received
- Per-container network traffic
- Network errors
Block I/O Chart
- Total disk read/write
- Per-container disk I/O
- IOPS
Installation
Quick Install
PLUGINS='docker' \
TOKEN='your-agent-token' \
bash -c "$(curl -sL https://statusradar.dev/install-agent.sh)"
Install on Existing Agent
1. Verify Docker is installed:
docker --version
2. Test Docker access:
docker ps
3. Install the Python dependency:
cd /opt/statusradar
source venv/bin/activate  # If using venv
pip install docker
4. Enable the plugin in the config:
sudo nano /opt/statusradar/config/agent.yaml
Add:
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
5. Restart the agent:
sudo systemctl restart statusradar-agent
6. Verify:
sudo journalctl -u statusradar-agent -n 50 --no-pager | grep docker
Expected:
INFO: Plugin docker: Metrics collected successfully
INFO: Plugin docker: Monitoring 12 containers
Testing
Manual Plugin Test
cd /opt/statusradar
python3 plugins/docker_plugin.py
Expected Output:
Plugin name: docker
Enabled: True
Available: True
Metrics Summary:
Total containers: 15
Running: 12
Stopped: 3
Running containers:
- nginx-web: CPU 2.5%, Memory 100.0 MB (19.5%), Restarts: 0
- redis-cache: CPU 1.2%, Memory 50.5 MB (9.4%), Restarts: 0
- postgres-db: CPU 3.1%, Memory 256.0 MB (47.6%), Restarts: 0
Test Docker API Access
# List containers
docker ps
# Get container stats
docker stats --no-stream
# Check Docker info
docker info
# Test API via socket
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
Troubleshooting
Plugin Not Collecting Metrics
Check 1: Is Docker running?
sudo systemctl status docker
Check 2: Can agent access Docker socket?
ls -la /var/run/docker.sock
docker ps
Check 3: Is the docker Python package installed?
python3 -c "import docker; print(docker.__version__)"
Check 4: Review the agent logs
sudo journalctl -u statusradar-agent -n 100 --no-pager | grep docker
Common Errors
"Permission denied accessing Docker socket"
Error:
ERROR: Plugin docker: Permission denied: '/var/run/docker.sock'
Cause: The agent doesn't have access to the Docker socket.
Solution:
The agent runs as root by default and should have access. If it runs as a different user:
# Add user to docker group
sudo usermod -aG docker statusradar
# Verify
groups statusradar
# Restart agent
sudo systemctl restart statusradar-agent
"Docker socket not found"
Error:
ERROR: Plugin docker: FileNotFoundError: /var/run/docker.sock
Causes:
- Docker not installed
- Docker not running
- Socket in different location
Solution:
# Install Docker
curl -fsSL https://get.docker.com | sh
# Start Docker
sudo systemctl start docker
sudo systemctl enable docker
# Check socket location
find -H /var/run -name "docker.sock" 2>/dev/null
"No module named 'docker'"
Error:
ERROR: No module named 'docker'
Solution:
pip install docker
# Or if using venv:
cd /opt/statusradar && source venv/bin/activate && pip install docker
"Too many containers"
Warning:
WARNING: Plugin docker: Monitoring 150 containers, performance may be affected
Cause: High container count increases collection time and memory usage
Solution:
Limit monitored containers:
plugins:
  docker:
    enabled: true
    max_containers: 50  # Monitor top 50 by CPU usage
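To illustrate what a max_containers cap does, the sketch below keeps the N containers with the highest cpu_percent, as the comment above describes. It is a hypothetical helper operating on per-container dicts shaped like the Per-Container Metrics table. The plugin's actual ranking may differ, and the Best Practices section describes it as "resource usage".

```python
import heapq


def select_containers(containers, max_containers):
    """Return at most max_containers entries, highest cpu_percent first."""
    if max_containers <= 0:
        return []
    # nlargest keeps input order for ties, like sorted(..., reverse=True)[:n].
    return heapq.nlargest(max_containers, containers, key=lambda c: c.get("cpu_percent", 0.0))


if __name__ == "__main__":
    sample = [
        {"name": "nginx-web", "cpu_percent": 2.5},
        {"name": "redis-cache", "cpu_percent": 1.2},
        {"name": "postgres-db", "cpu_percent": 3.1},
    ]
    print([c["name"] for c in select_containers(sample, 2)])
```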
High Resource Usage
Symptom: Agent using excessive CPU/memory
Solutions:
- Reduce collection frequency:
agent:
  interval: 600  # 10 minutes instead of 5
- Limit monitored containers:
plugins:
  docker:
    max_containers: 25
- Disable detailed metrics:
plugins:
  docker:
    collect_networks: false
    collect_volumes: false
Performance Impact
On Docker
Minimal impact:
- Plugin uses the Docker stats API (the same API used by docker stats)
- Read-only operations
- No interference with container operations
- Lightweight metrics collection
Benchmark:
- Docker overhead: < 1% CPU per 100 containers
- No measurable performance degradation
On Agent
Resource usage:
- Base (5 containers): +25 MB, +2% CPU
- Medium (25 containers): +50 MB, +5% CPU
- High (100 containers): +150 MB, +15% CPU
- Network: +5 KB per collection
Collection time:
- 10 containers: 0.5 seconds
- 50 containers: 2 seconds
- 100 containers: 4 seconds
Use Cases
1. Container Resource Monitoring
Monitor:
- CPU usage per container
- Memory usage per container
- Resource limits and constraints
Alert on:
- Container using > 80% CPU
- Container approaching memory limit
- CPU throttling
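The first two alert conditions can be evaluated directly on the per-container fields. This is a minimal sketch with a hypothetical function name. The 80% CPU threshold comes from the list above, and 90% is an assumed reading of "approaching memory limit". Because cpu_percent is scaled by the number of CPUs, 80% here means 80% of one core.

```python
CPU_ALERT_PERCENT = 80.0     # from "Container using > 80% CPU"
MEMORY_ALERT_PERCENT = 90.0  # assumed threshold for "approaching memory limit"


def resource_alerts(containers, cpu_limit=CPU_ALERT_PERCENT, mem_limit=MEMORY_ALERT_PERCENT):
    """Return human-readable alert strings for containers over either threshold."""
    alerts = []
    for c in containers:
        cpu = c.get("cpu_percent", 0.0)
        mem = c.get("memory_percent", 0.0)
        if cpu > cpu_limit:
            alerts.append(f"{c['name']}: CPU {cpu:.1f}% > {cpu_limit:.0f}%")
        if mem >= mem_limit:
            alerts.append(f"{c['name']}: memory at {mem:.1f}% of limit")
    return alerts
```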
2. Container Sprawl Detection
Monitor:
- Total container count
- Stopped/zombie containers
- Container creation rate
Alert on:
- Too many stopped containers (> 50)
- Container count growing rapidly
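Stopped-container sprawl can be checked against the Engine API container list (GET /containers/json?all=1), where each entry has a State field such as "running", "exited" or "dead". This is a minimal sketch with hypothetical helper names. Treating "exited" and "dead" as stopped is an assumption, and the threshold of 50 comes from the list above.

```python
from collections import Counter

STOPPED_STATES = {"exited", "dead"}  # assumed definition of "stopped"


def count_states(containers):
    """Count containers per State value from a /containers/json?all=1 response."""
    return Counter(c.get("State", "unknown") for c in containers)


def sprawl_alert(containers, max_stopped=50):
    """Return (alert, stopped_count) where alert is True above max_stopped."""
    counts = count_states(containers)
    stopped = sum(counts[state] for state in STOPPED_STATES)
    return stopped > max_stopped, stopped
```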
3. Network Traffic Analysis
Monitor:
- Network I/O per container
- Network errors
- Bandwidth usage
Alert on:
- High network errors
- Unexpected traffic patterns
- Bandwidth saturation
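network_rx_mb and network_tx_mb are counters, so bandwidth comes from the difference between two collections divided by the interval. This is a minimal sketch with a hypothetical function name. It assumes that a counter going backwards means the container was restarted or recreated, and in that case it counts the new value as traffic since the reset.

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second rate for a cumulative counter such as network_rx_mb (MB/s)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Counter reset (e.g. container restart): only traffic since the reset is known.
        delta = current
    return delta / interval_seconds
```

With the default 5-minute interval, counter_rate(100.0, 400.0, 300) gives 1.0 MB/s.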
4. Disk I/O Monitoring
Monitor:
- Disk read/write per container
- IOPS per container
- Block I/O errors
Alert on:
- High disk I/O (I/O bottleneck)
- Excessive disk writes (logging issues)
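The Per-Container Metrics table above does not list block I/O fields. The raw Engine API stats payload does include them under blkio_stats.io_service_bytes_recursive, as a list of entries with op and value. This is a minimal sketch with a hypothetical function name. It matches op case-insensitively, because cgroup v1 hosts report "Read"/"Write" (plus "Total") while cgroup v2 hosts report lowercase names. The list can also be null.

```python
def block_io_bytes(stats):
    """Sum read and write bytes from blkio_stats.io_service_bytes_recursive."""
    entries = (stats.get("blkio_stats") or {}).get("io_service_bytes_recursive") or []
    totals = {"read": 0, "write": 0}
    for entry in entries:
        op = str(entry.get("op", "")).lower()
        if op in totals:  # ignores "total", "sync", "async", etc.
            totals[op] += int(entry.get("value", 0))
    return totals
```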
5. Container Health Tracking
Monitor:
- Container status (running/stopped/failed)
- Restart counts
- Container uptime
Alert on:
- Container restart loops
- Container crashes
- Health check failures
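Restart loops show up as the restarts counter rising between two collections. This is a minimal sketch with a hypothetical function name that compares two snapshots of per-container metrics. A threshold of 3 restarts per interval is an assumption, not a documented default.

```python
def restart_loops(previous, current, threshold=3):
    """Names whose restart count rose by at least threshold between two collections."""
    before = {c["name"]: c.get("restarts", 0) for c in previous}
    flagged = []
    for c in current:
        # Containers missing from the previous snapshot are compared against 0.
        delta = c.get("restarts", 0) - before.get(c["name"], 0)
        if delta >= threshold:
            flagged.append(c["name"])
    return flagged
```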
Best Practices
1. Limit Monitored Containers
For hosts with many containers:
plugins:
  docker:
    max_containers: 50  # Monitor top 50 by resource usage
Benefits:
- Lower agent resource usage
- Faster collection
- Focus on important containers
2. Use Container Labels
Label containers for better organization:
docker run -d \
--label app=web \
--label environment=production \
--label monitor=true \
nginx
3. Set Resource Limits
Always set container resource limits:
# docker-compose.yml
services:
  app:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 1G
        reservations:
          memory: 512M
4. Monitor Container Density
Guideline: Don't overload the host
Max containers = (Host RAM - 2GB) / Average container RAM
Example:
- Host: 16GB RAM
- Average container: 256MB
- Max: (16 - 2) / 0.25 = 56 containers
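The density guideline is simple arithmetic. The sketch below rounds down, because a partial container slot cannot be used. The function name is hypothetical, and the 2 GB reserve comes from the formula above.

```python
import math


def max_containers(host_ram_gb, avg_container_gb, reserved_gb=2.0):
    """(Host RAM - reserve) / average container RAM, rounded down."""
    usable = host_ram_gb - reserved_gb
    if avg_container_gb <= 0 or usable <= 0:
        return 0
    return math.floor(usable / avg_container_gb)
```

For the example above, max_containers(16, 0.25) returns 56.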
5. Regular Cleanup
Remove unused containers and images:
# Remove stopped containers
docker container prune -f
# Remove unused images
docker image prune -a -f
# Remove unused volumes
docker volume prune -f
6. Use Health Checks
Define health checks in Dockerfiles:
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost/ || exit 1
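Once a HEALTHCHECK is defined, Docker records its result in the container's inspect data under State.Health.Status ("starting", "healthy" or "unhealthy"). This is a minimal sketch with a hypothetical function name that picks out unhealthy containers from a list of inspect results. The inspect Name field carries a leading slash, which the sketch strips.

```python
def unhealthy_containers(inspect_results):
    """Names of containers whose health check currently reports 'unhealthy'."""
    names = []
    for info in inspect_results:
        health = (info.get("State") or {}).get("Health")
        if health and health.get("Status") == "unhealthy":
            names.append(info.get("Name", "").lstrip("/"))
    return names
```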
Advanced Configuration
Docker over TCP
Monitor remote Docker host:
plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2376
    timeout: 15
Docker with TLS
Secure connection to Docker API:
plugins:
  docker:
    enabled: true
    base_url: tcp://docker-host.example.com:2376
    tls: true
    tls_verify: true
    tls_ca: /etc/docker/certs/ca.pem
    tls_cert: /etc/docker/certs/cert.pem
    tls_key: /etc/docker/certs/key.pem
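To check a TLS configuration outside the agent, the same settings can be passed to the Docker SDK for Python. docker.DockerClient and docker.tls.TLSConfig are real SDK classes. The mapping from the plugin's YAML keys to their arguments is an assumption, and both helper names are hypothetical. TLSConfig checks that the certificate files exist when it is constructed.

```python
def docker_client_kwargs(cfg):
    """Translate plugin-style keys into docker.DockerClient keyword arguments (assumed mapping)."""
    kwargs = {
        "base_url": cfg.get("base_url") or "unix://" + cfg.get("socket", "/var/run/docker.sock"),
        "timeout": cfg.get("timeout", 10),
    }
    if cfg.get("tls"):
        cert, key = cfg.get("tls_cert"), cfg.get("tls_key")
        kwargs["tls"] = {
            "ca_cert": cfg.get("tls_ca"),
            "client_cert": (cert, key) if cert and key else None,
            "verify": cfg.get("tls_verify", True),
        }
    return kwargs


def make_client(cfg):
    """Build a DockerClient; imported lazily so the mapping above needs no SDK."""
    import docker

    kwargs = docker_client_kwargs(cfg)
    tls_args = kwargs.pop("tls", None)
    if tls_args:
        kwargs["tls"] = docker.tls.TLSConfig(**tls_args)
    return docker.DockerClient(**kwargs)
```

With the SDK installed and the certificates in place, make_client(cfg).ping() returns True when the remote daemon accepts the connection.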
Container Filtering (Future Feature)
The plugin currently monitors all containers. Future versions will support:
plugins:
  docker:
    filters:
      labels:
        - monitor=true
      names:
        - "prod-*"
Example Configurations
Development Host
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 20
Production Container Host
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 100
    collect_images: true
    collect_networks: true
Docker Swarm Node
plugins:
  docker:
    enabled: true
    socket: /var/run/docker.sock
    max_containers: 200  # Higher limit for swarm
Remote Docker Host
plugins:
  docker:
    enabled: true
    base_url: tcp://10.0.0.5:2376
    tls: true
    tls_verify: true
    tls_ca: /path/to/ca.pem
    tls_cert: /path/to/cert.pem
    tls_key: /path/to/key.pem
    timeout: 20
Limitations
Current Limitations
- No container filtering - Monitors all containers, capped only by the max_containers limit
- No Kubernetes support - Use Kubernetes-specific monitoring for K8s
- No Swarm-specific metrics - Coming in a future version
- Performance impact - Collection time and agent resource usage grow noticeably with 100+ containers
Scalability
Recommended limits:
- Small hosts (< 4GB RAM): Max 25 containers monitored
- Medium hosts (4-16GB RAM): Max 50 containers monitored
- Large hosts (> 16GB RAM): Max 100 containers monitored
For larger deployments:
- Use multiple agents
- Deploy agent per Docker host
- Use container orchestration monitoring (Prometheus, etc.)
Troubleshooting Performance
Slow Collection
Symptom: Collection takes > 5 seconds
Solutions:
- Reduce max_containers
- Increase collection interval
- Check Docker daemon performance
Debug:
# Time collection manually
time python3 /opt/statusradar/plugins/docker_plugin.py
# Check Docker API response time
time curl --unix-socket /var/run/docker.sock http://localhost/containers/json
High Memory Usage
Symptom: Agent using > 500MB RAM
Solutions:
- Reduce max_containers
- Disable image/network collection
- Increase swap space