Agent Troubleshooting

This guide helps you diagnose and fix common issues with the StatusRadar Agent.

Quick Diagnostics

Check Agent Status

sudo systemctl status statusradar-agent

Expected output when running:

● statusradar-agent.service - StatusRadar Server Monitoring Agent
   Loaded: loaded
   Active: active (running) since ...

View Recent Logs

sudo journalctl -u statusradar-agent -n 100 --no-pager

Test Agent Manually

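# Stop the service
sudo systemctl stop statusradar-agent

# Run manually in the foreground to see errors (as root, like the service)
cd /opt/statusradar
sudo python3 statusradar-agent.py

# When finished, stop it with Ctrl+C and start the service again
sudo systemctl start statusradar-agent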

Common Issues

1. Agent Not Starting

Symptoms:

  • Service fails to start
  • systemctl status shows "failed" or "inactive"

Diagnosis:

sudo journalctl -u statusradar-agent -n 50 --no-pager

Common Causes:

Invalid Token

Error in logs:

ERROR: Authentication failed: Invalid token
ERROR: HTTP 401 Unauthorized

Solution:

  1. Verify token in /opt/statusradar/config/agent.yaml
  2. Get correct token from dashboard
  3. Restart agent:
    sudo systemctl restart statusradar-agent

Missing Dependencies

Error in logs:

ModuleNotFoundError: No module named 'requests'
ImportError: No module named 'psutil'

Solution:

cd /opt/statusradar
sudo pip3 install -r requirements.txt
sudo systemctl restart statusradar-agent

Configuration Syntax Error

Error in logs:

yaml.scanner.ScannerError: mapping values are not allowed here

Solution:

  1. Check YAML syntax in /opt/statusradar/config/agent.yaml
  2. Use a YAML validator: https://www.yamllint.com/
  3. Common mistakes:
    • Incorrect indentation (use spaces, not tabs)
    • Missing colons
    • Missing quotes around strings that contain special characters

# Wrong
plugins:
  redis:
  enabled: true  # Wrong indentation

# Correct
plugins:
  redis:
    enabled: true
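
Tab indentation is the hardest of these mistakes to spot by eye. The following is a minimal sketch, using only the Python standard library, that lists the lines of a YAML text whose indentation contains a tab. The function name find_tab_indented_lines is hypothetical and not part of the agent; it does not replace a full YAML validator.

```python
from typing import List


def find_tab_indented_lines(text: str) -> List[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-blank character is the indentation.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines
```

To check the real file, read /opt/statusradar/config/agent.yaml (as root, since it is mode 600) and pass its contents to the function. An empty list means no tab indentation was found.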

Permission Denied

Error in logs:

PermissionError: [Errno 13] Permission denied: '/opt/statusradar/config/agent.yaml'

Solution:

# Fix file permissions
sudo chown -R root:root /opt/statusradar
sudo chmod 755 /opt/statusradar
sudo chmod 600 /opt/statusradar/config/agent.yaml

# Restart the service (the unit is expected to run as root)
sudo systemctl restart statusradar-agent
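
To confirm the result of the commands above, a small standard-library sketch like the following reports the mode and owner of a file. The function name permission_summary and the returned keys are illustrative assumptions, not part of the agent.

```python
import os
import stat


def permission_summary(path: str) -> dict:
    """Return the octal permission bits, owner UID, and whether
    group or other users have any access to the file."""
    info = os.stat(path)
    return {
        "mode": oct(stat.S_IMODE(info.st_mode)),
        "owner_uid": info.st_uid,
        "group_or_world_access": bool(info.st_mode & (stat.S_IRWXG | stat.S_IRWXO)),
    }
```

For /opt/statusradar/config/agent.yaml after the fix, the expected result is mode 0o600, owner_uid 0 (root), and group_or_world_access False.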

2. Metrics Not Appearing in Dashboard

Symptoms:

  • Agent is running but dashboard shows "No data"
  • Server appears offline
  • Last update time doesn't change

Diagnosis:

Check Agent Logs

sudo journalctl -u statusradar-agent -f

Look for:

INFO: Metrics collected successfully
INFO: Sent metrics to API

Check Network Connectivity

# Test API connectivity
curl -I https://api.statusradar.dev

# Test from agent directory
cd /opt/statusradar
python3 -c "import requests; print(requests.get('https://api.statusradar.dev/health', timeout=10).status_code)"

Expected: 200
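
If the requests module itself is missing or broken, the same connectivity check can be approximated with the standard library. This sketch only tests DNS resolution and the TCP connection, not TLS or the HTTP response. The function name can_connect is hypothetical.

```python
import socket
from typing import Tuple


def can_connect(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try to open a TCP connection and report what failed, if anything."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # Name resolution failed: points at DNS or a wrong hostname.
        return False, f"DNS resolution failed: {exc}"
    except OSError as exc:
        # Refused, timed out, or unreachable: points at firewall or routing.
        return False, f"connection failed: {exc}"
```

Calling can_connect("api.statusradar.dev", 443) separates a DNS problem from a blocked port, which helps choose between the causes listed below.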

Common Causes:

Firewall Blocking Outbound HTTPS

Solution:

# Allow HTTPS outbound (Ubuntu/Debian)
sudo ufw allow out 443/tcp

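# Allow HTTPS outbound (CentOS/RHEL)
# firewalld permits outbound traffic by default; --add-service=https opens INBOUND 443 and is not needed here.
# Only if outbound traffic is restricted with direct rules, add an explicit allow:
sudo firewall-cmd --permanent --direct --add-rule ipv4 filter OUTPUT 0 -p tcp --dport 443 -j ACCEPT
sudo firewall-cmd --reload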

Wrong API URL

Error in logs:

ERROR: Connection failed: Could not resolve host

Solution: Check /opt/statusradar/config/agent.yaml:

api:
  url: https://api.statusradar.dev  # Correct URL

API Rate Limiting

Error in logs:

ERROR: HTTP 429 Too Many Requests

Solution:

  • Increase collection interval in config (minimum 60 seconds)
  • Contact support if issue persists

Server Already Registered

If you're reinstalling the agent, the server might already exist in the dashboard.

Solution:

  1. Check dashboard for existing server
  2. Use existing token or delete old server first

3. Plugin Not Working

Symptoms:

  • Plugin-specific metrics missing
  • Plugin errors in logs

Diagnosis:

Test Plugin Manually

cd /opt/statusradar
python3 plugins/redis_plugin.py
python3 plugins/mysql_plugin.py
# etc.

Expected output:

Plugin: redis
Enabled: True
Available: True

Collecting metrics...
{
  "used_memory": 1234567,
  "connected_clients": 5,
  ...
}

Common Causes:

Plugin Not Enabled

Solution: Check /opt/statusradar/config/agent.yaml:

plugins:
  redis:
    enabled: true  # Must be true

Service Not Running

Error:

ConnectionRefusedError: [Errno 111] Connection refused

Solution:

# Check if service is running
sudo systemctl status redis
sudo systemctl status mysql
sudo systemctl status postgresql
sudo systemctl status nginx
# etc.

# Start if needed
sudo systemctl start redis

Wrong Connection Parameters

Error:

ERROR: Authentication failed
ERROR: Access denied for user

Solution: Verify connection parameters in config:

plugins:
  mysql:
    host: localhost  # Check hostname
    port: 3306       # Check port
    user: monitor    # Check username
    password: secret # Check password

Missing Permissions

MySQL Example:

-- Grant necessary permissions
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'monitor'@'localhost';
FLUSH PRIVILEGES;

PostgreSQL Example:

-- Grant pg_monitor role
GRANT pg_monitor TO monitor;

Service Configuration Missing

Nginx Example:

Error:

ERROR: HTTP 404 Not Found

Solution: Enable stub_status in nginx:

location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

Reload nginx:

sudo nginx -t
sudo systemctl reload nginx

4. High Resource Usage

Symptoms:

  • Agent using excessive CPU or memory
  • System slowdown

Diagnosis:

# Check agent resource usage
ps aux | grep '[s]tatusradar-agent'  # brackets keep grep from matching itself

# Check memory usage
sudo systemctl status statusradar-agent | grep Memory

Solutions:

Reduce Collection Frequency

agent:
  interval: 600  # Increase from 300 to 600 seconds

Disable Unused Plugins

plugins:
  redis:
    enabled: false  # Disable if not needed

Limit Docker Monitoring

For systems with many containers:

plugins:
  docker:
    enabled: true
    max_containers: 50  # Limit monitored containers

5. SSL/TLS Errors

Error in logs:

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]

Solutions:

Update CA Certificates

# Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y ca-certificates
sudo update-ca-certificates

# CentOS/RHEL
sudo yum install -y ca-certificates
sudo update-ca-trust

Python SSL Certificate

# Reinstall certifi
sudo pip3 install --upgrade certifi

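# Or use system certificates (Debian/Ubuntu path; CentOS/RHEL uses /etc/pki/tls/certs/ca-bundle.crt)
# The requests library reads REQUESTS_CA_BUNDLE; SSL_CERT_FILE covers Python's ssl module
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt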
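
Before changing anything, it can help to see which certificate locations Python is actually using. The sketch below collects the defaults of the ssl module, the relevant environment variables, and the certifi bundle if certifi is installed (requests uses certifi by default). The function name ca_bundle_report is an illustrative assumption.

```python
import os
import ssl


def ca_bundle_report() -> dict:
    """Collect the CA locations known to Python's ssl module and environment."""
    paths = ssl.get_default_verify_paths()
    report = {
        "openssl_cafile": paths.openssl_cafile,   # compiled-in default file
        "openssl_capath": paths.openssl_capath,   # compiled-in default directory
        "cafile_in_use": paths.cafile,            # None if the file does not exist
        "capath_in_use": paths.capath,            # None if the directory does not exist
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
    }
    try:
        import certifi  # third-party; bundled CA file used by requests
        report["certifi_bundle"] = certifi.where()
    except ImportError:
        report["certifi_bundle"] = None
    return report
```

If cafile_in_use and capath_in_use are both None and certifi_bundle is None, the interpreter has no usable CA store, which would explain CERTIFICATE_VERIFY_FAILED.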

6. Time Synchronization Issues

Symptoms:

  • Metrics timestamps incorrect
  • Authentication failures

Solution:

# Install NTP
sudo apt-get install -y ntp  # Ubuntu/Debian
sudo yum install -y ntp      # CentOS/RHEL 7 (RHEL 8+ ships chrony instead)

# Start NTP service (the unit is named ntp on Debian/Ubuntu, ntpd on CentOS/RHEL)
sudo systemctl start ntp
sudo systemctl enable ntp

# Check time sync
timedatectl status
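
To estimate how far the local clock is off, one option is to compare it with the Date header of any HTTP response. The sketch below does the arithmetic for a header value you supply. The function name clock_skew_seconds is hypothetical, and the result is only accurate to about a second plus network latency.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time in seconds.

    A positive value means the local clock is ahead of the server.
    local_now must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()
```

The header value can be taken from curl -sI https://api.statusradar.dev and passed in together with datetime.now(timezone.utc). A skew of more than a few seconds suggests time synchronization is not working.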

7. Agent Stops After Some Time

Symptoms:

  • Agent runs initially but stops after hours/days
  • No errors in logs before stopping

Diagnosis:

# Check if service auto-restart is enabled
systemctl show statusradar-agent | grep Restart

Expected: Restart=on-failure
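
When checking several servers, the KEY=VALUE output of systemctl show is easy to parse in a script. This is a minimal sketch; the function name parse_systemctl_show is hypothetical, and the sample values in the usage note are illustrative rather than captured from a real system.

```python
def parse_systemctl_show(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    properties = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # skip lines without an "=" sign
            properties[key] = value
    return properties
```

For example, parse_systemctl_show("Restart=no\nNRestarts=0\n")["Restart"] evaluates to "no", which would indicate that automatic restart is not configured.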

Solutions:

Update Systemd Service

Edit /etc/systemd/system/statusradar-agent.service:

[Service]
Restart=on-failure
RestartSec=30

Reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart statusradar-agent

Check System Resources

# Check available memory
free -h

# Check available disk space
df -h /opt/statusradar

Review System Logs

# Check for OOM killer
sudo journalctl -k | grep -i "killed process"

# Check for system errors
sudo journalctl -xe

Log Interpretation

Successful Collection

INFO: Collecting system metrics
INFO: Plugin redis: Metrics collected successfully
INFO: Plugin mysql: Metrics collected successfully
INFO: Sending metrics to API
INFO: Metrics sent successfully

Warning Signs

WARNING: Plugin redis: Connection timeout (retrying)
WARNING: API request took 5.2s (slow network)

These are usually temporary. If they persist, investigate the affected plugin or the network path to the API.

Error Signs

ERROR: Authentication failed
ERROR: Plugin mysql: Access denied
ERROR: Cannot connect to API

These require immediate attention.
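
For a quick overview of a long log, the levels can be tallied with a short script. This sketch assumes every message carries a "LEVEL:" prefix as in the examples above; the function name count_levels is hypothetical.

```python
import re
from collections import Counter

# Assumes the "INFO:", "WARNING:", "ERROR:" prefixes shown in this guide.
LEVEL_PATTERN = re.compile(r"\b(INFO|WARNING|ERROR):")


def count_levels(lines) -> Counter:
    """Count log lines per level; lines without a known level are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The input can be the lines of a file produced with journalctl -u statusradar-agent --no-pager. A rising ERROR count between two runs is a sign that the problem is not transient.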

Advanced Debugging

Enable Debug Logging

Add the following near the top of the agent script. It is Python, so it belongs in the agent code, not in agent.yaml:

import logging
logging.basicConfig(level=logging.DEBUG)

Network Debugging

# Monitor network traffic
sudo tcpdump -i any -nn port 443

# Test DNS resolution
nslookup api.statusradar.dev

# Test with curl
curl -v -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.statusradar.dev/v1/agent/config

Plugin-Specific Debugging

# Test plugin with verbose output
python3 -c "
import sys
sys.path.insert(0, '/opt/statusradar')
from plugins.redis_plugin import RedisPlugin
plugin = RedisPlugin()
print(plugin.collect())
"

Getting Help

If you can't resolve the issue:

  1. Collect Information:

    # System info
    uname -a
    python3 --version
    
    # Agent version
    head -20 /opt/statusradar/statusradar-agent.py | grep VERSION
    
    # Recent logs
    sudo journalctl -u statusradar-agent -n 200 --no-pager > agent-logs.txt
    
    # Configuration (remove sensitive data!)
    cat /opt/statusradar/config/agent.yaml
  2. Contact Support:

    • Email: [email protected]
    • Include: OS version, Python version, error logs, configuration
  3. Community Forums:
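
For the "remove sensitive data" step when collecting the configuration, a small filter can mask obvious secrets before the file is shared. This is a sketch under assumptions: it treats any YAML key whose name contains token, password, or secret as sensitive, which may not match every key in your agent.yaml. The function name redact_config is hypothetical. Review the output by hand before sending it.

```python
import re

# Assumption: sensitive keys contain one of these words in their name.
SENSITIVE_LINE = re.compile(
    r"^(\s*)([\w-]*(?:token|password|secret)[\w-]*)(\s*:\s*)(\S.*)$",
    re.IGNORECASE,
)


def redact_config(text: str) -> str:
    """Replace the values of sensitive-looking keys with <redacted>."""
    redacted = []
    for line in text.splitlines():
        # Keep indentation, key, and colon; drop the value and any comment.
        redacted.append(SENSITIVE_LINE.sub(r"\1\2\3<redacted>", line))
    return "\n".join(redacted)
```

Only single-line "key: value" entries are handled; multi-line values or secrets embedded in URLs are left untouched.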

Preventive Measures

Regular Maintenance

# Update agent (monthly)
curl -sL https://statusradar.dev/install-agent.sh | sudo bash -s update

# Check logs weekly
sudo journalctl -u statusradar-agent -n 100 --no-pager

# Monitor dashboard daily
# https://statusradar.dev/dashboard/servers

Set Up Monitoring

Create an alert for agent offline:

  1. Go to Dashboard > Alerts
  2. Create alert for "Server Offline"
  3. Get notified when agent stops sending metrics

Backup Configuration

# Backup config file
sudo cp /opt/statusradar/config/agent.yaml \
  /opt/statusradar/config/agent.yaml.backup
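
A single .backup file is overwritten on every run. If you prefer to keep several generations, a timestamped copy is one option. The sketch below uses only the standard library; the function name backup_file and the naming scheme are illustrative assumptions.

```python
import shutil
import time
from pathlib import Path


def backup_file(path: str) -> Path:
    """Copy a file next to itself as <name>.<YYYYmmdd-HHMMSS>.backup."""
    source = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.backup")
    # copy2 preserves permission bits, so a 600-mode config stays private.
    shutil.copy2(source, target)
    return target
```

Run it as root for /opt/statusradar/config/agent.yaml, since the file is readable only by root.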

Frequently Asked Questions

Q: Can I run multiple agents on the same server?

No. One agent per server. Use plugins to monitor multiple services.

Q: What happens if agent is offline temporarily?

Historical data is preserved. Monitoring resumes when the agent comes back online.

Q: How do I completely remove the agent?

See Uninstallation Guide.

Q: Can I monitor remote services?

Yes! Configure plugins with remote hostnames:

plugins:
  redis:
    host: remote-server.example.com
    port: 6379

Q: How much bandwidth does the agent use?

~1-5 KB per minute (basic metrics) + 5-10 KB per enabled plugin.
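
As a worked example of these figures, the sketch below converts them into a monthly total. It assumes the per-plugin figure is also per minute, which the text above does not state explicitly, and it uses 1 MB = 1024 KB. The function name monthly_bandwidth_mb is hypothetical.

```python
def monthly_bandwidth_mb(
    base_kb_per_min: float,
    plugin_kb_per_min: float,
    plugins: int,
    days: int = 30,
) -> float:
    """Estimate monthly traffic in MB.

    Assumption: both figures are KB per minute of agent runtime.
    """
    per_minute_kb = base_kb_per_min + plugin_kb_per_min * plugins
    minutes = 60 * 24 * days
    return per_minute_kb * minutes / 1024
```

With three plugins enabled, the low end (1 KB base, 5 KB per plugin) comes to 675 MB per 30 days and the high end (5 KB base, 10 KB per plugin) to roughly 1477 MB.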

Q: Is it safe to run as root?

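Yes. The agent needs root to collect system metrics. Follow security best practices, such as keeping agent.yaml readable only by root (chmod 600) and keeping the agent updated.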

Next Steps