| Enabled | Whether this rule creates alarms. Disabled rules ignore matching events. |
| Name | Unique rule identifier (e.g., scanner-down, tls-expiring). |
| Description | Human-readable explanation of what the rule detects. |
| Severity | Severity assigned to newly created alarms; can be changed from the rule's built-in default. |
| Auto-Resolve | Event type that auto-resolves matching alarms (null = manual only). |
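The fields above can be pictured as a simple rule record. This is a minimal sketch, not the product's actual schema: the `AlarmRule` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule record mirroring the fields in the table above.
# The class name and attribute names are illustrative assumptions.
@dataclass
class AlarmRule:
    enabled: bool                 # disabled rules ignore matching events
    name: str                     # unique identifier, e.g. "tls-expiring"
    description: str              # human-readable explanation
    severity: str                 # severity assigned to new alarms
    auto_resolve: Optional[str]   # resolving event type; None = manual only

# Example: the tls-expiring rule as listed in the catalog below.
rule = AlarmRule(
    enabled=True,
    name="tls-expiring",
    description="TLS certificate expires within 30 days",
    severity="warning",
    auto_resolve="tls_renewed",
)
```

A `None` in `auto_resolve` corresponds to the "manual resolve only" rules in the catalog, such as backup-failed or scan-stuck.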
| scanner-down | Availability: fires on scanner_down (default: high), auto-resolves on scanner_up. Scanner missed its heartbeat for longer than the 2-minute timeout. |
| ingest-down | Availability: fires on ingest_down (default: critical), auto-resolves on ingest_up. Ingest API node stopped heartbeating. |
| db-down | Availability: fires on db_down (default: critical), auto-resolves on db_up. Database connection pool test failed. |
| proxy-down | Availability: fires on proxy_down (default: critical), auto-resolves on proxy_up. Traefik reverse proxy health check failed. |
| dns-down | Availability: fires on dns_down (default: high), auto-resolves on dns_up. Public FQDN DNS resolution failed. |
| backup-failed | Operations: fires on backup_failed (default: high), manual resolve only. Manual or scheduled backup completed with errors. |
| db-disk-high | Threshold: fires on db_disk_high (default: warning), auto-resolves on db_disk_normal. Database disk usage exceeded threshold. |
| db-connections-high | Threshold: fires on db_connections_high (default: warning), auto-resolves on db_connections_normal. Database connection pool has had queries waiting for a sustained period. |
| tls-expiring | Deadline: fires on tls_expiring (default: warning), auto-resolves on tls_renewed. TLS certificate expires within 30 days. |
| scanner-load-high | Resource: fires on scanner_load_high (default: warning), auto-resolves on scanner_load_normal. Scanner load average exceeds 80% of CPU capacity. |
| scanner-memory-high | Resource: fires on scanner_memory_high (default: warning), auto-resolves on scanner_memory_normal. Scanner available memory below 10%. |
| db-sessions-high | Database: fires on db_sessions_high (default: warning), auto-resolves on db_sessions_normal. Active database sessions exceed 80% of max_connections. |
| db-long-queries | Database: fires on db_long_queries_high (default: warning), auto-resolves on db_long_queries_normal. Database query running longer than 60 seconds. |
| auth-failures-high | Security: fires on auth_failures_high (default: high), auto-resolves on auth_failures_normal. More than 10 authentication failures in 5 minutes. |
| session-ip-spread | Security: fires on session_ip_spread_high (default: warning), manual resolve only. One user account has active sessions from too many distinct IP addresses. |
| scan-stuck | Stall: fires on scan_stuck (default: warning), manual resolve only. Scan still running but all jobs are finished. |
| scanner-heartbeat-failed | Security: fires on scanner_heartbeat_failed (default: warning), manual resolve only. Scanner heartbeat authentication failed (invalid API key or IP binding violation). Requires manual acknowledgement: a scanner can resume heartbeating successfully while the earlier authentication failures remain unresolved. |
| dispatcher-down | Availability: fires on dispatcher_down (default: critical), auto-resolves on dispatcher_up. Dispatcher service stopped heartbeating for more than 2 minutes. When the dispatcher is down, no new scan jobs are dispatched and active scans stall. |
| backup-down | Availability: fires on backup_down (default: warning), auto-resolves on backup_up. Backup container is unreachable via DNS resolution. Typically indicates the backup container has stopped or failed to start. |
| manager-memory-high | Resource: fires on manager_memory_high (default: warning), auto-resolves on manager_memory_normal. Manager heap usage exceeded 80% of the 512 MB container memory limit. Sustained high memory may precede an OOM termination; consider restarting the manager or investigating large in-flight requests. |
| container-recovery-failed | Infrastructure: fires on container_recovery_failed (default: critical), manual resolve only. A container did not recover within 2 minutes after the watchdog attempted a restart. Indicates a persistent crash loop or configuration error that automatic healing cannot fix. |
| container-restart-storm | Infrastructure: fires on container_restart_storm (default: high), manual resolve only. A container restarted more than 3 times within a 15-minute window. Suggests a crash loop; investigate container logs before allowing further restarts. |
| container-down | Infrastructure: fires on container_down (default: high), auto-resolves on container_up. Container is running but not reachable over the Docker internal network — health check connections are refused or timing out. |
| container-restarted | Infrastructure: fires on container_restarted (default: info), manual resolve only. Container was restarted by the watchdog autoheal mechanism. Informational — the restart itself is the resolution of the underlying fault. Review logs if restarts become frequent. |
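The fire/auto-resolve pairing that runs through the catalog above can be sketched as a small event dispatcher. This is a hypothetical illustration of the pattern, not the actual implementation; the `FIRE_EVENTS` and `RESOLVE_EVENTS` tables and `handle_event` function are assumptions, populated with a few rules taken from the catalog.

```python
# Hypothetical dispatch sketch: each firing event maps to a rule name and
# its default severity; each resolving event maps back to the rule it clears.
# Only a few rules from the catalog are shown.
FIRE_EVENTS = {
    "scanner_down": ("scanner-down", "high"),
    "tls_expiring": ("tls-expiring", "warning"),
    "container_down": ("container-down", "high"),
}
RESOLVE_EVENTS = {
    "scanner_up": "scanner-down",
    "tls_renewed": "tls-expiring",
    "container_up": "container-down",
}

def handle_event(event_type: str, active_alarms: dict) -> dict:
    """Fire a new alarm or auto-resolve an existing one for event_type.

    Manual-resolve-only rules (e.g. backup-failed, scan-stuck) simply have
    no entry in RESOLVE_EVENTS, so nothing ever clears them automatically.
    """
    if event_type in FIRE_EVENTS:
        rule_name, severity = FIRE_EVENTS[event_type]
        active_alarms[rule_name] = severity
    elif event_type in RESOLVE_EVENTS:
        # Clearing an alarm that is not active is a no-op.
        active_alarms.pop(RESOLVE_EVENTS[event_type], None)
    return active_alarms

alarms = {}
handle_event("scanner_down", alarms)  # fires scanner-down at severity high
handle_event("scanner_up", alarms)    # auto-resolves the scanner-down alarm
```

The asymmetry in the catalog falls out naturally from this shape: availability and threshold rules appear in both tables, while informational or security rules that demand human review appear only in the firing table.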