Alarm Rules

Built-in alarm rules ship pre-configured with mipo and cover scanner health, infrastructure liveness, and core-service availability. Each rule defines a trigger event, a default severity, and optionally an auto-resolve event. From the Alarm Rules admin page, operators can disable a rule or override its severity. Notification routing is configured through Notification Policies, which can be scoped to specific rule names. Rules cannot be created or deleted: they are defined in code and seeded automatically on every boot.
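
Seed-on-boot can be pictured as an upsert keyed by rule name. The sketch below is a minimal model, not mipo's implementation. `RuleDef`, `RuleRow`, `seed_rules`, and all field names are hypothetical. It assumes that re-seeding refreshes the code-owned fields while preserving each rule's enabled flag and severity override, which the override feature implies but this page does not state outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleDef:
    """Hypothetical code-defined rule; field names are illustrative."""
    name: str
    description: str
    default_severity: str
    trigger_event: str
    auto_resolve_event: Optional[str]  # None = manual resolve only


@dataclass
class RuleRow:
    """Hypothetical stored rule: code-owned definition plus operator settings."""
    definition: RuleDef
    enabled: bool = True
    severity_override: Optional[str] = None

    @property
    def severity(self) -> str:
        # Effective severity for new alarms: the override if set, else the default.
        return self.severity_override or self.definition.default_severity

    def reset_severity(self) -> None:
        # Drop the override so the code-defined default applies again.
        self.severity_override = None


# Two entries from the built-in list, for illustration only.
BUILT_IN_RULES = [
    RuleDef("scanner-down", "Scanner missed heartbeat timeout", "high", "scanner_down", "scanner_up"),
    RuleDef("backup-failed", "Backup completed with errors", "high", "backup_failed", None),
]


def seed_rules(store: dict[str, RuleRow], definitions: list[RuleDef]) -> None:
    """Upsert code-defined rules on boot (assumed behavior)."""
    for d in definitions:
        row = store.get(d.name)
        if row is None:
            # Rule is new in this release: insert it enabled, with no override.
            store[d.name] = RuleRow(definition=d)
        else:
            # Existing rule: refresh code-owned fields only; enabled and
            # severity_override keep whatever the operator set.
            row.definition = d
```

Running `seed_rules` a second time after an operator disables `scanner-down` leaves it disabled, while a rule added to the definitions list appears with its default severity.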

Fields & Columns

Field	Description
Enabled	Whether this rule creates alarms. Disabled rules ignore matching events.
Name	Unique rule identifier (e.g., scanner-down, tls-expiring)
Description	Human-readable explanation of what the rule detects
Severity	Severity assigned to new alarms: the code-defined default unless an operator has overridden it.
Auto-Resolve	Event type that automatically resolves matching alarms (null means alarms must be resolved manually)

Built-in Rules

Rule	Category	Fires on	Default severity	Auto-resolves on	Condition
scanner-down	Availability	scanner_down	high	scanner_up	Scanner missed its heartbeat timeout (2 minutes).
ingest-down	Availability	ingest_down	critical	ingest_up	An ingest API node stopped heartbeating.
db-down	Availability	db_down	critical	db_up	The database connection pool test failed.
proxy-down	Availability	proxy_down	critical	proxy_up	The Traefik reverse proxy health check failed.
dns-down	Availability	dns_down	high	dns_up	DNS resolution of the public FQDN failed.
backup-failed	Operations	backup_failed	high	manual only	A manual or scheduled backup completed with errors.
db-disk-high	Threshold	db_disk_high	warning	db_disk_normal	Database disk usage exceeded its threshold.
db-connections-high	Threshold	db_connections_high	warning	db_connections_normal	The database connection pool has a sustained queue of waiting queries.
tls-expiring	Deadline	tls_expiring	warning	tls_renewed	The TLS certificate expires within 30 days.
scanner-load-high	Resource	scanner_load_high	warning	scanner_load_normal	Scanner load average exceeds 80% of CPU capacity.
scanner-memory-high	Resource	scanner_memory_high	warning	scanner_memory_normal	Scanner available memory is below 10%.
db-sessions-high	Database	db_sessions_high	warning	db_sessions_normal	Active database sessions exceed 80% of max_connections.
db-long-queries	Database	db_long_queries_high	warning	db_long_queries_normal	A database query has been running for longer than 60 seconds.
auth-failures-high	Security	auth_failures_high	high	auth_failures_normal	More than 10 authentication failures occurred within 5 minutes.
session-ip-spread	Security	session_ip_spread_high	warning	manual only	A single user account has active sessions from too many distinct IP addresses.
scan-stuck	Stall	scan_stuck	warning	manual only	A scan is still marked as running although all of its jobs have finished.
scanner-heartbeat-failed	Security	scanner_heartbeat_failed	warning	manual only	Scanner heartbeat authentication failed (invalid API key or IP binding violation). Manual resolution is required because a scanner can resume heartbeating successfully while earlier authentication failures still need review.
dispatcher-down	Availability	dispatcher_down	critical	dispatcher_up	The dispatcher service has not sent a heartbeat for more than 2 minutes. While the dispatcher is down, no new scan jobs are dispatched and active scans stall.
backup-down	Availability	backup_down	warning	backup_up	The backup container's hostname could not be resolved via DNS. This typically means the backup container has stopped or failed to start.
manager-memory-high	Resource	manager_memory_high	warning	manager_memory_normal	Manager heap usage exceeded 80% of the 512 MB container memory limit (about 410 MB). Sustained high memory use may precede an OOM termination; consider restarting the manager or investigating large in-flight requests.
container-recovery-failed	Infrastructure	container_recovery_failed	critical	manual only	A container did not recover within 2 minutes after the watchdog attempted a restart. This indicates a persistent crash loop or configuration error that automatic healing cannot fix.
container-restart-storm	Infrastructure	container_restart_storm	high	manual only	A container restarted more than 3 times within a 15-minute window. This suggests a crash loop; investigate the container logs before allowing further restarts.
container-down	Infrastructure	container_down	high	container_up	The container is running but not reachable over the Docker internal network: health check connections are refused or time out.
container-restarted	Infrastructure	container_restarted	info	manual only	The watchdog autoheal mechanism restarted the container. Informational: the restart itself resolves the underlying fault. Review the logs if restarts become frequent.
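
Several rules fire on events that mipo's monitors raise when a count crosses a limit inside a time window, for example auth-failures-high (more than 10 failures in 5 minutes) or container-restart-storm (more than 3 restarts in 15 minutes). This page does not describe how those monitors count. As an illustration only, a trailing-window counter expresses such a condition. `SlidingWindowCounter` is hypothetical, and treating an occurrence exactly one window old as expired is an assumption about boundary handling.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts occurrences in a trailing time window (timestamps in seconds)."""

    def __init__(self, window_seconds: float, limit: int) -> None:
        self.window = window_seconds
        self.limit = limit
        self.times: deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record one occurrence at `now`; return True if the count exceeds the limit."""
        self.times.append(now)
        # Expire occurrences that are at least one full window old.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        # "More than N" means strictly greater than the limit.
        return len(self.times) > self.limit


# auth-failures-high style condition: more than 10 failures in 300 seconds.
auth = SlidingWindowCounter(window_seconds=300, limit=10)
# container-restart-storm style condition: more than 3 restarts in 900 seconds.
restarts = SlidingWindowCounter(window_seconds=900, limit=3)
```

With eleven failures recorded 10 seconds apart, only the eleventh `auth.record` call returns True, because the count first exceeds 10 at that point.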

How To

Disable an alarm type

Find the rule in the table
Click the enabled toggle to turn it off
New events of this type will be ignored (existing alarms remain)

Override severity

Find the rule in the table
Use the severity dropdown to change the level
New alarms will use the overridden severity

Reset severity to default

Find the rule with a non-default severity (shown in parentheses)
Click the "Reset" button
Severity reverts to the code-defined default

Gotchas

Disabling a rule does not resolve existing alarms — it only prevents new ones.
Severity overrides apply to new alarms only. Existing alarms keep their original severity.
Rules are re-seeded on every boot, so new rules shipped in a code update appear automatically.
The default severity cannot be edited; it reflects the code-defined importance of the fault. Use a severity override to change the level applied to new alarms.
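
The first two gotchas follow from two facts: a disabled rule ignores matching events, and an alarm records its severity when it is created. The sketch below models that flow. It is not mipo's code: `Rule`, `Alarm`, `process_event`, and the `source` field used to match an auto-resolve event to its alarm are hypothetical. It follows the literal wording that disabled rules ignore matching events, including auto-resolve events, and it does not model deduplication of repeated trigger events.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    trigger_event: str
    auto_resolve_event: Optional[str]  # None = manual resolve only
    default_severity: str
    enabled: bool = True
    severity_override: Optional[str] = None

    @property
    def severity(self) -> str:
        return self.severity_override or self.default_severity


@dataclass
class Alarm:
    rule: str
    source: str      # hypothetical: which scanner/container raised the event
    severity: str    # copied at creation, so later overrides do not change it
    resolved: bool = False


def process_event(rules: list[Rule], alarms: list[Alarm], event_type: str, source: str) -> None:
    for rule in rules:
        if not rule.enabled:
            # Disabled rules ignore matching events; existing alarms stay as they are.
            continue
        if event_type == rule.trigger_event:
            # New alarm takes the rule's effective severity at this moment.
            alarms.append(Alarm(rule.name, source, rule.severity))
        elif event_type == rule.auto_resolve_event:
            # Auto-resolve open alarms of this rule raised by the same source.
            for alarm in alarms:
                if alarm.rule == rule.name and alarm.source == source and not alarm.resolved:
                    alarm.resolved = True
```

For example, overriding `scanner-down` to `critical` after one alarm exists leaves that alarm at `high`, while the next `scanner_down` event creates a `critical` alarm.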

API Calls (3)

Method	Path	Description
GET	/api/admin/alerting/alarm-rules	List all built-in alarm rules
PATCH	/api/admin/alerting/alarm-rules/:id	Toggle enabled or override severity
POST	/api/admin/alerting/alarm-rules/:id/reset	Reset severity to default
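
A minimal sketch of the three calls using only the Python standard library. It builds the requests without sending them. The base URL, bearer-token authentication, whether `:id` is the rule name or an internal ID, and the PATCH body field names `enabled` and `severity` are all assumptions that this page does not confirm; check the actual API before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://mipo.example.com"  # hypothetical host
RULES_PATH = "/api/admin/alerting/alarm-rules"


def _request(method: str, path: str, token: str, body: Optional[dict] = None) -> urllib.request.Request:
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # auth scheme is an assumption
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def _rule_path(rule_id: str) -> str:
    # Percent-encode the id so it is safe as a single path segment.
    return f"{RULES_PATH}/{urllib.parse.quote(str(rule_id), safe='')}"


def list_rules(token: str) -> urllib.request.Request:
    return _request("GET", RULES_PATH, token)


def set_enabled(rule_id: str, enabled: bool, token: str) -> urllib.request.Request:
    # Assumed body shape for toggling a rule.
    return _request("PATCH", _rule_path(rule_id), token, {"enabled": enabled})


def override_severity(rule_id: str, severity: str, token: str) -> urllib.request.Request:
    # Assumed body shape for a severity override.
    return _request("PATCH", _rule_path(rule_id), token, {"severity": severity})


def reset_severity(rule_id: str, token: str) -> urllib.request.Request:
    return _request("POST", _rule_path(rule_id) + "/reset", token)
```

To send a request, pass it to `urllib.request.urlopen(req)`. For instance, `set_enabled("scanner-down", False, token)` corresponds to the Disable an alarm type steps, and `reset_severity("scanner-down", token)` to Reset severity to default.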

Related

Alarms: rules create alarms when matching events arrive
Events: events are matched against rules to create alarms
Notification Policies: policies can be scoped to specific alarm rule names