Maintenance

The Maintenance page centralizes container management in a single Admin page. From it you can restart any service, infrastructure component, or database, with danger warnings shown for high-risk operations. It also surfaces backup triggers, S3 connectivity tests, and recent maintenance activity.

Fields & Columns

Name       Description
Container  Name of the Docker container
Tier       Classification: Services, Infrastructure, or Databases
Status     Current health status from the watchdog/health endpoints
Danger     Whether restarting this container carries risk (data loss, connection drops)

How To

Restart a service

  1. Find the container in the Services section
  2. Click the "Restart" button next to it
  3. Type the container name to confirm
  4. Wait for the restart to complete. A 5-minute cooldown then applies before the same container can be restarted again.
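
The cooldown in step 4 can be pictured as a per-container timestamp check. The following is a minimal sketch, not the product's implementation: the class name RestartCooldown, its methods, and the container names used in the usage note are hypothetical, and the only value taken from this page is the 5-minute window.

```python
import time

COOLDOWN_SECONDS = 5 * 60  # the 5-minute cooldown described on this page


class RestartCooldown:
    """Hypothetical helper: remembers the last restart time per container."""

    def __init__(self, cooldown=COOLDOWN_SECONDS, clock=time.monotonic):
        self._cooldown = cooldown
        self._clock = clock          # injectable so the logic can be tested
        self._last_restart = {}      # container name -> time of last restart

    def seconds_remaining(self, container):
        """Seconds until `container` may be restarted again (0.0 if allowed now)."""
        last = self._last_restart.get(container)
        if last is None:
            return 0.0
        return max(0.0, self._cooldown - (self._clock() - last))

    def try_restart(self, container):
        """Record a restart and return True, or return False while cooling down."""
        if self.seconds_remaining(container) > 0:
            return False
        self._last_restart[container] = self._clock()
        return True
```

Because the timestamps are keyed by container name, restarting one container (say, a hypothetical "api") does not block restarting another (say, "db") during the same window.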

Restart a database (danger zone)

  1. Find the database in the Databases section (marked with a danger warning)
  2. Read the warning about potential data corruption
  3. Click the "Restart" button
  4. Type the container name to confirm
  5. Monitor the health status to verify the database comes back healthy
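
If step 5 is scripted rather than watched in the UI, it amounts to polling until the status reads healthy or a deadline passes. The sketch below assumes a caller-supplied get_status function and a status string of "healthy"; both are hypothetical, since this page does not describe the shape of the health response.

```python
import time


def wait_until_healthy(get_status, timeout=120.0, interval=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "healthy" or `timeout` seconds elapse.

    get_status is a hypothetical callable supplied by the caller; the
    "healthy" value is an assumption, not a documented status string.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if clock() >= deadline:
            return False  # still not healthy when the deadline passed
        sleep(interval)
```

A False result means the database had not reported healthy by the deadline, so it needs investigation before any further restart is attempted.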

Trigger a manual backup

  1. Scroll to the Backup Operations section
  2. Click "Trigger Manual Backup"
  3. Allow time for the backup to start; the backup container polls every 60 seconds, so the trigger is not picked up instantly
  4. Check the recent activity log for the result

Test S3 connectivity

  1. Scroll to the Backup Operations section
  2. Click "Test S3 Connection"
  3. The result will show whether the configured S3 bucket is reachable

Gotchas

  1. Restarting the manager will disconnect your current session briefly — the page will reload automatically.
  2. Database restarts risk data corruption if writes are in flight. Only restart databases when you are certain no active scans or writes are occurring.
  3. Restarting Traefik drops ALL active connections to the system, including your own.
  4. The watchdog container cannot restart itself — it is listed as non-restartable.
  5. A 5-minute cooldown applies per container to prevent restart storms.
  6. Backup triggers are not instantaneous — the backup container polls every 60 seconds.
  7. For database containers, the yellow row warning and the waiting count in the detail drawer can disagree; this is expected. The row warning requires sustained contention (3 or more consecutive 1-second samples with waiting > 0), whereas the drawer's Connection Pool tiles show only the most recent sample. A non-zero waiting count in the drawer with no yellow row warning indicates a transient spike that did not escalate to an alarm. Use the "Peak waiting in last 60s" line in the drawer to spot bursts that have already resolved. If the drawer shows no Connection Pool section, the sampler has not yet produced its first sample (the manager has just started).
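
The distinction in gotcha 7 is easier to see with numbers. The sketch below is illustrative only: the function names are hypothetical, the input is a list of 1-second waiting samples ordered oldest to newest, and it assumes the row warning is evaluated on the most recent run of consecutive non-zero samples.

```python
def trailing_contention_run(waiting_samples):
    """Count the consecutive samples with waiting > 0 at the end of the window."""
    run = 0
    for waiting in reversed(waiting_samples):
        if waiting <= 0:
            break
        run += 1
    return run


def row_warning(waiting_samples, required=3):
    """Assumed rule: warn only after `required` consecutive non-zero samples."""
    return trailing_contention_run(waiting_samples) >= required


def drawer_view(waiting_samples):
    """Latest sample and window peak, or None before the first sample exists."""
    if not waiting_samples:
        return None
    return {"waiting": waiting_samples[-1], "peak_waiting": max(waiting_samples)}
```

For the samples [0, 0, 4, 0, 2], drawer_view reports waiting 2 with a peak of 4 while row_warning is False: the drawer shows a spike, but nothing sustained. For [0, 1, 2, 3], the last three samples are all non-zero, so row_warning is True.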

API Calls (7)

Method  Path                                     Description
GET     /api/admin/services/restartable          List all restartable containers with tier and danger classification
GET     /api/admin/services/maintenance-context  Enriched container list with health diagnostics, open alarm counts, last restart history, and watchdog reachability; loaded once on page open to minimize round trips
GET     /api/admin/services/:name/context        Per-container detail, loaded lazily when the detail drawer opens: health diagnostic, uptime, memory usage, DB pool stats (for database containers), recent audit events, and open alarms
POST    /api/admin/services/:name/restart        Restart a specific container via the watchdog sidecar
POST    /api/admin/backups/trigger               Request a manual backup (picked up on the backup container's next 60-second poll)
POST    /api/admin/backups/test-s3               Test S3 connection with current settings
GET     /api/audit/changes                       Fetch recent audit events (filtered for maintenance activity)
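
For scripted use, the per-container paths above can be filled in as follows. This is a sketch under stated assumptions: the base URL is a placeholder, the function names are hypothetical, and authentication headers and any request body the server may require are omitted because this page does not describe them. The functions only build the request objects; nothing is sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://admin.example.invalid"  # placeholder host, not a real deployment


def build_restart_request(container, base_url=BASE_URL):
    """Build (but do not send) POST /api/admin/services/:name/restart."""
    name = quote(container, safe="")  # percent-encode the :name path segment
    return Request(f"{base_url}/api/admin/services/{name}/restart", method="POST")


def build_context_request(container, base_url=BASE_URL):
    """Build (but do not send) GET /api/admin/services/:name/context."""
    name = quote(container, safe="")
    return Request(f"{base_url}/api/admin/services/{name}/context", method="GET")
```

A request built this way could be passed to urllib.request.urlopen once the real host and credentials are known. The restart call is subject to the same danger warnings and cooldown as the button in the UI.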

Related Pages

  • Health Overview: its architecture diagram links to this page via the wrench icon on each node
  • Backups: full backup configuration; this page only surfaces the trigger and S3 test shortcuts
  • Backup Health: read-only backup health monitoring