mipo

Stuck Running

A scanner started a job but never reported completion or further progress. The alarm fires once the job's wall-clock elapsed time exceeds 3× its estimated duration.
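The threshold described above can be sketched as a small predicate. This is an illustrative sketch only, not the detector's actual code: the function name, its parameters, and the `completed` flag are hypothetical, and the real detector may also take the last progress report into account.

```python
from datetime import datetime, timedelta, timezone

# From the text: the alarm condition is elapsed time > 3x the estimated duration.
STUCK_FACTOR = 3


def is_stuck_running(started_at, estimated, now, completed=False):
    """Return True when an unfinished job has run longer than 3x its estimate.

    started_at / now are datetimes; estimated is a timedelta.
    All names here are hypothetical, chosen for illustration.
    """
    if completed:
        # A job that reported completion is never "stuck running".
        return False
    return (now - started_at) > STUCK_FACTOR * estimated


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
estimate = timedelta(minutes=10)
print(is_stuck_running(start, estimate, start + timedelta(minutes=31)))
```

With a 10-minute estimate, the sketch treats the job as stuck only after more than 30 minutes of wall-clock time have elapsed.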

How to

Investigate

  1. Check the scanner host — process may have hung, crashed, or lost network
  2. Review scanner logs for the job ID
  3. Check ingest logs for failed result POSTs (these could indicate result_submit_failed)

Repair

  1. Fail the job. This is a terminal state; the parent scan rolls up job failures into its own status.

Gotchas

  • Stuck running differs from no_heartbeat: the scanner may still be alive but the specific job is wedged.
  • Failing the job does not stop the actual scanner work; if the scanner later completes, results may be silently dropped.
  • Opt-in automation: with PROGRESS_TIMEOUT_AUTO_FAIL=true the dispatcher auto-fails jobs in this state using the same staleness predicate as this detector, and this alarm then auto-resolves. Default off — repair stays operator-driven.
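The opt-in behavior in the last bullet could be gated roughly as follows. Only the variable name PROGRESS_TIMEOUT_AUTO_FAIL and its default of off come from the text; the helper name and the exact parsing (case-insensitive "true") are assumptions, and the real dispatcher may read the setting differently.

```python
import os


def auto_fail_enabled(env=None):
    """Return True only when PROGRESS_TIMEOUT_AUTO_FAIL is explicitly 'true'.

    Hypothetical helper: the parsing rules are an assumption. An unset or
    unrecognized value falls back to off, matching the documented default.
    """
    env = os.environ if env is None else env
    value = env.get("PROGRESS_TIMEOUT_AUTO_FAIL", "false")
    return value.strip().lower() == "true"


# Default off: repair stays operator-driven unless the flag is set.
print(auto_fail_enabled({}))
print(auto_fail_enabled({"PROGRESS_TIMEOUT_AUTO_FAIL": "true"}))
```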

API calls (1)

Method  Path                                     Description
POST    /api/health/job-errors/:stateId/repair   action=fail
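As a rough illustration, the repair call might be prepared like this using only the Python standard library. The path and action=fail come from the table above; the base URL, the function name, and the form-encoded body are assumptions (the API may expect JSON or a query parameter instead), and any authentication is omitted.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_fail_request(base_url, state_id):
    """Build (but do not send) the POST that fails a stuck job.

    Hypothetical helper: the body encoding is an assumption.
    """
    # Substitute the :stateId path parameter, escaping unsafe characters.
    path = f"/api/health/job-errors/{quote(str(state_id), safe='')}/repair"
    body = urlencode({"action": "fail"}).encode("ascii")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder host and state ID, for illustration only.
req = build_fail_request("https://mipo.example.internal", "abc123")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. Because failing a job is terminal, it is worth confirming the state ID before doing so.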

Related