Stuck Running
A scanner started a job but never reported completion or further progress. Wall-clock elapsed time exceeds 3× the estimated duration.
How to
Investigate
- Check the scanner host — process may have hung, crashed, or lost network
- Review scanner logs for the job ID
- Check ingest logs for unsuccessful result POSTs (could indicate result_submit_failed)
Repair
- Fail the job — terminal. The scan rolls up failures into its own status.
Gotchas
- Stuck running differs from no_heartbeat: the scanner may still be alive but the specific job is wedged.
- Failing the job does not stop the actual scanner work; if the scanner later completes, results may be silently dropped.
- Opt-in automation: with PROGRESS_TIMEOUT_AUTO_FAIL=true the dispatcher auto-fails jobs in this state using the same staleness predicate as this detector, and this alarm then auto-resolves. Default off — repair stays operator-driven.
API calls (1)
| Method | Path | Description |
|---|---|---|
| POST | /api/health/job-errors/:stateId/repair | action=fail |
Related
- Job Errors Overview — Category model