Operations
This section covers the surfaces you use after Graphile Worker RS is running in an application: schema migrations, the command-line tool, the admin UI, worker shutdown, recovery, and production readiness checks.
Use these pages when you are preparing a deployment or responding to a queue incident:
- CLI for adding, listing, completing, failing, rescheduling, removing, cleaning up, unlocking, and inspecting jobs.
- Admin UI for browser-based queue inspection and controlled maintenance actions.
- Migrations for installing and updating the PostgreSQL schema.
- Deployment for running workers outside local development.
- Observability for production visibility.
Operational surfaces
Graphile Worker RS exposes three main operational entry points.
Worker runtime
The worker process owns job execution. Production configuration normally starts with a PostgreSQL pool, a schema name, registered task handlers, concurrency, and shutdown behavior:
use graphile_worker::{WorkerOptions, WorkerShutdownConfig};
use std::time::Duration;
let shutdown = WorkerShutdownConfig::default()
.listen_os_shutdown_signals(true)
.grace_period(Duration::from_secs(5))
.interrupted_job_retry_delay(Duration::from_secs(30));
let worker = WorkerOptions::default()
.concurrency(5)
.schema("graphile_worker")
.worker_shutdown(shutdown)
.define_job::<SendEmail>()
.pg_pool(pg_pool)
.init()
.await?;
worker.run().await?;
If your host application already owns shutdown, disable the built-in OS signal listeners and pass the host shutdown future to the worker. The worker still handles graceful draining, local queue release, batcher flushes, and configured grace-period behavior for in-flight jobs.
Command-line tool
The installed binary is graphile-worker. It connects through
--database-url or DATABASE_URL and uses the graphile_worker schema by
default.
graphile-worker --database-url postgres://postgres:postgres@localhost/postgres migrate
DATABASE_URL=postgres://postgres:postgres@localhost/postgres graphile-worker add send_email --payload '{"to":"user@example.com"}'
graphile-worker list --state ready
graphile-worker show 123
graphile-worker complete 123 124
graphile-worker fail 125 --reason "invalid payload"
graphile-worker reschedule 126 --run-at 2026-01-02T03:04:05Z
graphile-worker remove cli-job-key
graphile-worker cleanup
graphile-worker force-unlock graphile_worker_deadbeef
graphile-worker stats
graphile-worker queues
graphile-worker workers
For stale worker recovery, dry-run first so you can see which workers would be recovered:
graphile-worker sweep-stale-workers --dry-run
graphile-worker sweep-stale-workers --sweep-threshold 5m --recovery-delay 30s
Admin UI
The native admin UI is an Axum server. It serves the browser UI and protected API routes for:
- session metadata
- queue overview, stats, queues, locked workers, and active workers
- job listing and single-job inspection
- adding raw jobs
- completing, permanently failing, running now, and rescheduling jobs
- removing a job by key
- maintenance actions: migrate, cleanup, force unlock, and sweep stale workers
When embedding the admin UI directly, AdminServerConfig defaults to
127.0.0.1:4000, uses the graphile_worker schema, enables write actions, and
uses Basic auth with a random password for the admin user. The
graphile-worker admin CLI command overrides the listen default to
127.0.0.1:5678. Authentication modes available in the server configuration are
Basic, Bearer token, custom header token, and no auth. No-auth mode is rejected
on non-loopback listen addresses.
Run write-capable admin UI instances behind an access-controlled boundary. The
server adds no-store cache headers for dynamic routes, caches static assets for
one hour, sets X-Content-Type-Options: nosniff, denies framing, uses
Referrer-Policy: no-referrer, and applies a Content Security Policy. API
write methods also require the admin CSRF token.
Use read-only mode for shared inspection surfaces. In read-only mode, write
routes return 403 instead of mutating jobs or running maintenance.
Migrations
Migrations install and update the Graphile Worker PostgreSQL schema. The migration runner:
- creates the schema and
migrationstable if they are missing - checks that PostgreSQL is version 12 or newer
- runs pending migrations in order inside transactions
- records each migration id and whether it is breaking
- rejects a database revision that includes a breaking migration newer than the current crate supports
- warns, but continues, when the database has a newer non-breaking revision than the current crate supports
Run migrations before starting workers after deploys that may include schema changes:
graphile-worker migrate
Migration 11 has a specific safety check: if locked jobs are present and
PostgreSQL returns error code 22012, migration fails with guidance to shut
workers down cleanly and unlock locked jobs and queues before retrying.
Recovery and locked jobs
Worker recovery is opt-in. When enabled, workers register heartbeat rows in
PostgreSQL and refresh last_heartbeat_at on a configured interval. A sweeper
can recover jobs held by workers that stopped heartbeating, and jobs or queues
locked by worker ids that are no longer registered.
use graphile_worker::{WorkerOptions, WorkerRecoveryConfig};
use std::time::Duration;
let recovery = WorkerRecoveryConfig::default()
.enabled(true)
.heartbeat_interval(Duration::from_secs(30))
.sweep_interval(Duration::from_secs(60))
.sweep_threshold(Duration::from_secs(300))
.recovery_delay(Duration::from_secs(30));
let worker = WorkerOptions::default()
.worker_recovery(recovery)
.define_job::<SendEmail>()
.pg_pool(pg_pool)
.init()
.await?;
Recovered jobs are unlocked, their attempt count is decremented back, their
queue lock is released, and run_at is delayed by the configured recovery
delay. Sweeps use a transaction-scoped PostgreSQL advisory lock so only one
sweeper performs recovery at a time.
Manual recovery is available even when automatic recovery is disabled:
use graphile_worker::SweepStaleWorkersOptions;
use std::time::Duration;
let result = worker
.create_utils()
.sweep_stale_workers(SweepStaleWorkersOptions {
sweep_threshold: Some(Duration::from_secs(300)),
recovery_delay: Some(Duration::from_secs(30)),
dry_run: true,
})
.await?;
println!("Would recover workers: {:?}", result.worker_ids);
Use force-unlock only when you have identified worker ids that should no
longer own locks:
graphile-worker workers
graphile-worker force-unlock graphile_worker_deadbeef
Local queue considerations
The local queue reduces database round trips by batch-fetching jobs and caching them in the worker process. It has operational implications:
- unclaimed jobs are returned to PostgreSQL on shutdown or when the local queue TTL expires
- refetch delay can reduce thundering-herd behavior when queues are empty
- cache size and refill threshold should be chosen with worker concurrency and database load in mind
use graphile_worker::{LocalQueueConfig, RefetchDelayConfig, WorkerOptions};
use std::time::Duration;
let worker = WorkerOptions::default()
.local_queue(
LocalQueueConfig::default()
.with_size(100)
.with_ttl(Duration::from_secs(300))
.with_refetch_delay(
RefetchDelayConfig::default()
.with_duration(Duration::from_millis(100))
.with_threshold(10),
),
)
.define_job::<SendEmail>()
.pg_pool(pg_pool)
.init()
.await?;
Production readiness checklist
Before sending production traffic to workers, verify the following:
- PostgreSQL is version 12 or newer.
- The database URL is configured through
DATABASE_URLor an equivalent application secret. - Migrations are run before workers start processing jobs.
- All deployed workers agree on the Graphile Worker schema name.
- Worker task handlers are registered for every task identifier you enqueue.
- Concurrency, PostgreSQL pool size, and local queue size are sized together.
- Shutdown behavior is wired into your process manager or host application.
- Recovery is enabled when the deployment model can leave jobs locked after crashes, network partitions, forced aborts, or orchestrator shutdowns.
- Admin UI access uses Basic, Bearer, or header auth when exposed beyond loopback.
- Admin UI read-only mode is used for inspection-only deployments.
- Maintenance commands such as
cleanup,force-unlock, andsweep-stale-workersare restricted to operators who can safely mutate queue state. - Stale-worker sweeps are dry-run before recovery during incident response.
- Cleanup policy is understood before permanently failed jobs, task identifiers, or job queues are garbage-collected.
- Observability covers worker startup, job failures, locked workers, queue backlog, and migration failures.