Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Operations

This section covers the surfaces you use after Graphile Worker RS is running in an application: schema migrations, the command-line tool, the admin UI, worker shutdown, recovery, and production readiness checks.

Use these pages when you are preparing a deployment or responding to a queue incident:

  • CLI for adding, listing, completing, failing, rescheduling, removing, cleaning up, unlocking, and inspecting jobs.
  • Admin UI for browser-based queue inspection and controlled maintenance actions.
  • Migrations for installing and updating the PostgreSQL schema.
  • Deployment for running workers outside local development.
  • Observability for production visibility.

Operational surfaces

Graphile Worker RS exposes three main operational entry points.

Worker runtime

The worker process owns job execution. Production configuration normally starts with a PostgreSQL pool, a schema name, registered task handlers, concurrency, and shutdown behavior:

use graphile_worker::{WorkerOptions, WorkerShutdownConfig};
use std::time::Duration;

let shutdown = WorkerShutdownConfig::default()
    .listen_os_shutdown_signals(true)
    .grace_period(Duration::from_secs(5))
    .interrupted_job_retry_delay(Duration::from_secs(30));

let worker = WorkerOptions::default()
    .concurrency(5)
    .schema("graphile_worker")
    .worker_shutdown(shutdown)
    .define_job::<SendEmail>()
    .pg_pool(pg_pool)
    .init()
    .await?;

worker.run().await?;

If your host application already owns shutdown, disable the built-in OS signal listeners and pass the host shutdown future to the worker. The worker still handles graceful draining, local queue release, batcher flushes, and configured grace-period behavior for in-flight jobs.

Command-line tool

The installed binary is graphile-worker. It connects through --database-url or DATABASE_URL and uses the graphile_worker schema by default.

graphile-worker --database-url postgres://postgres:postgres@localhost/postgres migrate
DATABASE_URL=postgres://postgres:postgres@localhost/postgres graphile-worker add send_email --payload '{"to":"user@example.com"}'
graphile-worker list --state ready
graphile-worker show 123
graphile-worker complete 123 124
graphile-worker fail 125 --reason "invalid payload"
graphile-worker reschedule 126 --run-at 2026-01-02T03:04:05Z
graphile-worker remove cli-job-key
graphile-worker cleanup
graphile-worker force-unlock graphile_worker_deadbeef
graphile-worker stats
graphile-worker queues
graphile-worker workers

For stale worker recovery, dry-run first so you can see which workers would be recovered:

graphile-worker sweep-stale-workers --dry-run
graphile-worker sweep-stale-workers --sweep-threshold 5m --recovery-delay 30s

Admin UI

The native admin UI is an Axum server. It serves the browser UI and protected API routes for:

  • session metadata
  • queue overview, stats, queues, locked workers, and active workers
  • job listing and single-job inspection
  • adding raw jobs
  • completing, permanently failing, running now, and rescheduling jobs
  • removing a job by key
  • maintenance actions: migrate, cleanup, force unlock, and sweep stale workers

When embedding the admin UI directly, AdminServerConfig defaults to 127.0.0.1:4000, uses the graphile_worker schema, enables write actions, and uses Basic auth with a random password for the admin user. The graphile-worker admin CLI command overrides the listen default to 127.0.0.1:5678. Authentication modes available in the server configuration are Basic, Bearer token, custom header token, and no auth. No-auth mode is rejected on non-loopback listen addresses.

Run write-capable admin UI instances behind an access-controlled boundary. The server adds no-store cache headers for dynamic routes, caches static assets for one hour, sets X-Content-Type-Options: nosniff, denies framing, uses Referrer-Policy: no-referrer, and applies a Content Security Policy. API write methods also require the admin CSRF token.

Use read-only mode for shared inspection surfaces. In read-only mode, write routes return 403 instead of mutating jobs or running maintenance.

Migrations

Migrations install and update the Graphile Worker PostgreSQL schema. The migration runner:

  • creates the schema and migrations table if they are missing
  • checks that PostgreSQL is version 12 or newer
  • runs pending migrations in order inside transactions
  • records each migration id and whether it is breaking
  • rejects a database revision that includes a breaking migration newer than the current crate supports
  • warns, but continues, when the database has a newer non-breaking revision than the current crate supports

Run migrations before starting workers after deploys that may include schema changes:

graphile-worker migrate

Migration 11 has a specific safety check: if locked jobs are present and PostgreSQL returns error code 22012, migration fails with guidance to shut workers down cleanly and unlock locked jobs and queues before retrying.

Recovery and locked jobs

Worker recovery is opt-in. When enabled, workers register heartbeat rows in PostgreSQL and refresh last_heartbeat_at on a configured interval. A sweeper can recover jobs held by workers that stopped heartbeating, and jobs or queues locked by worker ids that are no longer registered.

use graphile_worker::{WorkerOptions, WorkerRecoveryConfig};
use std::time::Duration;

let recovery = WorkerRecoveryConfig::default()
    .enabled(true)
    .heartbeat_interval(Duration::from_secs(30))
    .sweep_interval(Duration::from_secs(60))
    .sweep_threshold(Duration::from_secs(300))
    .recovery_delay(Duration::from_secs(30));

let worker = WorkerOptions::default()
    .worker_recovery(recovery)
    .define_job::<SendEmail>()
    .pg_pool(pg_pool)
    .init()
    .await?;

Recovered jobs are unlocked, their attempt count is decremented back, their queue lock is released, and run_at is delayed by the configured recovery delay. Sweeps use a transaction-scoped PostgreSQL advisory lock so only one sweeper performs recovery at a time.

Manual recovery is available even when automatic recovery is disabled:

use graphile_worker::SweepStaleWorkersOptions;
use std::time::Duration;

let result = worker
    .create_utils()
    .sweep_stale_workers(SweepStaleWorkersOptions {
        sweep_threshold: Some(Duration::from_secs(300)),
        recovery_delay: Some(Duration::from_secs(30)),
        dry_run: true,
    })
    .await?;

println!("Would recover workers: {:?}", result.worker_ids);

Use force-unlock only when you have identified worker ids that should no longer own locks:

graphile-worker workers
graphile-worker force-unlock graphile_worker_deadbeef

Local queue considerations

The local queue reduces database round trips by batch-fetching jobs and caching them in the worker process. It has operational implications:

  • unclaimed jobs are returned to PostgreSQL on shutdown or when the local queue TTL expires
  • refetch delay can reduce thundering-herd behavior when queues are empty
  • cache size and refill threshold should be chosen with worker concurrency and database load in mind
use graphile_worker::{LocalQueueConfig, RefetchDelayConfig, WorkerOptions};
use std::time::Duration;

let worker = WorkerOptions::default()
    .local_queue(
        LocalQueueConfig::default()
            .with_size(100)
            .with_ttl(Duration::from_secs(300))
            .with_refetch_delay(
                RefetchDelayConfig::default()
                    .with_duration(Duration::from_millis(100))
                    .with_threshold(10),
            ),
    )
    .define_job::<SendEmail>()
    .pg_pool(pg_pool)
    .init()
    .await?;

Production readiness checklist

Before sending production traffic to workers, verify the following:

  • PostgreSQL is version 12 or newer.
  • The database URL is configured through DATABASE_URL or an equivalent application secret.
  • Migrations are run before workers start processing jobs.
  • All deployed workers agree on the Graphile Worker schema name.
  • Worker task handlers are registered for every task identifier you enqueue.
  • Concurrency, PostgreSQL pool size, and local queue size are sized together.
  • Shutdown behavior is wired into your process manager or host application.
  • Recovery is enabled when the deployment model can leave jobs locked after crashes, network partitions, forced aborts, or orchestrator shutdowns.
  • Admin UI access uses Basic, Bearer, or header auth when exposed beyond loopback.
  • Admin UI read-only mode is used for inspection-only deployments.
  • Maintenance commands such as cleanup, force-unlock, and sweep-stale-workers are restricted to operators who can safely mutate queue state.
  • Stale-worker sweeps are dry-run before recovery during incident response.
  • Cleanup policy is understood before permanently failed jobs, task identifiers, or job queues are garbage-collected.
  • Observability covers worker startup, job failures, locked workers, queue backlog, and migration failures.