Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Observability

Graphile Worker RS exposes observability through standard Rust tracing, optional OpenTelemetry compatibility features, and lifecycle hooks. The worker does not force a logging or metrics backend on applications; install the subscriber, exporter, and hook plugin that match your runtime.

Tracing

The main crate depends on tracing, so worker internals and application task handlers can emit spans and events into the subscriber you install. The hooks example uses tracing-subscriber with an environment filter and a formatted output layer:

use tracing_subscriber::{
    filter::EnvFilter, prelude::__tracing_subscriber_SubscriberExt, util::SubscriberInitExt,
};

pub fn enable_logs() {
    let fmt_layer = tracing_subscriber::fmt::layer();
    let filter_layer = EnvFilter::try_new("debug,sqlx=warn").unwrap();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .init();
}

Initialize the subscriber once, before starting the worker. The example keeps SQLx noise at warn while enabling debug logging elsewhere.

OpenTelemetry compatibility

OpenTelemetry support is feature-gated. Enable exactly one compatibility feature:

[dependencies]
graphile_worker = {
    version = "0.13",
    features = ["opentelemetry_0_32"]
}

Available compatibility features are:

FeatureOpenTelemetry crate version
opentelemetry_0_30opentelemetry 0.30
opentelemetry_0_31opentelemetry 0.31
opentelemetry_0_32opentelemetry 0.32

The crate rejects builds that enable more than one of these features at the same time.

When an OpenTelemetry feature is enabled, job insertion code can capture the current span context and write it into the job payload under _trace. Object payloads receive the field directly. Array payloads receive it on each object item in the array, while scalar values are left unchanged.

{
  "user_id": 42,
  "_trace": {
    "flags": 1,
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7"
  }
}

When the worker later runs a job, it reads _trace from object payloads and adds the recorded span context as a link to the current span. Invalid, non-object, or missing trace payloads are ignored. Without an OpenTelemetry feature, this trace extraction and linking path is a no-op.

Metrics with lifecycle hooks

Lifecycle hooks are the extension point for metrics and structured operational events. A plugin registers callbacks on HookRegistry; those callbacks receive context objects for worker and job events.

The metrics example counts started, completed, and failed jobs with atomics:

use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

use graphile_worker::{
    HookRegistry, JobComplete, JobFail, JobStart, Plugin, WorkerShutdown, WorkerStart,
};

pub struct MetricsPlugin {
    jobs_started: AtomicU64,
    jobs_completed: AtomicU64,
    jobs_failed: AtomicU64,
}

impl Plugin for MetricsPlugin {
    fn register(self, hooks: &mut HookRegistry) {
        hooks.on(WorkerStart, async |ctx| {
            println!("[MetricsPlugin] Worker {} started", ctx.worker_id);
        });

        let jobs_started = Arc::new(self.jobs_started);
        let jobs_completed = Arc::new(self.jobs_completed);
        let jobs_failed = Arc::new(self.jobs_failed);

        {
            let jobs_started = jobs_started.clone();
            hooks.on(JobStart, move |ctx| {
                let jobs_started = jobs_started.clone();
                async move {
                    jobs_started.fetch_add(1, Ordering::Relaxed);
                    println!(
                        "[MetricsPlugin] Job {} started (task: {})",
                        ctx.job.id(),
                        ctx.job.task_identifier()
                    );
                }
            });
        }

        {
            let jobs_completed = jobs_completed.clone();
            hooks.on(JobComplete, move |ctx| {
                let jobs_completed = jobs_completed.clone();
                async move {
                    jobs_completed.fetch_add(1, Ordering::Relaxed);
                    println!(
                        "[MetricsPlugin] Job {} completed in {:?}",
                        ctx.job.id(),
                        ctx.duration
                    );
                }
            });
        }

        {
            let jobs_failed = jobs_failed.clone();
            hooks.on(JobFail, move |ctx| {
                let jobs_failed = jobs_failed.clone();
                async move {
                    jobs_failed.fetch_add(1, Ordering::Relaxed);
                    println!(
                        "[MetricsPlugin] Job {} failed: {} (will_retry: {})",
                        ctx.job.id(),
                        ctx.error,
                        ctx.will_retry
                    );
                }
            });
        }

        hooks.on(WorkerShutdown, move |ctx| async move {
            println!(
                "[MetricsPlugin] Worker {} shutting down (reason: {:?})",
                ctx.worker_id, ctx.reason
            );
        });
    }
}

Use the same hook points to export counters, histograms, or structured events to your metrics backend:

HookUseful signal
WorkerStartworker process started and has a worker id
WorkerShutdownworker process is stopping and exposes the shutdown reason
JobStartjob id and task identifier started executing
JobCompletejob completed and exposes execution duration
JobFailjob failed, exposes the error and whether it will retry

Logging

For application logs, combine a subscriber with hook callbacks and task-handler instrumentation. The hook contexts expose job ids, task identifiers, errors, durations, worker ids, and shutdown reasons, which are the stable values to put into operational logs.

For example, a JobFail hook can log the job id, error, and retry decision; JobComplete can log duration; WorkerShutdown can log the shutdown reason. The example code prints these values with println!, but production code will usually emit tracing events or send them to a metrics/logging client.

Admin and monitor visibility

The workspace includes graphile_worker_admin_api, graphile_worker_admin_ui, and graphile_worker_admin_ui_client crates. Use those admin surfaces for queue visibility, and use hooks and tracing for process-local signals that admin views cannot infer on their own, such as per-process shutdown reasons, local counters, and span links.

For runtime monitoring, the most useful signals visible from the current public hooks are:

SignalSource
Worker startedWorkerStart hook
Worker shutdown reasonWorkerShutdown hook
Job throughputJobStart and JobComplete hooks
Job failure rateJobFail hook
Job durationJobComplete hook
Trace continuity from enqueue to execution_trace payload plus OpenTelemetry span links