Metrics

Each OSAPI component exposes a Prometheus-compatible /metrics endpoint on a dedicated port. Metrics are collected using OpenTelemetry with an isolated Prometheus registry per component, so each endpoint shows only that component's data.

Endpoints

Each component's metrics server exposes three endpoints:

| Endpoint | Description |
| --- | --- |
| /metrics | Prometheus metrics in text exposition format |
| /health | Liveness probe; always returns {"status":"ok"} |
| /health/ready | Readiness probe; 200 when ready, 503 when not |

The health probes are unauthenticated and always available when the metrics server is enabled.

| Component | Default Port | Config Key |
| --- | --- | --- |
| Controller | 9090 | controller.metrics.port |
| Agent | 9091 | agent.metrics.port |
| NATS | 9092 | nats.server.metrics.port |
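
As a quick illustration of how the endpoints and default ports combine, the sketch below builds the three URLs for a component. The `endpoint_urls` helper, the `localhost` host, and the plain `http` scheme are assumptions made for the example, not part of OSAPI.

```python
# Default metrics ports, taken from the table above.
DEFAULT_PORTS = {"controller": 9090, "agent": 9091, "nats": 9092}

# The three paths every metrics server exposes.
ENDPOINTS = ("/metrics", "/health", "/health/ready")


def endpoint_urls(component, host="localhost", port=None):
    """Return the metrics-server URLs for a component.

    `host` and the http scheme are assumptions for this sketch; pass
    `port` explicitly if the default was overridden in the config.
    """
    if port is None:
        port = DEFAULT_PORTS[component]
    return [f"http://{host}:{port}{path}" for path in ENDPOINTS]


if __name__ == "__main__":
    for name in DEFAULT_PORTS:
        print(name, endpoint_urls(name))
```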

Application Metrics Reference

All application metrics are instrumented through OpenTelemetry and exported via the OTEL Prometheus exporter with the osapi namespace. The osapi_ prefix is applied automatically. OTEL scope labels (otel_scope_name, etc.) are included in the output for traceability.

Go runtime (goroutines, memory, GC) and process metrics (CPU, memory, file descriptors) are included on every component.

| Metric | Type | Labels | Component | Description |
| --- | --- | --- | --- | --- |
| osapi_component_up | gauge | | all | 1 when ready, 0 when not |
| osapi_subsystem_up | gauge | subsystem | all | 1 when subsystem is ok, 0 when not |
| osapi_jobs_created_total | counter | | controller | Jobs submitted via API |
| osapi_jobs_processed_total | counter | status | agent | Jobs completed or failed |
| osapi_jobs_active | gauge | | agent | Currently executing jobs |
| osapi_job_duration_seconds | histogram | | agent | Job execution duration |
| osapi_heartbeat_age_seconds | gauge | | agent | Seconds since last heartbeat write |

The osapi_subsystem_up gauge has a subsystem label identifying which internal service it represents (e.g., api, heartbeat, metrics, notifier, tracing, facts). A value of 1 means the subsystem status is ok; 0 means disabled or error.
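
To make the gauge semantics concrete, here is a minimal sketch that reads text exposition output and lists the subsystems reporting 0. The `SAMPLE` text is hypothetical (including the `otel_scope_name` value), and the small regex-based parser is a simplification for illustration; a real client would more likely use an existing Prometheus parsing library.

```python
import re

# Hypothetical excerpt of /metrics output; real output carries more series,
# and the otel_scope_name value shown here is an assumption.
SAMPLE = """\
# TYPE osapi_component_up gauge
osapi_component_up{otel_scope_name="osapi"} 1
# TYPE osapi_subsystem_up gauge
osapi_subsystem_up{otel_scope_name="osapi",subsystem="api"} 1
osapi_subsystem_up{otel_scope_name="osapi",subsystem="notifier"} 0
"""

# metric_name{labels} value  (labels are optional)
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def down_subsystems(text):
    """Names of subsystems whose osapi_subsystem_up gauge is 0."""
    return sorted(
        labels.get("subsystem", "")
        for name, labels, value in parse_samples(text)
        if name == "osapi_subsystem_up" and value == 0
    )


if __name__ == "__main__":
    print(down_subsystems(SAMPLE))
```

Because 0 covers both disabled and error, a result from `down_subsystems` says only that a subsystem is not ok, not why.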

The controller also exposes HTTP request metrics from the OTEL middleware using standard http.server.* names (http.server.request.duration, http.server.active_requests, etc.).

Of the application metrics, the NATS component exposes only osapi_component_up and osapi_subsystem_up, because NATS has its own native monitoring.

Health Probes

Each metrics server also serves lightweight health probes on the same port. These are always unauthenticated.

Liveness (/health)

Always returns 200 OK with {"status":"ok"} when the metrics server is running. Use this to detect hung or crashed processes.

Readiness (/health/ready)

Returns 200 OK when the component is ready, or 503 Service Unavailable when it is not. Use this to gate traffic until the component has fully started.
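
Outside Kubernetes, the same gating can be done by polling the readiness endpoint until it returns 200. The sketch below uses only the Python standard library; the function names, retry count, and delay are illustrative assumptions, not OSAPI defaults.

```python
import time
import urllib.error
import urllib.request


def probe_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 503 are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(url, attempts=30, delay=1.0,
                     probe=probe_status, sleep=time.sleep):
    """Poll url until it answers 200; return False if attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Assumes a controller on its default metrics port on this machine.
    ok = wait_until_ready("http://localhost:9090/health/ready", attempts=5)
    print("ready" if ok else "not ready")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running component or real delays.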

Kubernetes Example

livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10

Adjust the port to match the component (9090 for controller, 9091 for agent, 9092 for NATS).

Integration

Point your Prometheus instance at each component:

# prometheus.yml
scrape_configs:
  - job_name: 'osapi-controller'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'osapi-agent'
    static_configs:
      - targets: ['localhost:9091']
  - job_name: 'osapi-nats'
    static_configs:
      - targets: ['localhost:9092']

Configuration

Each component's metrics server can be enabled or disabled independently:

controller:
  metrics:
    enabled: true
    port: 9090

agent:
  metrics:
    enabled: true
    port: 9091

nats:
  server:
    metrics:
      enabled: true
      port: 9092

Set enabled: false to disable the metrics endpoint for a component. See the Configuration reference for the full list of settings and environment variable overrides.
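
The dotted config keys used on this page (for example nats.server.metrics.port) correspond to nested YAML keys. The sketch below shows that mapping on an already-parsed configuration, represented here as a plain Python dict. The `lookup` and `enabled_ports` helpers are hypothetical, and treating a missing `enabled` key as disabled is an assumption of the example rather than documented OSAPI behavior.

```python
# A parsed configuration, mirroring the YAML above with NATS metrics disabled.
CONFIG = {
    "controller": {"metrics": {"enabled": True, "port": 9090}},
    "agent": {"metrics": {"enabled": True, "port": 9091}},
    "nats": {"server": {"metrics": {"enabled": False, "port": 9092}}},
}

# Where each component keeps its metrics settings.
METRICS_SECTIONS = {
    "controller": "controller.metrics",
    "agent": "agent.metrics",
    "nats": "nats.server.metrics",
}


def lookup(config, dotted_key, default=None):
    """Resolve a dotted key such as 'agent.metrics.port' in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def enabled_ports(config):
    """Map component name to metrics port for components with metrics enabled."""
    ports = {}
    for component, section in METRICS_SECTIONS.items():
        # Assumption: a missing 'enabled' key is treated as disabled here.
        if lookup(config, section + ".enabled", False):
            ports[component] = lookup(config, section + ".port")
    return ports


if __name__ == "__main__":
    print(enabled_ports(CONFIG))
```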