Monitoring

What to watch in production.

Metrics

In v0.1 the SDK does not expose Prometheus or OpenTelemetry metrics directly; it emits tracing data only. To get metrics, wrap the Axum router with your own metrics middleware, for example axum-prometheus:

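use axum::{routing::get, Router};
use axum_prometheus::PrometheusMetricLayer;

// The layer records metrics; the handle renders them in Prometheus text format.
let (metrics_layer, metric_handle) = PrometheusMetricLayer::pair();

let app = Router::new()
    .nest("/v1/auth", routes::auth::router())
    // A layer only wraps routes added before it, so /metrics itself is not measured.
    .layer(metrics_layer)
    .route("/metrics", get(|| async move { metric_handle.render() }));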

This gives you the basics for each endpoint: latency, request count, and error rate.

Key metrics to alert on

Signal	Why it matters
`POST /v1/auth/login` p95 > 500ms	Argon2id is CPU-heavy; latency spikes suggest concurrent hashes are contending for CPU.
`POST /v1/auth/login` 401 rate > 10% sustained	Credential-stuffing campaign.
`POST /v1/auth/mfa/challenge` 429 rate	Brute-force attempt against a specific user.
`POST /v1/auth/password/forgot` rate vs baseline	Phishing-style reset spam.
`POST /v1/auth/refresh` 401 rate > 1%	Sessions expiring before refresh; check `access_expiry_secs` vs your client retry policy.
5xx rate on any endpoint > 0.1%	Database / cache / email transport failure.
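
To make the thresholds in the table concrete, here is a minimal std-only Rust sketch that evaluates them over one alerting window. In practice these checks would more likely live in Prometheus alert rules. `WindowCounts`, `login_alerts`, and `p95_ms` are hypothetical names used for illustration; none of them are part of the SDK.

```rust
/// Request counts for one endpoint over one alerting window.
/// Hypothetical type for illustration; not part of the SDK.
#[derive(Debug, Clone, Copy)]
struct WindowCounts {
    total: u64,
    unauthorized: u64,  // 401 responses
    server_errors: u64, // 5xx responses
}

// Thresholds copied from the table above.
const LOGIN_401_MAX: f64 = 0.10;
const SERVER_ERROR_MAX: f64 = 0.001;
const LOGIN_P95_MAX_MS: u64 = 500;

/// Fraction of `part` in `total`, or 0.0 for an empty window.
fn rate(part: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        part as f64 / total as f64
    }
}

/// Nearest-rank 95th percentile of latency samples, in milliseconds.
fn p95_ms(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // ceil(0.95 * n) in integer arithmetic; always >= 1 when n >= 1.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank - 1])
}

/// Returns the alerts that fire for the login endpoint in this window.
fn login_alerts(counts: WindowCounts, latencies_ms: &[u64]) -> Vec<&'static str> {
    let mut alerts = Vec::new();
    if p95_ms(latencies_ms).map_or(false, |p| p > LOGIN_P95_MAX_MS) {
        alerts.push("login p95 above 500ms");
    }
    if rate(counts.unauthorized, counts.total) > LOGIN_401_MAX {
        alerts.push("login 401 rate above 10%");
    }
    if rate(counts.server_errors, counts.total) > SERVER_ERROR_MAX {
        alerts.push("5xx rate above 0.1%");
    }
    alerts
}
```

The "sustained" qualifier in the table maps to the window length: evaluate over several minutes rather than per scrape so that a single noisy interval does not page anyone.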

Logs

The SDK emits structured logs via tracing. Recommended filter for production:

RUST_LOG=info,identsphere=debug,sqlx=warn

Key log targets:

identsphere_axum::routes::* — per-handler debug logs.
identsphere_axum::middleware::auth_middleware — JWT validation + session-cache hits.
identsphere_core::services::audit — audit-pipeline diagnostics.
IdentSphere::invite — invite-email send failures.

Audit-table monitoring

Query audit_logs for suspicious patterns:

-- Failed-login bursts per IP in the last hour
SELECT ip_address, COUNT(*) AS attempts
  FROM IdentSphere.audit_logs
 WHERE action = 'auth.login.failed'
   AND created_at > now() - interval '1 hour'
 GROUP BY ip_address
 ORDER BY attempts DESC
 LIMIT 20;

-- Successful logins from an IP first seen for that user within the last day
SELECT actor_id, ip_address, MIN(created_at) AS first_seen
  FROM IdentSphere.audit_logs
 WHERE action = 'auth.login' AND status = 'success'
 GROUP BY actor_id, ip_address
HAVING MIN(created_at) > now() - interval '1 day';

-- MFA disables
SELECT *
  FROM IdentSphere.audit_logs
 WHERE action = 'auth.mfa.disabled'
   AND created_at > now() - interval '1 day';

Send these to a SIEM (Splunk, ELK, Datadog) for real-time alerting.
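
If the audit rows are already streaming through your own code on their way to the SIEM, the first-seen-IP check from the second query can be applied in memory. The following is a minimal std-only sketch under that assumption. `LoginEvent` and `new_ip_logins` are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::HashMap;

/// One successful-login audit row. Hypothetical in-memory mirror of the
/// columns the query uses: actor_id, ip_address, created_at.
#[derive(Debug, Clone)]
struct LoginEvent {
    actor_id: String,
    ip_address: String,
    created_at: u64, // Unix seconds (assumption)
}

/// Returns (actor_id, ip_address, first_seen) for every pair whose earliest
/// login is newer than `cutoff`, sorted for stable output.
fn new_ip_logins(events: &[LoginEvent], cutoff: u64) -> Vec<(String, String, u64)> {
    // Equivalent of GROUP BY actor_id, ip_address with MIN(created_at).
    let mut first_seen: HashMap<(&str, &str), u64> = HashMap::new();
    for e in events {
        first_seen
            .entry((e.actor_id.as_str(), e.ip_address.as_str()))
            .and_modify(|t| *t = (*t).min(e.created_at))
            .or_insert(e.created_at);
    }

    // Equivalent of HAVING MIN(created_at) > cutoff.
    let mut hits: Vec<(String, String, u64)> = first_seen
        .into_iter()
        .filter(|&(_, t)| t > cutoff)
        .map(|((actor, ip), t)| (actor.to_string(), ip.to_string(), t))
        .collect();
    hits.sort();
    hits
}
```

Like the SQL version, this also flags the very first login of a brand-new user, so expect some noise from recent sign-ups.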

Health endpoints

The SDK doesn't ship a /health endpoint by default; add your own that runs SELECT 1 against the DB and pings the session cache:

use axum::{extract::State, http::StatusCode};

// Returns 200 only when both the database and the session cache respond.
async fn health(State(state): State<AppState>) -> StatusCode {
    let db_ok = state.db.ping().await.is_ok();
    let cache_ok = state.session_cache.get("health").await.is_ok();
    if db_ok && cache_ok { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE }
}

Wire it into your Kubernetes / load-balancer health checks.
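
A bare status code tells the load balancer enough, but whoever is on call usually wants to know which dependency failed. One possible way to carry that information is sketched below in std-only Rust, independent of Axum. `Health`, `summarize`, and `http_status` are hypothetical names, not SDK types.

```rust
/// Outcome of probing every dependency once.
#[derive(Debug, PartialEq)]
enum Health {
    Ok,
    /// Names of the dependencies that failed their probe.
    Unavailable(Vec<&'static str>),
}

/// Collapses (name, probe_succeeded) pairs into a single health value.
fn summarize(checks: &[(&'static str, bool)]) -> Health {
    let failed: Vec<&'static str> = checks
        .iter()
        .filter(|check| !check.1)
        .map(|check| check.0)
        .collect();
    if failed.is_empty() {
        Health::Ok
    } else {
        Health::Unavailable(failed)
    }
}

/// Maps the health value to the HTTP status the endpoint should return.
fn http_status(health: &Health) -> u16 {
    match health {
        Health::Ok => 200,
        Health::Unavailable(_) => 503,
    }
}
```

A handler could log the failed names at warn level and still return only the status code, which keeps the response body free of internal details.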

Tracing

To export spans to OpenTelemetry, add a tracing-opentelemetry layer to the subscriber:

use opentelemetry::trace::TracerProvider;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt; // brings .init() into scope

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(opentelemetry_otlp::new_exporter().tonic())
    .install_batch(opentelemetry_sdk::runtime::Tokio)?;

tracing_subscriber::registry()
    .with(tracing_subscriber::fmt::layer())
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .init();

Every Axum request now produces a span carrying the route, status, and latency.

Backup verification

A backup you haven't restored is a backup that doesn't exist. Quarterly:

Spin up a fresh Postgres instance.
Restore the most recent backup.
Run identsphere migrate status; it should report "up to date."
Run SELECT COUNT(*) against IdentSphere.users and confirm the row count matches what production held when the backup was taken.
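
The final comparison step can be scripted once the counts have been collected from both instances. Here is a minimal std-only sketch. `count_mismatches` is a hypothetical helper, and the `tolerance` parameter is an assumption added to absorb rows written after the backup was taken.

```rust
use std::collections::BTreeMap;

/// Compares per-table row counts from production against the restored backup.
/// Returns one message per table that is missing from the restore or whose
/// count differs from production by more than `tolerance` rows.
fn count_mismatches(
    production: &BTreeMap<String, u64>,
    restored: &BTreeMap<String, u64>,
    tolerance: u64,
) -> Vec<String> {
    let mut problems = Vec::new();
    for (table, &expected) in production {
        match restored.get(table) {
            None => problems.push(format!("{table}: missing from restore")),
            Some(&actual) if actual.abs_diff(expected) > tolerance => {
                problems.push(format!("{table}: production={expected}, restored={actual}"));
            }
            Some(_) => {} // within tolerance
        }
    }
    problems
}
```

An empty result means the restore passed. Extending the check beyond IdentSphere.users to other tables is optional, but it is cheap once the comparison exists.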
