Monitoring
What to watch in production.
Metrics
The SDK does not expose Prometheus / OpenTelemetry metrics directly in v0.1 — it's all tracing for now. Wrap the Axum router with your own metrics middleware:
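use axum::{routing::get, Router};
use axum_prometheus::PrometheusMetricLayer;

let (metrics_layer, metric_handle) = PrometheusMetricLayer::pair();
let app = Router::new()
    .nest("/v1/auth", routes::auth::router())
    // Instruments every route registered above this line.
    .layer(metrics_layer)
    // Registered after the layer, so scrapes of /metrics are not themselves counted.
    .route("/metrics", get(|| async move { metric_handle.render() }));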
This gives you per-endpoint latency, request count, error rate — the basics.
Key metrics to alert on
| Signal | Why it matters |
|---|---|
| POST /v1/auth/login p95 > 500ms | Argon2id is CPU-heavy; spikes suggest Argon2 contention. |
| POST /v1/auth/login 401 rate > 10% sustained | Credential-stuffing campaign. |
| POST /v1/auth/mfa/challenge 429 rate | Brute-force attempt against a specific user. |
| POST /v1/auth/password/forgot rate vs baseline | Phishing-style reset spam. |
| POST /v1/auth/refresh 401 rate > 1% | Sessions expiring before refresh; check access_expiry_secs vs your client retry policy. |
| 5xx rate on any endpoint > 0.1% | Database / cache / email transport failure. |
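
To make the first two thresholds in the table concrete, here is a minimal sketch of how they could be evaluated over one scrape window. It is not part of the SDK: `LoginWindow`, `percentile_ms`, `status_rate`, and `login_alerts` are hypothetical names, and the counters are assumed to come from whatever metrics backend you scrape. In practice these rules usually live in your alerting system rather than in application code.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// `pct` is a whole percentage (for example 95). Returns None for an empty slice.
fn percentile_ms(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Ceiling division: smallest rank with at least pct% of samples at or below it.
    let rank = (pct * sorted.len() + 99) / 100;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// Fraction of responses that matched a status; 0.0 when there was no traffic.
fn status_rate(matching: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        matching as f64 / total as f64
    }
}

/// Hypothetical per-window counters for POST /v1/auth/login.
struct LoginWindow {
    latencies_ms: Vec<u64>,
    total: u64,
    unauthorized: u64,
}

/// Returns the names of the login alerts that fire for this single window.
fn login_alerts(w: &LoginWindow) -> Vec<&'static str> {
    let mut fired = Vec::new();
    // p95 > 500ms, as in the table above.
    if percentile_ms(&w.latencies_ms, 95).map_or(false, |p| p > 500) {
        fired.push("login_p95_latency");
    }
    // 401 rate > 10%, as in the table above.
    if status_rate(w.unauthorized, w.total) > 0.10 {
        fired.push("login_401_rate");
    }
    fired
}
```

The sketch looks at a single window. "Sustained" in the table implies the condition should hold across several consecutive windows before anyone is paged, which this code does not model.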
Logs
The SDK emits structured logs via tracing. Recommended filter for
production:
RUST_LOG=info,identsphere=debug,sqlx=warn
Key log targets:
- identsphere_axum::routes::*: per-handler debug logs.
- identsphere_axum::middleware::auth_middleware: JWT validation + session-cache hits.
- identsphere_core::services::audit: audit-pipeline diagnostics.
- IdentSphere::invite: invite-email send failures.
Audit-table monitoring
Query audit_logs for suspicious patterns:
-- Failed-login bursts per IP in the last hour
SELECT ip_address, COUNT(*) AS attempts
FROM IdentSphere.audit_logs
WHERE action = 'auth.login.failed'
AND created_at > now() - interval '1 hour'
GROUP BY ip_address
ORDER BY attempts DESC
LIMIT 20;
-- Successful logins from a never-before-seen IP for each user
SELECT actor_id, ip_address, MIN(created_at) AS first_seen
FROM IdentSphere.audit_logs
WHERE action = 'auth.login' AND status = 'success'
GROUP BY actor_id, ip_address
HAVING MIN(created_at) > now() - interval '1 day';
-- MFA disables
SELECT *
FROM IdentSphere.audit_logs
WHERE action = 'auth.mfa.disabled'
AND created_at > now() - interval '1 day';
Forward the audit events to a SIEM (Splunk, ELK, Datadog) and recreate these queries there as alert rules for real-time alerting.
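
If you would rather evaluate the failed-login burst rule in your own service before a SIEM is in place, the same logic can be sketched in Rust. This is an illustration only: `AuditRow` is a hypothetical in-memory view of a row, timestamps are assumed to be Unix seconds, and the `threshold` parameter is an addition that the SQL query above does not have.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of one audit_logs row. Field names mirror the
/// columns used in the queries above; the timestamp type is an assumption.
struct AuditRow {
    ip_address: String,
    action: String,
    created_at_secs: u64,
}

/// Counts auth.login.failed rows per IP that are newer than
/// `now_secs - window_secs`, and returns the IPs with at least `threshold`
/// failures, busiest first.
fn failed_login_bursts(
    rows: &[AuditRow],
    now_secs: u64,
    window_secs: u64,
    threshold: usize,
) -> Vec<(String, usize)> {
    let cutoff = now_secs.saturating_sub(window_secs);
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for row in rows {
        if row.action == "auth.login.failed" && row.created_at_secs > cutoff {
            *counts.entry(row.ip_address.as_str()).or_insert(0) += 1;
        }
    }
    let mut bursts: Vec<(String, usize)> = counts
        .into_iter()
        .filter(|(_, n)| *n >= threshold)
        .map(|(ip, n)| (ip.to_string(), n))
        .collect();
    // Count descending, then IP ascending so the output order is deterministic.
    bursts.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    bursts
}
```

Calling `failed_login_bursts(&rows, now, 3600, 20)` would flag any IP with 20 or more failures in the last hour; the right threshold depends on your traffic and is not something the SDK prescribes.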
Health endpoints
The SDK doesn't ship a /health endpoint by default; add your own that
checks SELECT 1 against the DB and pings the session cache:
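use axum::{extract::State, http::StatusCode};

// AppState is your application state; db.ping() and session_cache.get()
// are assumed to return a Result.
async fn health(State(state): State<AppState>) -> StatusCode {
    // Both dependencies must respond for the instance to count as healthy.
    let db_ok = state.db.ping().await.is_ok();
    let cache_ok = state.session_cache.get("health").await.is_ok();
    if db_ok && cache_ok { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE }
}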
Wire it into your Kubernetes / load-balancer health checks.
Tracing
OpenTelemetry export is available through tracing-opentelemetry:
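use opentelemetry::trace::TracerProvider;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt; // provides .init()

// Export spans over OTLP/gRPC in batches on the Tokio runtime.
// Depending on your opentelemetry-otlp version, install_batch returns either a
// Tracer or a TracerProvider; in the latter case call .tracer("...") on the
// result before passing it to with_tracer.
let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(opentelemetry_otlp::new_exporter().tonic())
    .install_batch(opentelemetry_sdk::runtime::Tokio)?;

// Keep human-readable logs and add the OpenTelemetry layer alongside them.
tracing_subscriber::registry()
    .with(tracing_subscriber::fmt::layer())
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .init();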
Now every Axum request produces a span with route + status + latency.
Backup verification
A backup you haven't restored is a backup that doesn't exist. Quarterly:
- Spin up a fresh Postgres instance.
- Restore the most recent backup.
- Run identsphere migrate status; it should report "up to date."
- Run a SELECT against IdentSphere.users to confirm the row count matches production.
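
Production keeps accepting sign-ups after the backup is taken, so the two counts rarely match exactly. The last step can be sketched as a comparison with an explicit tolerance. `RestoreCheck`, `check_restored_count`, and the `max_lag` parameter are hypothetical names for illustration, not part of the SDK or its CLI.

```rust
/// Outcome of comparing the restored users row count with production.
#[derive(Debug, PartialEq)]
enum RestoreCheck {
    /// Counts are identical.
    Match,
    /// Restored copy is behind production, but within the allowed lag.
    WithinLag { missing: u64 },
    /// Restored copy is missing more rows than the allowed lag explains.
    TooFewRows { missing: u64 },
    /// Restored copy has more rows, e.g. users deleted since the backup.
    MoreThanProduction { extra: u64 },
}

/// `max_lag` is the number of rows you expect production to have gained
/// between the backup timestamp and the moment you run the check.
fn check_restored_count(production: u64, restored: u64, max_lag: u64) -> RestoreCheck {
    if restored > production {
        return RestoreCheck::MoreThanProduction { extra: restored - production };
    }
    let missing = production - restored;
    if missing == 0 {
        RestoreCheck::Match
    } else if missing <= max_lag {
        RestoreCheck::WithinLag { missing }
    } else {
        RestoreCheck::TooFewRows { missing }
    }
}
```

Treat `TooFewRows` as a failed drill, and investigate `MoreThanProduction` rather than assuming it is benign.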