metrics: Add a process_start_time_seconds gauge

The new metric is there to help gauge iocaine's uptime, without relying
on an external collector like `systemd_exporter`.

Signed-off-by: Gergely Nagy <me@gergo.csillger.hu>
Gergely Nagy 2025-02-06 21:50:53 +01:00
parent 7497c955fc
commit 4b91d25a88
2 changed files with 22 additions and 3 deletions

@@ -3,7 +3,10 @@ title: "Monitoring iocaine"
 description: How to monitor iocaine with Prometheus and Grafana?
 ---
-`iocaine` can be [configured](@/configuration/index.md#metrics) to expose [Prometheus](https://prometheus.io)-compatible metrics, separately from the garbage generator. When enabled, a single metric - `iocaine_requests_total` - is exposed, with various labels attached, if so configured. It is a simple counter, showing the number of hits `iocaine` served.
+`iocaine` can be [configured](@/configuration/index.md#metrics) to expose [Prometheus](https://prometheus.io)-compatible metrics, separately from the garbage generator. When enabled, two metrics are exposed:
+- `iocaine_requests_total`, a counter of how many hits `iocaine` served, optionally with labels attached (see below).
+- `process_start_time_seconds`, a gauge holding the timestamp at which `iocaine` started, to allow measuring uptime.
 
 # The simplest configuration
@@ -14,13 +17,18 @@ Let's start with a simple configuration: no labels, just the metric.
 enable = true
 ```
 
-This will expose the following metric on `http://127.0.0.1:42042/metrics`:
+This will expose the following metrics on `http://127.0.0.1:42042/metrics`:
 
 ```
 # TYPE iocaine_requests_total counter
 iocaine_requests_total 1
+# TYPE process_start_time_seconds gauge
+process_start_time_seconds{instance="127.0.0.1:42042",service="iocaine"} 1738873005.2406795
 ```
+
+The `process_start_time_seconds` metric is *always* present, and its value only changes when `iocaine` is restarted. For the sake of brevity, it is excluded from all other examples.
 
 # Per-host metrics
 
 While an unlabeled metric is nice to have, it's a little bit bland. We can add a `host` label, to be able to group request totals by host - where the host is whatever is in the `Host` header when it reaches `iocaine`.
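
A usage note, not part of the commit: once Prometheus scrapes this endpoint, uptime can be derived from the new gauge with an expression along the lines of `time() - process_start_time_seconds{service="iocaine"}`; the `service` label here matches the one the commit attaches in `start_metrics_server`.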

@@ -19,6 +19,8 @@ use axum::{
 #[cfg(feature = "prometheus")]
 use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
 use std::sync::Arc;
+#[cfg(feature = "prometheus")]
+use std::time::{SystemTime, UNIX_EPOCH};
 
 #[cfg(feature = "prometheus")]
 use crate::config::MetricsLabel;
@@ -160,9 +162,18 @@ impl Iocaine {
     #[cfg(feature = "prometheus")]
     async fn start_metrics_server(metrics_bind: String) -> std::result::Result<(), std::io::Error> {
-        let metrics_listener = tokio::net::TcpListener::bind(metrics_bind).await?;
+        let metrics_listener = tokio::net::TcpListener::bind(metrics_bind.clone()).await?;
         let app = Self::metrics_app().await;
 
+        let ts = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .expect("Time went backwards");
+        let labels = [
+            ("instance", metrics_bind),
+            ("service", "iocaine".to_string()),
+        ];
+        metrics::gauge!("process_start_time_seconds", &labels).set(ts);
+
         axum::serve(metrics_listener, app)
             .with_graceful_shutdown(shutdown_signal())
             .await
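
For context, a minimal standalone sketch of the pattern the hunk above introduces — not iocaine's actual code — assuming the `metrics` and `metrics-exporter-prometheus` crates are available. The recorder is installed without an HTTP listener so the exposition text can simply be printed:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

use metrics_exporter_prometheus::PrometheusBuilder;

fn main() {
    // Install just the recorder (no HTTP listener); keep the handle so the
    // exposition text can be rendered directly.
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install Prometheus recorder");

    // Seconds since the UNIX epoch; a `Duration` is converted to fractional
    // seconds (f64) when handed to `Gauge::set`.
    let start = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");

    // The same label pair the commit attaches: instance and service.
    let labels = [
        ("instance", "127.0.0.1:42042".to_string()),
        ("service", "iocaine".to_string()),
    ];
    metrics::gauge!("process_start_time_seconds", &labels).set(start);

    // Prints, among other things, a line like:
    // process_start_time_seconds{instance="127.0.0.1:42042",service="iocaine"} 1738873005.24
    println!("{}", handle.render());
}
```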