# Metrics

The framework ships an OpenTelemetry-based metrics subsystem that connectors can expose over a Prometheus HTTP endpoint. The defaults target a single use case: knowing when something breaks, with signals suitable for alerting. Concrete connectors add their own domain metrics on top using the same OpenTelemetry primitives the `inorbit-edge` SDK uses internally.

## What you get out of the box

When `metrics.enabled` is `true` in your connector configuration, the framework starts a Prometheus HTTP server and exposes:

| Metric | Type | Attributes | Meaning |
|---|---|---|---|
| `inorbit_connector_up` | Gauge | | 1 while the connector's main thread is alive |
| `inorbit_connector_session_connected` | Gauge | `robot_id` | 1 when the per-robot MQTT session to InOrbit is connected. Catches the "process running but robot offline" failure mode where MQTT drops and reconnect fails |
| `inorbit_connector_execution_loop_ticks_total` | Counter | | Successful iterations of `_execution_loop` |
| `inorbit_connector_execution_loop_errors_total` | Counter | | Exceptions caught in the run loop |

Plus the per-robot publish counters that come from the SDK:

| Metric | Attributes | Meaning |
|---|---|---|
| `calls_publish_pose_total` | `robot_id` | Calls to `publish_pose` |
| `calls_publish_odometry_total` | `robot_id` | Calls to `publish_odometry` |
| `calls_publish_key_values_total` | `robot_id` | Calls to `publish_key_values` |
| `calls_publish_system_stats_total` | `robot_id` | Calls to `publish_system_stats` |
| `calls_publish_map_total` | `robot_id` | Calls to `publish_map` |
| `calls_publish_camera_frame_total` | `robot_id` | Calls to `publish_camera_frame` |
| `calls_publish_lasers_total` | `robot_id` | Calls to `publish_lasers` / `publish_laser` |
| `calls_publish_path_total` | `robot_id` | Calls to `publish_path` |

These signals are usually enough for an MVP alerting setup:

```promql
# Process is dead or scrape failing
inorbit_connector_up == 0

# Process is up but its MQTT link to InOrbit is down (robot appears offline)
inorbit_connector_session_connected == 0

# Process is up but not progressing
rate(inorbit_connector_execution_loop_ticks_total[5m]) == 0

# Process is up but erroring
rate(inorbit_connector_execution_loop_errors_total[5m]) > 0
```
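As a sketch, these expressions drop straight into a standard Prometheus rule file. The alert names, `for:` durations, and severities below are illustrative choices, not part of the framework:

```yaml
groups:
  - name: inorbit-connector
    rules:
      # Illustrative: fire after the process has been down for 2 minutes
      - alert: ConnectorProcessDown
        expr: inorbit_connector_up == 0
        for: 2m
        labels:
          severity: critical
      # Illustrative: process alive but the robot's MQTT session is down
      - alert: ConnectorRobotOffline
        expr: inorbit_connector_session_connected == 0
        for: 5m
        labels:
          severity: warning
```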

## Enabling metrics

Add a `metrics:` block to your connector configuration:

```yaml
metrics:
  enabled: true
  bind_host: 127.0.0.1   # bind interface; use 0.0.0.0 for bridge networking
  bind_port: 9090        # 0 picks an ephemeral free port
  connector_id: my-bot   # unique per process on a host
  discovery_dir: /var/run/inorbit-metrics  # for OTEL collector file_sd
```

When enabled, the connector also writes a Prometheus `file_sd`-format JSON file to `discovery_dir`, naming the bound `host:port`. A host-side OTEL collector can mount this directory and discover every connector running on the host; see `examples/metrics/` for a reference compose stack.
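For reference, Prometheus `file_sd` files are JSON arrays of target groups, so the file written by the connector will look roughly like the following. The exact label set is an implementation detail of the framework; treat the `labels` shown here as illustrative:

```json
[
  {
    "targets": ["my-host:9090"],
    "labels": {
      "connector_id": "my-bot"
    }
  }
]
```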

If your scraper is configured statically (e.g. its `prometheus.yaml` already lists `host:port` targets, or you only run a single connector behind a known address), set `discovery_dir: null` to skip writing the discovery file entirely. The HTTP endpoint still serves `/metrics` as usual.
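For the static case, a minimal `prometheus.yaml` scrape block would look like this (the job name and target address are placeholders):

```yaml
scrape_configs:
  - job_name: inorbit-connector
    static_configs:
      - targets: ["connector-host:9090"]  # the connector's bind_host:bind_port
```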

When `enabled` is `false` (the default), no server is started and all instruments become no-ops with effectively zero overhead.

## Configuration reference

| Field | Default | Notes |
|---|---|---|
| `enabled` | `false` | Master switch. When `false`, the rest of the block is ignored. |
| `bind_host` | `0.0.0.0` | Address the HTTP server binds to. |
| `bind_port` | `9090` | TCP port. Use `0` to let the OS pick. |
| `advertise_host` | `socket.gethostname()` | Hostname written to the discovery file. |
| `discovery_dir` | `/var/run/inorbit-metrics` | Auto-created on start. Set to `null` to skip writing a discovery file. |
| `connector_id` | `socket.gethostname()` | Used as `service.instance.id` and as the discovery filename. |
| `exporter_namespace` | `"inorbit_connector"` | Prefix prepended to every Prometheus metric name. ASCII, no hyphens. |
| `extra_resource_attributes` | `{}` | Added to every metric as OTEL Resource attributes (low-cardinality only). |
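As a sketch of the last field in use, a `metrics:` block with deployment-level Resource attributes might look like this. The keys and values are illustrative; keep them bounded, per the low-cardinality note above:

```yaml
metrics:
  enabled: true
  extra_resource_attributes:
    site: warehouse-3   # illustrative: one value per deployment site
    fleet: amr          # illustrative: a small, fixed set of fleet names
```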

## Adding metrics to your connector

For domain metrics, use the SDK helpers directly. The connector framework imposes no wrapper.

### Step 1 — Declare a meter and instruments

```python
# my_connector/metrics.py
from inorbit_edge.metrics import get_meter

meter = get_meter("inorbit_my_connector")

api_requests = meter.create_counter(
    "my.api.requests", unit="1", description="Calls to the device API",
)
api_errors = meter.create_counter(
    "my.api.errors", unit="1", description="Failed calls to the device API",
)
```

This is the same module-level pattern the SDK uses for its own counters. `get_meter` returns a real OTEL `Meter` when the telemetry dependencies are installed (always the case via `inorbit-edge[telemetry]`) and a no-op `Meter` otherwise.

### Step 2 — Instrument calls

Two patterns; pick whichever fits the call site:

**Decorator** (counts every call to a method)

```python
from inorbit_edge.metrics import with_counter_metric

class DeviceAPI:
    @with_counter_metric(api_requests, attributes={"endpoint": "/status"})
    async def get_status(self):
        ...
```

`with_counter_metric` works on sync and async methods. The `attributes` argument may be a static dict or a callable that returns one. For attributes that come from the bound instance, use `attrs_from_self`:

```python
from inorbit_edge.metrics import with_counter_metric, attrs_from_self

class DeviceAPI:
    def __init__(self, robot_id):
        self.robot_id = robot_id

    @with_counter_metric(api_requests, attributes=attrs_from_self("robot_id"))
    async def get_status(self):
        ...
```

**Inline** (anywhere: error paths, observable state, custom events)

```python
async def get_status(self):
    try:
        return await self._client.get("/status")
    except Exception:
        api_errors.add(1, {"endpoint": "/status"})
        raise
```

## When to use which scope

The single decision that drives metric design is: how many upstream entities does one connector process talk to?

- **N=1** (single-robot connector, single-PLC connector, etc.): `service.instance.id` already identifies the process. Don't add a `robot_id` / `device_id` attribute on per-call metrics; it would duplicate the Resource attribute that the OTEL collector already attaches.

- **N>1** (`FleetConnector` for a fleet manager API, gateway controlling many doors, etc.): add the entity id as a per-call attribute. Use `attrs_from_self("robot_id")` for instance-bound calls; pass it explicitly to `.add()` / `.record()` for ad-hoc sites.

For non-robot connectors, name the attribute after the domain entity: `device_id`, `plc_id`, `door_id`, `elevator_id`. Same pattern, different label name, as sketched below.
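A minimal sketch of that renaming, assuming `attrs_from_self` accepts any instance attribute name (the `DoorGateway` class and `door.api.requests` counter are hypothetical):

```python
from inorbit_edge.metrics import attrs_from_self, get_meter, with_counter_metric

meter = get_meter("inorbit_door_connector")
door_requests = meter.create_counter(
    "door.api.requests", unit="1", description="Calls to the door API",
)

class DoorGateway:
    def __init__(self, door_id):
        self.door_id = door_id  # one instance per door: the N>1 case

    # Labels each count with the door_id read from the bound instance
    @with_counter_metric(door_requests, attributes=attrs_from_self("door_id"))
    async def open(self):
        ...
```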

## Cardinality guardrails

OTEL attributes become Prometheus labels. Each unique label-value combination is a separate time series, and series count is the dominant cost driver for both Prometheus and managed services like GCP Cloud Monitoring. Use bounded enums; never put unbounded values in attributes.

| Attribute | Examples (good) | Examples (bad) |
|---|---|---|
| `endpoint` | `/status`, `/missions` | `/missions/<uuid>` |
| `result` | `success`, `error` | exception messages |
| `status` | `200`, `404`, `500` (or `2xx`/`4xx`/`5xx`) | full status text |
| `topic_pattern` | `robot/cmd/velocity` | `robot/<id>/cmd/velocity` |

**Forbidden in attributes:** full URLs containing IDs, exception messages, query strings, free-form user input.

If you need to mask out an ID-like segment from a value before recording, do it in the connector before the `.add()` call.
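A minimal sketch of such masking, using a hypothetical `mask_endpoint` helper (not part of the SDK) together with the `api_requests` counter from Step 1:

```python
import re

# Collapse UUID-like path segments so the endpoint label stays a bounded enum
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def mask_endpoint(path: str) -> str:
    return UUID_RE.sub("<uuid>", path)

# Records endpoint="/missions/<uuid>" regardless of the concrete mission id
api_requests.add(
    1, {"endpoint": mask_endpoint("/missions/1b4e28ba-2fa1-11d2-883f-0016d3cca427")}
)
```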

## Observable instruments

For state derived from connector internals (battery level, broker connected, queue depth), prefer `create_observable_gauge` with a callback that reads state at scrape time:

```python
from inorbit_edge.metrics import Observation, get_meter

meter = get_meter("inorbit_my_connector")

class DeviceClient:
    def __init__(self):
        self._connected = False

        meter.create_observable_gauge(
            "my.broker.connected",
            callbacks=[self._connected_cb],
            unit="1",
            description="1 when the connector is connected to the broker",
        )

    def _connected_cb(self, _options):
        return [Observation(1 if self._connected else 0)]
```

The callback runs on every scrape, so it should be cheap and side-effect free.

## Production deployment

For multi-container deployments, see `examples/metrics/` for a reference OTEL collector compose stack that:

- Discovers all connector containers on a host via Prometheus `file_sd` (the shared-volume wiring is sketched below).
- Exports to GCP Cloud Monitoring (other backends are straightforward to swap in).
- Works with both bridge and host Docker networking modes.
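The core of the discovery wiring is a volume shared between connectors and the collector. The fragment below is a hypothetical illustration of that idea only (service names, image, and collector configuration are placeholders); `examples/metrics/` holds the actual reference stack:

```yaml
# Hypothetical compose fragment: connector writes its file_sd target file
# into a shared volume, which the collector mounts read-only for discovery.
services:
  my-connector:
    volumes:
      - metrics-discovery:/var/run/inorbit-metrics   # the default discovery_dir
  otel-collector:
    image: otel/opentelemetry-collector-contrib      # placeholder image
    volumes:
      - metrics-discovery:/var/run/inorbit-metrics:ro
volumes:
  metrics-discovery:
```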