Metrics¶
The framework ships an OpenTelemetry-based metrics subsystem that connectors can expose over a Prometheus HTTP endpoint. The defaults aim at a single use case: knowing when something breaks, suitable for alerting. Concrete connectors add their own domain metrics on top using the same OpenTelemetry primitives the inorbit-edge SDK uses internally.
What you get out of the box¶
When metrics.enabled = true in your connector configuration, the framework starts a Prometheus HTTP server and exposes:
| Metric | Type | Attributes | Meaning |
|---|---|---|---|
| `inorbit_connector_up` | Gauge | — | 1 while the connector’s main thread is alive |
| `inorbit_connector_session_connected` | Gauge | `robot_id` | 1 when the per-robot MQTT session to InOrbit is connected. Catches the “process running but robot offline” failure mode where MQTT drops and reconnect fails |
| `inorbit_connector_execution_loop_ticks_total` | Counter | — | Successful iterations of the connector’s execution loop |
| `inorbit_connector_execution_loop_errors_total` | Counter | — | Exceptions caught in the run loop |
Plus the per-robot publish counters that come from the SDK: one counter per `RobotSession` publish method, each counting calls to that method and labeled per robot.
These signals are usually enough for an MVP alerting setup:
```promql
# Process is dead or scrape failing
inorbit_connector_up == 0

# Process is up but its MQTT link to InOrbit is down (robot appears offline)
inorbit_connector_session_connected == 0

# Process is up but not progressing
rate(inorbit_connector_execution_loop_ticks_total[5m]) == 0

# Process is up but erroring
rate(inorbit_connector_execution_loop_errors_total[5m]) > 0
```
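Wired into a Prometheus rules file, the queries above might look like the following minimal sketch; the alert names, `for:` durations, and severity labels are illustrative assumptions:

```yaml
groups:
  - name: inorbit-connector
    rules:
      - alert: ConnectorDown
        expr: inorbit_connector_up == 0
        for: 2m
        labels:
          severity: critical
      - alert: ConnectorOffline
        expr: inorbit_connector_session_connected == 0
        for: 5m
        labels:
          severity: critical
      - alert: ConnectorStalled
        expr: rate(inorbit_connector_execution_loop_ticks_total[5m]) == 0
        for: 10m
        labels:
          severity: warning
      - alert: ConnectorErroring
        expr: rate(inorbit_connector_execution_loop_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
```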
Enabling metrics¶
Add a metrics: block to your connector configuration:
```yaml
metrics:
  enabled: true
  bind_host: 127.0.0.1                     # bind interface; use 0.0.0.0 for bridge networking
  bind_port: 9090                          # 0 picks an ephemeral free port
  connector_id: my-bot                     # unique per process on a host
  discovery_dir: /var/run/inorbit-metrics  # for OTEL collector file_sd
```
When enabled, the connector also writes a Prometheus file_sd-format JSON file to discovery_dir, naming the bound host:port. A host-side OTEL collector can mount this directory and discover every connector running on the host — see examples/metrics/ for a reference compose stack.
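The file_sd format itself is standard Prometheus: a JSON list of target groups. Assuming the example configuration above, the discovery file would look roughly like this minimal sketch:

```json
[
  {
    "targets": ["127.0.0.1:9090"]
  }
]
```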
If your scraper is configured statically (e.g. its prometheus.yaml already lists host:port targets, or you only run a single connector behind a known address), set discovery_dir: null to skip writing the discovery file entirely. The HTTP endpoint still serves /metrics as usual.
When enabled is false (the default), no server is started and all instruments become no-ops with zero overhead.
Configuration reference¶
| Field | Default | Notes |
|---|---|---|
| `enabled` | `false` | Master switch. When false, the rest of the block is ignored. |
| `bind_host` | | Address the HTTP server binds to. |
| `bind_port` | | TCP port. Use `0` to pick an ephemeral free port. |
| | | Hostname written to the discovery file. |
| `discovery_dir` | | Auto-created on start. Set to `null` to skip writing the discovery file. |
| `connector_id` | | Used as `service.instance.id`; unique per process on a host. |
| | | Prefix prepended to every Prometheus metric name. ASCII, no hyphens. |
| | | Added to every metric as OTEL Resource attributes (low-cardinality only). |
Adding metrics to your connector¶
For domain metrics, use the SDK helpers directly. The connector framework imposes no wrapper.
Step 1 — Declare a meter and instruments¶
```python
# my_connector/metrics.py
from inorbit_edge.metrics import get_meter

meter = get_meter("inorbit_my_connector")

api_requests = meter.create_counter(
    "my.api.requests", unit="1", description="Calls to the device API",
)
api_errors = meter.create_counter(
    "my.api.errors", unit="1", description="Failed calls to the device API",
)
```
This is the same module-level pattern the SDK uses for its own counters. `get_meter` returns a real OTEL Meter when the telemetry dependencies are installed (always the case via `inorbit-edge[telemetry]`), or a no-op Meter otherwise.
Step 2 — Instrument calls¶
Two patterns; pick whichever fits the call site:
Decorator (counts every call to a method)
```python
from inorbit_edge.metrics import with_counter_metric

class DeviceAPI:
    @with_counter_metric(api_requests, attributes={"endpoint": "/status"})
    async def get_status(self):
        ...
```
with_counter_metric works on sync and async methods. The attributes argument may be a static dict or a callable that returns one. For attributes that come from the bound instance, use attrs_from_self:
```python
from inorbit_edge.metrics import with_counter_metric, attrs_from_self

class DeviceAPI:
    def __init__(self, robot_id):
        self.robot_id = robot_id

    @with_counter_metric(api_requests, attributes=attrs_from_self("robot_id"))
    async def get_status(self):
        ...
```
Inline (anywhere — error paths, observable state, custom events)
```python
async def get_status(self):
    try:
        return await self._client.get("/status")
    except Exception:
        api_errors.add(1, {"endpoint": "/status"})
        raise
```
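Latencies follow the same inline pattern through OTEL histograms, whose `.record()` method is referenced in the scope discussion below. A minimal sketch, where the `my.api.latency` instrument is an assumption declared alongside the Step 1 counters:

```python
import time

from my_connector.metrics import meter  # the module-level meter from Step 1

# Assumed instrument: declared once at module level, like the counters
api_latency = meter.create_histogram(
    "my.api.latency", unit="ms", description="Device API call latency",
)

class DeviceAPI:
    async def get_status(self):
        start = time.monotonic()
        try:
            return await self._client.get("/status")
        finally:
            # Record elapsed milliseconds whether the call succeeded or raised
            api_latency.record((time.monotonic() - start) * 1000, {"endpoint": "/status"})
```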
When to use which scope¶
The single decision that drives metric design is: how many upstream entities does one connector process talk to?
- N=1 (single-robot connector, single-PLC connector, etc.): `service.instance.id` already identifies the process. Don’t add a `robot_id`/`device_id` attribute on per-call metrics; it would duplicate the Resource attribute that the OTEL collector already attaches.
- N>1 (`FleetConnector` for a fleet manager API, a gateway controlling many doors, etc.): add the entity id as a per-call attribute. Use `attrs_from_self("robot_id")` for instance-bound calls; pass it explicitly to `.add()`/`.record()` for ad-hoc sites (see the sketch below).
For non-robot connectors, name the attribute after the domain entity: device_id, plc_id, door_id, elevator_id. Same pattern, different label name.
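For an ad-hoc N>1 call site, a minimal sketch reusing the `api_requests` counter from Step 1 (the fleet API shape is an assumption):

```python
from my_connector.metrics import api_requests  # counter declared in Step 1

class FleetAPI:
    def __init__(self, client):
        self._client = client

    async def get_robot_status(self, robot_id: str):
        # N>1: one process serves many robots, so the entity id travels
        # as a per-call attribute rather than a Resource attribute
        api_requests.add(1, {"endpoint": "/robots/status", "robot_id": robot_id})
        return await self._client.get(f"/robots/{robot_id}/status")
```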
Cardinality guardrails¶
OTEL attributes become Prometheus labels. Each unique label-value combination is a separate time series, and series count is the dominant cost driver for both Prometheus and managed services like GCP Cloud Monitoring. Use bounded enums; never put unbounded values in attributes.
Good attribute values are bounded enums: a small, fixed set of endpoint names or coarse result codes. Bad attribute values are unbounded, such as exception messages or full status text.
Forbidden in attributes: full URLs containing IDs, exception messages, query strings, free-form user input.
If you need to mask out an ID-like segment from a value before recording, do it in the connector before the .add() call.
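A minimal sketch of that masking, with `normalize_endpoint` as a hypothetical helper in your own connector code:

```python
import re

from my_connector.metrics import api_errors  # counter declared in Step 1

# Hypothetical helper: collapse ID-like path segments to a placeholder so the
# endpoint attribute stays a small, bounded set of values
def normalize_endpoint(path: str) -> str:
    return re.sub(r"/\d+", "/{id}", path)

# "/robots/42/status" and "/robots/7/status" both record as "/robots/{id}/status"
api_errors.add(1, {"endpoint": normalize_endpoint("/robots/42/status")})
```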
Observable instruments¶
For state derived from connector internals (battery level, broker connected, queue depth), prefer create_observable_gauge with a callback that reads state at scrape time:
```python
from inorbit_edge.metrics import Observation, get_meter

meter = get_meter("inorbit_my_connector")

class DeviceClient:
    def __init__(self):
        self._connected = False
        meter.create_observable_gauge(
            "my.broker.connected",
            callbacks=[self._connected_cb],
            unit="1",
            description="1 when the connector is connected to the broker",
        )

    def _connected_cb(self, _options):
        return [Observation(1 if self._connected else 0)]
```
The callback runs on every scrape, so it should be cheap and side-effect free.
Production deployment¶
For multi-container deployments, see examples/metrics/ for a reference OTEL collector compose stack that:

- Discovers all connector containers on a host via Prometheus `file_sd`.
- Exports to GCP Cloud Monitoring (other backends are straightforward to swap in).
- Works with both bridge and host Docker networking modes.
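For orientation, the collector side reduces to a Prometheus receiver pointed at the discovery directory. A minimal sketch (job name, paths, and pipeline wiring are assumptions; the reference stack in examples/metrics/ is authoritative):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: inorbit-connectors
          file_sd_configs:
            - files:
                - /var/run/inorbit-metrics/*.json
              refresh_interval: 30s

exporters:
  googlecloud: {}   # GCP Cloud Monitoring; swap for another exporter as needed

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [googlecloud]
```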