52. Glossary

Every term in one place, cross-referenced where it helps.

adapter The piece of Daalu that talks to one specific external system. Each integration uses one adapter. See Chapter 11.

admin A user with admin rights on the tenant — can manage users, integrations, billing, and tenant settings. See Chapter 10.

agent A configured autonomous behaviour that runs on a schedule or trigger: the briefing agent, the reconciler, the alert-explainer, the cost-anomaly agent. See Chapter 20.

AI Factory Daalu’s view of your connected GPUs — health, utilisation, inference routing, diagnostics, and load tests. Lights up once you onboard an NVIDIA card, and is the home of your own GPU inference tier. See Chapter 30.

AIPerf NVIDIA’s load-test and SLO benchmark, built into the AI Factory to measure your GPU’s throughput and latency under realistic concurrency before you depend on it. See Chapter 34.

alert A signal worth a human’s attention, created when an event matches an alert rule. See Chapter 12 and Chapter 22.

alert rule The recipe that promotes events into alerts — defined in your observability stack or in Daalu’s Settings → Alert rules.

Assistant The AI co-pilot. A chat panel that investigates, proposes, and notifies, grounded in whatever page you’re on. See Chapter 13.

audit log Tenant-wide log of every state-changing action. Admin-only. See Chapter 28.

bootstrap token The one-shot credential used to set up cluster federation. Expires after 1 hour. See Chapter 41.

briefing The auto-generated summary on the home page, produced daily by the briefing agent. See Chapter 18.

change proposal A structured request to mutate state, awaiting human approval before the executor runs it. The central safety primitive in Daalu — the Assistant and agents can only change your infrastructure through one. See Chapter 19.

classifier (intent) A short, cheap inference intent for routing and scoring decisions. The high-volume workhorse; the traffic that benefits most from running on your own GPU. Contrast quality. See Chapter 47.

cluster federation The pattern by which Daalu reaches your on-prem / private Kubernetes clusters through an outbound WireGuard tunnel. See Chapter 15.

Coding Workspace A browser-based, AI-assisted coding environment backed by your own GPU’s inference. Unlocks once a card is online; pauses after 24 h idle. See Chapter 35.

cost-anomaly agent The shipped agent that flags significant spend deviations in real time. See Chapter 20.

daalu chat The command-line entry point to the Daalu Coding Agent — start an agentic coding session from your terminal, authenticated with a personal access token. See Chapter 35.

Daalu Coding Agent The tool-using coding agent that powers the Coding Workspace and daalu chat — it reads and edits files and runs commands, with its inference served by your own GPU rather than an external provider. See Chapter 35.

Daalu Edge The Helm chart you install in your cluster to open the outbound WireGuard tunnel that enables federation (and reaches your GPU). See Chapter 41.

Daalu-hosted tier An inference tier served by a Daalu-operated shared GPU — cheaper than commercial models, and the fallback when you haven’t connected your own card. See Chapter 44.

Daalu Private The tier for strict data-residency needs: your tenant’s data and agents run inside your own cluster, with the Daalu hub reduced to a control plane. See Chapter 45.

DCGM NVIDIA Data Center GPU Manager — the source of the GPU health and utilisation metrics the AI Factory streams (temperature, power, memory, XID errors, ECC counts). See Chapter 32.

drift A difference between your Source of Truth and the live state of a device, detected by the reconciler. See Chapter 14.

ECC Error-correcting-code memory events on a GPU. A rising ECC count is an early hardware-health warning the AI Factory surfaces. See Chapter 33.

event The lowest-level observation. Every adapter emits events; some become alerts. See Chapter 12.

executor The internal Daalu service with credentials to mutate external state. Acts only on approved change proposals. See Chapter 19.

federation tunnel The WireGuard connection between your cluster and the Daalu hub. See Chapter 15.

four-eyes rule The constraint that a user cannot approve a change proposal they (or the Assistant on their behalf) proposed. See Chapter 10.
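
As an illustration only, the four-eyes rule can be sketched as a single check. The `Proposal` type, its field names, and `can_approve` are hypothetical names invented for this example; they are not Daalu's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    # Hypothetical fields, assumed for this sketch.
    proposed_by: str                    # who created the proposal (a user or the Assistant)
    on_behalf_of: Optional[str] = None  # the user the Assistant was acting for, if any


def can_approve(proposal: Proposal, approver: str) -> bool:
    """Return True only if the approver neither proposed the change
    nor had the Assistant propose it on their behalf."""
    return approver not in {proposal.proposed_by, proposal.on_behalf_of}
```

Under these assumptions, `can_approve(Proposal("assistant", on_behalf_of="alice"), "alice")` is rejected, while the same proposal can be approved by a different user.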

GPU class A label capturing a card model and, crucially, its VRAM — which decides what it can serve. The two common classes are ada-16 (RTX 2000 Ada, 16 GB) and ada-48 (RTX 6000 Ada, 48 GB). See Chapter 31.

GPU consumer A tenant whose inference is served by a shared GPU it doesn’t own. One of the three GPU-sharing roles. See Chapter 16.

GPU owner A tenant whose connected card is dedicated to its own inference — nobody else’s traffic runs on it. The default when you connect a GPU for your own use. See Chapter 16.

GPU provider A tenant that contributes its card to a shared pool other tenants can draw on, earning credit for the calls it serves. See Chapter 16.

hub The Daalu-side server that terminates federation tunnels and serves the control plane. Operated by us; you don’t see it directly.

incident A grouping of alerts under one ongoing investigation, with a timeline and a postmortem. See Chapter 22.

inference gateway The component that fronts your GPU’s OpenAI-compatible endpoint over the federation tunnel, so the router can reach it while nothing else can. See Chapter 17.

ingest API key A separate credential class used by the event-ingest and webhook endpoints. Different from a personal access token. See Chapter 40.

integration Daalu’s connection to one of your external systems — the unit of configuration on the Integrations and Managed Infra pages. See Chapter 11.

invite The single-use link sent to a teammate to join your tenant. Expires in 7 days. See Chapter 6.

JWT The short-lived token issued at login for browser sessions. Different from a personal access token. See Chapter 53.

label A key=value tag on an alert, event, or integration, used for routing and filtering. See Chapter 12.

LLM router The component that decides where each inference call goes — your own GPU, the Daalu-hosted tier, or a commercial model — based on your SKU’s routing policy and what’s healthy. See Chapter 17 and Chapter 42.
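
A minimal sketch of the "try tiers in order, skip what is unhealthy" idea behind the router. The tier identifiers, the default order, and `pick_tier` are assumptions made for illustration; the real order depends on your SKU's routing policy, which this glossary does not spell out.

```python
from typing import Dict, Optional, Sequence

# Assumed preference order for this sketch: own GPU first, then the
# Daalu-hosted tier, then a commercial model. A real routing policy
# may order or restrict tiers differently.
TIER_ORDER = ("own_gpu", "daalu_hosted", "commercial")


def pick_tier(healthy: Dict[str, bool],
              order: Sequence[str] = TIER_ORDER) -> Optional[str]:
    """Return the first tier in `order` that is marked healthy, or None."""
    for tier in order:
        if healthy.get(tier, False):
            return tier
    return None
```

With this sketch, a tenant whose card is offline falls through to the next healthy tier, and a tier missing from the health map is treated as unavailable.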

Nautobot The currently-supported Source of Truth system. See Chapter 38.

notification The outbound message Daalu sends to get a human’s attention — via Slack, PagerDuty, email, or browser push. See Chapter 37.

NVSentinel The GPU reliability watchdog that watches hardware-health signals (XID, ECC, thermals) and reacts to a wedged or failing card. See Chapter 33.

PAT — see personal access token.

personal access token (PAT) The durable credential for scripts, CI, and the CLI. Format: dpat_xxxxxxxxxxxxxxxxxxxxxxxx. See Chapter 43.

plan — see SKU.

proposal — see change proposal.

quality (intent) An inference intent for real reasoning — answering an operator, explaining an alert, reviewing a diff. Lower volume, higher value than classifier; the calls most likely to fall back to a frontier model. See Chapter 47.

reconciler The shipped agent that compares your Source of Truth with live device state and writes drift proposals. See Chapter 14.

role A user’s level within a tenant — admin or regular user. See Chapter 10.

routing policy The rule a SKU pins that tells the router where to send calls: Local-First, Hybrid, External-Only, or your own GPU. See Chapter 47.

runbook Documentation attached to an alert rule describing what to do when it fires. See Chapter 12.

seat A unique user active in your tenant within the billing month, used for plan accounting. See Chapter 46.
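
One way to read the seat definition is as a count of distinct users with any activity in the month. The sketch below assumes the billing month is a calendar month and that activity is available as `(user_id, date)` pairs; both are assumptions for illustration, and `count_seats` is a hypothetical helper, not part of Daalu.

```python
from datetime import date
from typing import Iterable, Tuple


def count_seats(activity: Iterable[Tuple[str, date]], year: int, month: int) -> int:
    """Count unique users with at least one activity record in the given month."""
    return len({
        user
        for user, day in activity
        if (day.year, day.month) == (year, month)
    })
```

A user who is active on several days in the same month still counts as one seat, and a user active only in another month does not count at all.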

severity One of info, warning, critical. See Chapter 12.

SKU A purchasable plan. It pins a routing policy, a monthly base, an included-call bundle, and the per-tier inference rates you’re charged. See Chapter 47.

snooze Suppressing notifications for an alert without resolving it. See Chapter 22.

Source of Truth (SoT) Your authoritative inventory system — currently Nautobot. Daalu reconciles live device state against it and flags drift. See Chapter 14.

tenant Your company’s isolated workspace inside Daalu — your users, integrations, alerts, and spend. See Chapter 10.

tier (inference) Which endpoint actually served a call: your own GPU, Daalu-hosted, or a commercial (external) model. Billing is always against the tier that did the work. See Chapter 47.

tool A specific function the Assistant can call — some read-only, some routed through a change proposal. See Chapter 13.

tunnel — see federation tunnel.

user Anyone with a Daalu account, admin or not. See Chapter 6.

vLLM The serving engine that runs the open-weight model on your GPU, exposing an OpenAI-compatible API the router reaches over the tunnel. It pre-allocates VRAM for its KV cache, which is why a healthy idle card reads high memory at near-zero compute. See Chapter 31.

Watchdog The always-firing test alert that verifies the alert pipeline end-to-end. See Chapter 22.

webhook HTTP-based integration for arbitrary external systems, inbound or outbound. See Chapter 40.

workflow / automation A user-authored, multi-step automation with run history. Becomes an agent once active. See Chapter 21.

XID An NVIDIA GPU error code reported by the driver and surfaced by the AI Factory. A recurring XID is a strong signal of a hardware or driver fault. See Chapter 33.

your own GPU tier The inference tier served by a card you connected through Daalu Edge. The router tries it first; calls it serves cost zero per call because you own the hardware. See Chapter 31.

That’s the handbook.

For the engineering and operations companion, see the Engineering & Operations manual in the sibling docs/book-engineer/ directory.

