2. How it works

The mental model: a multi-tenant cloud, outbound-only tunnels to your stuff, and an AI assistant with hands.

This chapter gives you the mental model you need to make sense of every other chapter. We’ll go from a single picture down to the moving parts, then trace what happens when you connect a cloud account or ask the Assistant a question.

You won’t need to read any code to follow along. You will see component names, since you’ll encounter them in the UI and in support conversations.

The 30-second picture

Daalu is split into three pieces:

The Daalu cloud is a multi-tenant SaaS. It runs the operator app you log in to (the website), the supporting backend services, and the AI infrastructure.
Your sources are the things you operate: cloud accounts (AWS, GCP, Azure), your own Kubernetes clusters, network devices, your observability stack (Prometheus / Grafana / Loki / Thanos), ticketing (PagerDuty), and so on. Daalu connects to them in one of two ways:
- Direct integration: for SaaS systems with APIs (Slack, AWS, Nautobot Cloud, Grafana Cloud), Daalu calls their public API using credentials you provide.
- Daalu Edge: for things behind your firewall (an on-prem Kubernetes cluster, a GPU node, a network device with no public endpoint), you deploy a small edge agent inside your network that opens an outbound WireGuard tunnel back to the Daalu cloud. Nothing on your side needs to accept inbound traffic.
Optional local AI: if you want chatbot calls and classifier inference to run on your own hardware, you deploy a small vLLM pod on your own GPU node. The Daalu cloud routes inference to that node when it’s online and falls back to commercial LLMs when it’s not.

That’s the whole picture. The rest of this chapter zooms into each piece.

The operator app — what you see in your browser

The operator app at https://ops.daalu.io is the surface you spend your time on. It is a single web application organized into pages along the left sidebar:

Home: the landing page; a daily briefing of what’s changed, what’s degraded, and what wants your attention.
Operations: devices, drift, change proposals; this is the network operator’s home page.
Agents: autonomous helpers that run on schedules or on triggers (e.g. “every weekday at 8 AM, generate a briefing for the platform team”).
Automations: multi-step workflows you’ve authored, with run history.
Alerts: the alert inbox; each alert has a generated explanation, suggested actions, and the ability to escalate.
Reports: AI-generated operational briefings plus a query tab for ad-hoc and saved analyses over events, alerts, devices, and proposals.
Integrations: wire up Slack, Prometheus, cloud accounts, Nautobot, and the rest of the catalog.
AI Factory: your connected GPUs, with health, utilization, inference routing, diagnostics, and load tests. Lights up once you’ve onboarded an NVIDIA GPU (Chapter 9).
Workspace: a browser-based, AI-assisted coding environment backed by your own inference once a GPU is online.
Managed Infra: the inventory of everything Daalu is watching, including cloud accounts, Kubernetes clusters, and observability stacks.
Usage & Pricing: spend across cloud, GPU, and Daalu itself.
Settings: your profile, your team, integrations, API tokens, notification preferences.
Help & Feedback: version info and a way to write to the team that builds the product.

The Assistant is not a separate page — it lives in a panel that follows you across the whole app. From any page you can open it, ask a question grounded in what you’re looking at, and continue.

Part IV of this book is a chapter-per-page tour.

Tenants — your slice of the cloud

Every Daalu customer operates inside a tenant. A tenant is your private workspace: your users, your integrations, your alerts, your spend.

Important properties of tenants:

Tenants are isolated. Users in your tenant never see another tenant’s data, and the API enforces this at every endpoint.
A user belongs to exactly one tenant. Switching tenants means logging in to a different account.
Quotas are per-tenant. Cloud-LLM tokens, alert volume, retention windows — they all meter against your tenant.
Billing is per-tenant. One tenant gets one invoice.
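
To make the isolation property concrete, here is a minimal Python sketch of tenant scoping. It is illustrative only: the `Alert` record, the `AlertStore` class, and the tenant IDs are hypothetical names, not Daalu’s actual API or schema. The idea it shows is that the tenant comes from the authenticated session rather than from anything the caller can set in a request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tenant_id: str
    title: str


class AlertStore:
    """Hypothetical in-memory store; every read is filtered by tenant."""

    def __init__(self) -> None:
        self._rows: list[Alert] = []

    def add(self, alert: Alert) -> None:
        self._rows.append(alert)

    def list_for(self, session_tenant_id: str) -> list[Alert]:
        # The tenant ID is taken from the caller's session, so there is
        # no parameter a user could change to read another tenant's rows.
        return [a for a in self._rows if a.tenant_id == session_tenant_id]


store = AlertStore()
store.add(Alert("tenant-a", "disk pressure on node-1"))
store.add(Alert("tenant-b", "BGP session down"))
visible = store.list_for("tenant-a")  # only tenant-a's single alert
```

A real service would apply the same filter at the database layer as well, but that detail is outside what this chapter describes.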

The first user to sign up creates the tenant. They’re the admin. Admins can invite other users, change integrations, and mint API tokens. Everyone else is a regular user, which is enough to do investigation and approve change proposals, but not enough to rotate credentials.

Integrations — how Daalu sees your world

An integration is Daalu’s connection to one of your external systems. Dozens ship with the product; you’ll find the full list under Managed Infra and a catalog in Part V.

There are three categories of integration, each with a different authentication model:

API-key integrations (Slack, PagerDuty, Nautobot, Grafana Cloud). You generate a token in their system, paste it into Daalu, and Daalu uses it to call their API.
Cloud-account integrations (AWS, GCP, Azure). You create a role/service-account in your cloud and grant Daalu the permissions it needs — usually read-only at first. You can grant write later if you want the Assistant to be able to act.
Cluster integrations (your own Kubernetes clusters). You deploy the Daalu Edge chart into your cluster, which opens an outbound WireGuard tunnel. Once the handshake completes, your cluster appears as connected in Managed Infra. From that point, the Daalu cloud can call read-only and write APIs in your cluster, with every write logged.

Integrations are the substrate that everything else relies on. The Assistant can only see what an integration exposes. Alerts can only fire on data that came in through an integration. The chapters in Part V have step-by-step setup guides for each one.

Events, alerts, and incidents — the operational currency

Every observation Daalu makes is an event. An event is a small JSON object with a source, a kind, a severity, and a payload. A single CPU-pressure metric from Prometheus is an event. A failed build in GitHub Actions is an event. A new device discovered by the network reconciler is an event.
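
As a rough illustration of that shape, the sketch below models an event in Python and serializes it to JSON. Only the four field names (source, kind, severity, payload) come from the description above; the class name, the example values, and the payload keys are assumptions made for the example, not Daalu’s wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Event:
    source: str                      # where the observation came from
    kind: str                        # what was observed
    severity: str                    # e.g. info / warning / critical
    payload: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical values for a CPU-pressure observation from Prometheus.
event = Event(
    source="prometheus",
    kind="cpu_pressure",
    severity="warning",
    payload={"node": "node-1", "value": 0.93},
)
encoded = event.to_json()
```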

Events flow into Daalu and become one of three things:

Just an event — recorded in the timeline, available for search and queries, but not actively pushed at you.
An alert — when an event matches an alerting rule, it’s promoted to an alert. Alerts have a status (firing/resolved), a severity (info/warning/critical), and a runbook. They show up on the Alerts page and trigger notifications.
An incident — when one or more alerts are grouped (manually or by an agent) into a single ongoing issue, it’s an incident. Incidents have an assignee, a timeline, and a postmortem.

The grouping decision matters because it controls whether your phone rings. A flapping disk-pressure metric should generate hundreds of events but exactly one alert and zero incidents. Chapter 22 covers the rules.
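
The flapping-metric case can be sketched in a few lines of Python. This is a simplified model, not Daalu’s rule engine: the rule (promote warnings and criticals) and the fingerprint (source plus kind) are assumptions chosen to show how many events can fold into one alert.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    fingerprint: tuple[str, str]
    severity: str
    status: str = "firing"
    event_count: int = 0


def matches_rule(event: dict) -> bool:
    # Hypothetical alerting rule: promote warnings and criticals only.
    return event["severity"] in {"warning", "critical"}


def ingest(events: list[dict]) -> dict[tuple[str, str], Alert]:
    """Fold a stream of events into at most one alert per fingerprint."""
    alerts: dict[tuple[str, str], Alert] = {}
    for event in events:
        if not matches_rule(event):
            continue  # stays "just an event" in the timeline
        key = (event["source"], event["kind"])  # assumed fingerprint
        alert = alerts.setdefault(key, Alert(key, event["severity"]))
        alert.event_count += 1
    return alerts


# 300 repeats of the same disk-pressure warning produce a single alert.
flapping = [
    {"source": "prometheus", "kind": "disk_pressure", "severity": "warning"}
    for _ in range(300)
]
alerts = ingest(flapping)
```

Grouping alerts into incidents is a separate decision (manual or made by an agent) and is deliberately left out of this sketch.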

The AI Assistant — the action layer

The Assistant is what makes Daalu different from “another observability dashboard.” It can do three kinds of things:

Investigate — query Prometheus, read pod logs, list AWS resources, walk a Nautobot device tree, summarize an alert. All read-only.
Propose — write a change proposal that, if approved by a human, mutates state somewhere. “Drain node X and reschedule its pods.” “Update the ACL on switch Y to allow tenant Z.” “Roll back the latest deploy of the api service.”
Notify — page a person, post to Slack, or escalate to PagerDuty.

Investigate needs no approval because it is read-only, though it can only reach what your integrations expose. Propose is always staged through a human approval, except when an admin has pre-approved an automation for that tenant. Notify obeys your notification preferences.
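
Those three rules can be written as a small decision function. The sketch below is a reading of the paragraph above, not the product’s implementation: the `Action` enum, the `decide` function, its return strings, and the single `notifications_enabled` flag (a stand-in for real notification preferences) are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    INVESTIGATE = "investigate"
    PROPOSE = "propose"
    NOTIFY = "notify"


def decide(
    action: Action,
    *,
    pre_approved: bool = False,
    notifications_enabled: bool = True,
) -> str:
    """Return 'run', 'await_approval', or 'skip' for a requested action."""
    if action is Action.INVESTIGATE:
        return "run"  # read-only, so there is no approval step
    if action is Action.PROPOSE:
        # Staged behind a human unless an admin pre-approved the automation.
        return "run" if pre_approved else "await_approval"
    # NOTIFY: honor the recipient's notification preferences.
    return "run" if notifications_enabled else "skip"
```

For example, `decide(Action.PROPOSE)` returns `"await_approval"`, while `decide(Action.PROPOSE, pre_approved=True)` returns `"run"`.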

The Assistant lives in a panel anchored to the right side of every page. Asking a question on the Alerts page passes the current alert context automatically — you don’t need to copy IDs around. On Operations it picks up the device you’re viewing. This is one of the small things that compounds across a working day.

Cluster federation — getting Daalu close to your stuff

A lot of operators have things behind a firewall. The standard answer in this industry — “open a port, give us a hostname” — is a non-starter for most production networks. Daalu uses a different pattern: outbound-only WireGuard tunnels.

Here is how it works, in plain English:

You go to Managed Infra → Clusters → Add and request an invite. The Daalu cloud generates a one-shot bootstrap token, valid for one hour.
You install the Daalu Edge Helm chart in your cluster with that token. The chart contains a WireGuard agent and a small bootstrap service.
On startup, the edge agent calls back to the Daalu cloud over HTTPS, presents the bootstrap token, and exchanges it for a long-lived WireGuard configuration. The token is then burned.
The agent brings up the WireGuard tunnel. From the Daalu cloud’s side, your cluster is now an additional endpoint on a private network. The cloud can reach your services; nothing outside the tunnel can.
The cluster row in Managed Infra turns connected within ~30 seconds. From there, all the Daalu features that need cluster access (the Assistant’s kube tools, the GPU router, the local-LLM scrape) just work.

This pattern means you never open an inbound port and never hand Daalu a long-lived API key. Revocation is instantaneous: delete the edge chart and the tunnel dies.
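
The one-shot bootstrap token from the steps above can be modeled as follows. This is a sketch under stated assumptions: the `BootstrapTokens` class and its method names are hypothetical, and the real exchange also returns a WireGuard configuration, which is omitted here. Only the one-hour validity and the burn-after-use behavior come from the text.

```python
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 3600  # "valid for one hour" in the steps above


class BootstrapTokens:
    """Hypothetical server-side registry of one-shot bootstrap tokens."""

    def __init__(self) -> None:
        self._issued: dict[str, float] = {}  # token -> expiry timestamp

    def issue(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._issued[token] = now + TOKEN_TTL_SECONDS
        return token

    def redeem(self, token: str, now: Optional[float] = None) -> bool:
        """Burn the token; succeed only if it was known and unexpired."""
        now = time.time() if now is None else now
        # pop() removes the token on the first attempt, so it cannot be
        # presented a second time even if the first attempt succeeded.
        expiry = self._issued.pop(token, None)
        return expiry is not None and now <= expiry
```

Passing `now` explicitly is only a convenience for testing the expiry logic without waiting an hour.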

Chapter 41 walks through the full edge deploy. Chapter 15 covers the day-to-day operator-facing view of cluster federation.

Your own GPU and the LLM router

Some customers run their own NVIDIA GPUs and would prefer that AI inference happen on their hardware. Daalu supports this directly; it’s what the AI Factory is built around. The flow:

You connect a GPU node through Daalu Edge and deploy a small serving stack on it. It hosts an open-weight model (Llama or Qwen) and serves an OpenAI-compatible HTTP API that only Daalu can reach through the federation tunnel.
In the operator app, the GPU appears as healthy on the AI Factory page once the router has successfully called it, showing its base URL and the model it’s serving.
From then on, every chatbot turn, every classifier call, and every routine inference task tries your GPU first.

The router uses simple rules: if your GPU tier is online and the model can serve the request, use it; otherwise fall back to the Daalu-hosted tier. The result is that most inference calls (by volume) run on your hardware, costing you nothing per call. Heavy or specialty requests still go to a hosted LLM. Chapters 16 and 17 explain the AI Factory model and the routing decisions in detail.
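
The routing rule described above fits in a few lines of Python. Treat this as a sketch of the stated rule only: `GpuTier`, `route`, the tier labels, and the model names are hypothetical, and the real router (Chapters 16 and 17) may weigh more factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuTier:
    online: bool
    models: frozenset[str]  # models the local serving stack can serve


def route(requested_model: str, gpu: Optional[GpuTier]) -> str:
    """Prefer the tenant's own GPU; otherwise fall back to the hosted tier."""
    if gpu is not None and gpu.online and requested_model in gpu.models:
        return "local-gpu"
    return "hosted"


# Hypothetical model names, for illustration only.
gpu = GpuTier(online=True, models=frozenset({"llama-small"}))
choice_small = route("llama-small", gpu)    # served locally
choice_large = route("specialty-xl", gpu)   # falls back to hosted
```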

The quickstart for connecting a GPU is Chapter 9.

Where data lives

On the Standard tier (the default), the rows you create in Daalu — alerts, change proposals, runbook notes, the Assistant’s conversation history — are stored in Daalu’s managed Postgres database, isolated per tenant. Logs and metrics collected through integrations live in Daalu’s observability stack (Prometheus + Loki) with the retention window defined by your plan.

What does not leave your environment, regardless of tier:

Your secrets (API keys, kubeconfigs). They’re stored encrypted at rest and only used in-process when needed.
Your network device configs. Daalu reads them on demand and does not snapshot them by default.
Your cloud-account contents (S3 buckets, BigQuery rows, etc.). Daalu calls the cloud APIs as needed; it doesn’t copy your data.

If your contract requires that even the alerts, proposals, and conversation history never live on Daalu’s stack, there is a Daalu Private (full) tier where all of those rows are stored in a Postgres database on your own cluster. The Daalu hub becomes a pure control plane: it serves the UI shell, holds your account row, and proxies your browser’s API calls through the WireGuard tunnel to a tenant-scoped backend running inside your cluster. Setup is a single Helm command. Chapter 45 has the details.

The Engineering & Operations manual has the formal tenant-isolation guarantee for anyone who needs to brief a security team.

That’s the model. The rest of the book will keep referring back to these names: tenant, integration, event, alert, change proposal, edge, router, assistant. If you ever feel lost on a page, the side panel’s What is this? link returns to this chapter.

Next: Chapter 3 — Who Daalu is for
