35. The coding workspace
A browser-hosted VS Code with an agent that runs on your own GPU — so your code is never sent to an external model provider.
The rest of Daalu helps you operate infrastructure. The coding workspace helps you change it: edit the Terraform, Helm, Python, and runbooks that define how your systems run, with an AI agent that runs inside your perimeter and never on the public internet.
It looks like a hosted dev environment, with three differences that matter: the editor runs in a pod that’s yours alone, the AI is constrained to your own GPU or the Daalu-hosted pool with no cloud fallback, and the agent has tools to read, search, write, and run — gated on your approval.
At a glance
- What it is. Browser VS Code (code-server), one per user, files on a persistent volume, pre-wired to your private inference tier.
- Where to find it. Workspace in the left sidebar (/workspace).
- Who can use it. Tenants with an in-perimeter inference tier: your own GPU, the Daalu-hosted pool, or an operator override.
Eligibility
The workspace unlocks when your tenant has an inference tier that can serve coding prompts inside the perimeter — any one of:
- A GPU you’ve connected (Chapter 31) — coding prompts route to it.
- Enrolment in the Daalu-hosted inference tier — prompts route to the operator’s GPU pool.
- An operator override for your tenant (common during a rollout).
If none applies, the Workspace page tells you so and links you to Usage & Pricing to add an AI tier. The sidebar entry is always visible; clicking it shows your status either way.
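The "any one of" rule above can be read as a simple boolean check. The sketch below is illustrative only: the class and field names are hypothetical and simply mirror the three bullets, not Daalu's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantInference:
    # Hypothetical field names, one per eligibility bullet above.
    has_connected_gpu: bool = False        # a GPU you've connected
    enrolled_in_hosted_tier: bool = False  # Daalu-hosted inference tier
    operator_override: bool = False        # operator override for the tenant


def workspace_eligible(tenant: TenantInference) -> bool:
    """The workspace unlocks if at least one in-perimeter option applies."""
    return (
        tenant.has_connected_gpu
        or tenant.enrolled_in_hosted_tier
        or tenant.operator_override
    )
```

A tenant with only an operator override is eligible; a tenant with none of the three sees the status page instead.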
Creating a workspace
One workspace per user. The create form has four decisions.
Coding model
Pick the model the agent inside your workspace will run on. Which entries are selectable depends on which GPU is actually present for your tenant — a 48 GB model can’t be served on a 16 GB card, so it shows disabled until that hardware is online.
| Model | Fits | Context | Best for |
|---|---|---|---|
| Qwen2.5-Coder 14B (AWQ-INT4) | ada-16 (16 GB) | ~8K | The everyday default. Strong open coding model with native tool-calling; fits a single RTX 2000 Ada. |
| Qwen2.5-Coder 32B (dense, AWQ) | ada-48 (48 GB) | 32K | Best single-model code quality at this size. Needs an RTX 6000 Ada. |
| Qwen3-Coder 30B-A3B (MoE, FP8) | ada-48 (48 GB) | Long (launched at 65K) | Very fast (3B active params), long-context, purpose-built for agentic tool use. The agent model. |
All three support native tool-calling, which is what lets the Daalu Coding Agent drive its tools cleanly rather than parsing intentions out of free text. The catalog is the single source of truth — the model you pick is the model your prompts hit, on your hardware.
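To make the "selectable depends on which GPU is present" rule concrete, here is a minimal sketch of a catalog filter. The model names and memory sizes come from the table above; the `CatalogEntry` type, the `selectable` function, and the assumption that a 16 GB model may also be served on a larger card are illustrative, not confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    gpu_class: str  # value from the "Fits" column
    vram_gb: int    # memory the model needs, per the table


CATALOG = [
    CatalogEntry("Qwen2.5-Coder 14B (AWQ-INT4)", "ada-16", 16),
    CatalogEntry("Qwen2.5-Coder 32B (dense, AWQ)", "ada-48", 48),
    CatalogEntry("Qwen3-Coder 30B-A3B (MoE, FP8)", "ada-48", 48),
]


def selectable(
    catalog: Iterable[CatalogEntry], online_vram_gb: Iterable[int]
) -> List[Tuple[str, bool]]:
    """Return (model name, enabled) pairs for the create form.

    Assumption: a model is enabled when the largest online GPU has at
    least as much memory as the model needs.
    """
    largest = max(online_vram_gb, default=0)
    return [(entry.name, entry.vram_gb <= largest) for entry in catalog]
```

With only a 16 GB card online, this sketch enables the 14B entry and leaves both 48 GB entries disabled, matching the behaviour described above.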
Profile
This is compute for the editor — CPU, RAM, and disk — separate from the model:
| Profile | Disk | Use when |
|---|---|---|
| Small (default) | 5 GiB | Most work. Start here. |
| Medium | 20 GiB | Larger repos or heavier builds. |
| Large | 50 GiB | Big monorepos, lots of dependencies. |
Pick small unless you know you need more. Your operator may cap the maximum profile your tenant can choose.
Git repo (optional)
Paste an HTTPS repo URL and a branch, and the workspace is seeded with
a fresh clone on first boot — GitHub, GitLab, or self-hosted. Leave it
blank to start empty; you can drag files into the editor or git clone from the terminal later.
Access token (private repos, optional)
If the repo is private, add a personal access token — GitHub repo
scope or GitLab read_repository scope. See the callout below for
exactly how it’s handled.
Click Create workspace. Provisioning takes ~60 seconds the first
time while the image pulls; the page polls and flips to Active.
Then Open IDE launches VS Code in a new tab, loaded into your
/workspace folder.
Why it matters: your private-repo token is treated as a secret, not a setting. The access token is write-only in the UI — you can set it, but it’s never returned; the page later shows only a “Private” badge to indicate one is stored. It’s length-capped, encrypted at rest (Fernet), and decrypted just-in-time into an in-memory git credential helper for the clone. It is never written into your repo’s git config and never lands on disk in the clear. Combined with locked-down pod egress, that means the credential exists only for as long as the clone needs it.
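One way to picture a clone that keeps the token out of git config and off the command line is sketched below. This is not Daalu's implementation: the function name, the `GIT_CLONE_TOKEN` variable, and the placeholder username are assumptions, and the Fernet decryption step is omitted because it needs a third-party package. The sketch only builds the command and environment; it does not run git.

```python
import os
from typing import Dict, List, Tuple


def clone_command(
    repo_url: str, branch: str, dest: str, token: str
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for a one-off authenticated clone.

    The token travels only in the child process environment. The helper
    is passed with `git -c`, which applies to this invocation only and
    is not written into the cloned repo's config.
    """
    # Inline shell helper: git runs it and reads username/password lines.
    # The username is a placeholder; hosts differ in what they expect.
    helper = (
        '!f() { echo "username=x-access-token"; '
        'echo "password=${GIT_CLONE_TOKEN}"; }; f'
    )
    argv = [
        "git",
        "-c", "credential.helper=",           # reset any inherited helpers
        "-c", f"credential.helper={helper}",  # use the inline helper only
        "clone", "--branch", branch, "--single-branch", repo_url, dest,
    ]
    env = {
        **os.environ,
        "GIT_CLONE_TOKEN": token,    # hypothetical variable name
        "GIT_TERMINAL_PROMPT": "0",  # fail instead of prompting
    }
    return argv, env
```

The pair could then be handed to `subprocess.run(argv, env=env, check=True)`; once that process exits, the token is no longer present anywhere the clone touched.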
The Daalu Coding Agent
Inside the IDE, a Daalu icon in the activity bar opens the Daalu Coding Agent — a chat agent powered by your workspace’s own GPU model via native OpenAI function-calling. (Code-server’s built-in Copilot Chat is suppressed in favour of it, so there’s one agent, on your hardware.)
The agent runs a tool loop. Some tools are read-only and run automatically; the ones that change things require your approval:
| Tool | Type | Approval |
|---|---|---|
| read_file | Read | Auto (runs as part of the loop). |
| list_dir | Read | Auto. |
| search_files | Read | Auto. |
| repo_map | Read | Auto (builds a structural map of the repo). |
| write_file | Mutating | Requires your approval before the file changes. |
| run_command | Mutating | Requires your approval before it runs. |
So the agent can freely explore your code — open files, search, map the repo — to ground its answer, but it cannot modify a file or run a command without you saying yes. You stay in the loop on every side-effect, and you see what it’s about to do before it does it.
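The approval rule can be sketched as a small dispatcher. The tool names are taken from the table above; the `dispatch` function, the `approve` callback, and the shape of the error results are hypothetical and only illustrate the gating idea.

```python
from typing import Any, Callable, Dict

READ_ONLY = {"read_file", "list_dir", "search_files", "repo_map"}
MUTATING = {"write_file", "run_command"}


def dispatch(
    name: str,
    args: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    """Run one tool call requested by the model.

    Read-only tools run immediately. Mutating tools run only if the
    approve callback (the user's yes/no) returns True.
    """
    if name in READ_ONLY:
        return tools[name](**args)
    if name in MUTATING:
        if not approve(name, args):
            # Nothing is executed; the model is told the call was refused.
            return {"error": "rejected by user"}
        return tools[name](**args)
    return {"error": f"unknown tool: {name}"}
```

Because the approval check sits in front of the call itself, a rejected `write_file` never touches the file; the refusal is just another tool result for the agent to reason about.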
The daalu chat REPL
Prefer the terminal? Open a terminal in the IDE and run daalu chat
for a Claude-Code-style REPL with the same tools and the same
approval model — one confirmation per side-effect. It hits the same
inference tier as the activity-bar agent, so the same privacy
guarantee applies. Use whichever surface fits your workflow; many
people keep the panel open for chat and drop to daalu chat for
focused, multi-step edits.
The terminal also ships with the usual tooling — git, kubectl,
helm, jq — so you can clone, build, and push without leaving the
browser.
The privacy guarantee
Why it matters: coding prompts never leave for an external provider. The LLM router has a hard rule for coding work — it blocks every external tier. There is no Anthropic / OpenAI / commercial fallback for code. The only destinations the router will consider are your own GPU tier or the Daalu-hosted pool. If neither can serve the request, the agent returns an error rather than quietly sending your source to a third party. Your code stays inside your perimeter by construction, not by policy you have to trust.
This is the whole point of running the agent on your own GPU: the model is good enough to be genuinely useful, and your proprietary code never becomes someone else’s training data or transits a network you don’t control. The routing model is Chapter 16; the router itself is Chapter 17.
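The hard rule for coding work can be sketched as a router that has no external branch at all. The tier names, the exception class, and the preference for your own GPU over the hosted pool are assumptions made for illustration; the actual router is described in Chapter 17.

```python
from typing import Iterable

# Hypothetical tier identifiers; only these two are ever candidates.
IN_PERIMETER_TIERS = ("own_gpu", "daalu_hosted")


class NoInPerimeterTierError(RuntimeError):
    """Raised when no in-perimeter tier can serve a coding prompt."""


def route_coding_prompt(available_tiers: Iterable[str]) -> str:
    """Pick a destination for a coding prompt.

    External tiers are never considered, even if they are available.
    If no in-perimeter tier is up, fail loudly instead of falling back.
    """
    available = set(available_tiers)
    for tier in IN_PERIMETER_TIERS:  # assumed order: own GPU first
        if tier in available:
            return tier
    raise NoInPerimeterTierError(
        "no in-perimeter inference tier available for coding prompts"
    )
```

Note that an available external tier changes nothing: with only external providers reachable, the sketch raises an error rather than returning a destination.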
Isolation
- One pod per user. Your workspace is its own Kubernetes pod with its own filesystem. Another user in your tenant cannot read your files or attach to your shell.
- Locked-down egress. The pod’s network is restricted — it can reach what it needs to clone and to talk to the inference tier, not arbitrary internal services.
- Files on a persistent volume. Your /workspace folder lives on a persistent volume sized by your profile and survives pauses.
Lifecycle
A workspace is meant to be long-lived but not immortal:
- Active — at least one open IDE session recently. As long as you have the IDE open in a tab, the idle clock doesn’t start.
- Paused (after 24h idle) — the pod scales down; your files on the volume are preserved. Re-opening the IDE brings it back quickly.
- Destroyed (after 14 days of continued inactivity): the volume is released and your files are gone. You get an email warning before the cliff, so push any in-flight work with a normal git push first.
Deleting a workspace yourself releases its volume immediately — push anything you care about before you do.
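The lifecycle thresholds can be summarised as a function of idle time. The 24-hour and 14-day values come from the list above; the function name and the assumption that both clocks are measured from the last IDE activity are illustrative, since the exact starting point of the 14-day window is not specified here.

```python
from datetime import timedelta

PAUSE_AFTER = timedelta(hours=24)   # idle time before the pod scales down
DESTROY_AFTER = timedelta(days=14)  # assumed to count from last activity


def lifecycle_state(idle: timedelta, ide_open: bool = False) -> str:
    """Map idle time to a workspace state.

    An open IDE tab keeps the workspace active regardless of idle time,
    because the idle clock does not start while a session is open.
    """
    if ide_open or idle < PAUSE_AFTER:
        return "active"
    if idle < DESTROY_AFTER:
        return "paused"
    return "destroyed"
```

For example, a workspace left untouched for three days with no open tab would be paused under this sketch, with its files still on the volume.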
That completes Part V. The coding workspace is where your own GPU stops being an observability dashboard and becomes a tool you use every day. For how that inference is routed and billed, see the concepts in Chapter 17 and the platform detail in Chapter 44.