rustdesk-server/docs/AGENT-API-AUTH.md

# Agent API authentication

Reference for the per-device signature gate on the agent-facing HTTP
API. Seven endpoints are gated:

- `POST /api/heartbeat`
- `POST /api/sysinfo`
- `POST /api/unattended-password`
- `POST /api/agent/exec-result` — managed-only (no legacy/unsigned path)
- `POST /api/agent/login-event` — user-logon / logoff events observed
  by the agent. Same TOFU lifecycle as heartbeat / sysinfo: stock
  RustDesk doesn't post here at all, so in practice every caller is a
  managed agent; the legacy/unsigned path is kept for symmetry.
- `POST /api/agent/metrics` — continuous CPU / memory / top-process
  samples (≈1 / minute). Surfaced on the admin Devices detail page as
  a 24 h sparkline + live snapshot card.
- `POST /api/agent/perf-events` — sparse Windows-event-log entries
  flagged by `Microsoft-Windows-Diagnostics-Performance/Operational`,
  `Microsoft-Windows-Resource-Exhaustion-Detector/Operational`, and
  hand-picked `System` IDs (41 / 6008 / 1001 — unexpected reboot /
  dirty shutdown / BSOD). Server dedups via UNIQUE (peer_id, provider,
  record_id).

For the operator workflow — turning it on, the dashboard toggle, what
happens when a managed agent is uninstalled — see the matching section
in [CONFIGURATION.md](CONFIGURATION.md).

## Why this exists

All three endpoints originally accepted any caller who supplied an `id`
and `uuid` in the JSON body. Knowing those two values (plaintext on the
device, sent over the rendezvous wire) was enough to inject arbitrary
inventory or heartbeat state for that device — including BIOS serials,
BitLocker recovery keys, the active console user, network interfaces,
connection lists, and the per-boot unattended-access password the
admin UI surfaces to support staff.

The fix reuses the Ed25519 keypair that the agent **already** generates
on first run and registers with the rendezvous server via `RegisterPk`.
Every signed HTTP request is verified against the public key the
rendezvous handshake stored in `peer.pk`, so the trust root is the same
one the relay encryption already depends on. No new credential to
provision, no new secret to leak.

## Trust root

```
First run                Rendezvous (port 21116, TCP/protobuf)
  agent generates sk,pk  ───── RegisterPk(id, pk) ─────►  server stores
  in hello-agent.toml                                      peer.pk

Every subsequent request                HTTP API (port 21114)
  agent signs body                                         server verifies sig
  with sk     ───── POST /api/heartbeat            ─────►  against peer.pk
              ───── POST /api/sysinfo              ─────►  (when peer.managed=1)
              ───── POST /api/unattended-password  ─────►
              ───── POST /api/agent/exec-result    ─────►  (always required)
```

The same secret key signs both the rendezvous identity proof and the
HTTP-API payload — there's only one credential per device.

## Per-peer `managed` flag

The gate is per-device, controlled by the `peer.managed` column
(`INTEGER NOT NULL DEFAULT 0`, added by a soft `ALTER` at startup).

| `managed` | Server behaviour                                                                 |
|-----------|----------------------------------------------------------------------------------|
| `0`       | Legacy path. Signed requests are still verified if present, but absence is OK.   |
| `1`       | Signature required. Any unsigned request claiming this `id` returns 401.         |

How the flag transitions:

- **TOFU promote (0 → 1).** The first request that arrives with a valid
  signature flips `managed` to 1. Hello-agent signs from boot one, so
  the first heartbeat after a hello-agent install transparently locks
  the peer down. No admin action required.
- **Admin promote (0 → 1).** `PUT /api/peers/:id/managed {"managed":true}`
  or the **Require signed API** action in the dashboard's Devices row
  menu. Useful for pre-enrolling a peer record before the agent has
  posted anything.
- **Admin downgrade (1 → 0).** Same endpoint, `{"managed":false}`, or
  **Allow unsigned API** in the dashboard. Use when the managed agent
  has been replaced with stock RustDesk on that device. The dashboard
  toggle requires a confirm because the operation reopens the
  spoofing surface.
- **Never auto-downgraded.** A failed signature on a `managed=1` peer
  is a 401, full stop — there is no "fall back to unsigned" path.
- **Invalid sig on a `managed=0` peer is also 401**, never silently
  treated as legacy. This prevents an attacker from probing for the
  legacy path by deliberately sending a broken signature.

## Wire format

A signed agent request carries two headers in addition to the JSON body:

```
X-RD-Device-Id: <id>
X-RD-Signature: v1.<unix_ts>.<base64(ed25519_sig)>
```

The signed message is the byte concatenation:

```
"rd-api-v1\n" || METHOD || "\n" || PATH || "\n" || TS || "\n" || sha256(BODY)
```

Where:

- `METHOD` is the uppercase HTTP method (`POST`).
- `PATH` is the request path with leading slash and no query string
  (`/api/heartbeat`, `/api/sysinfo`, `/api/unattended-password`).
- `TS` is the same decimal Unix timestamp that appears in the header.
- `sha256(BODY)` is the raw 32-byte SHA-256 of the request body — *not*
  hex-encoded, *not* base64-encoded. It is concatenated as binary.
- The signature is detached Ed25519 over that 32-byte-plus-prefix
  message, base64-encoded with the standard alphabet and no
  URL-safe substitutions.

The `v1.` prefix on the header value reserves a rotation point. The
server rejects any other version string.

### Why this shape

- **Domain separator (`rd-api-v1\n`)** prevents the same `sk` being
  tricked into signing data interpretable as another protocol.
- **Method + path** stop a captured `POST /api/sysinfo` signature from
  being replayed as some future `POST /api/disconnect`.
- **`sha256(body)`** lets us sign without holding the body twice in
  memory on the verify side, and survives any future proxy
  re-chunking.
- **Timestamp in both the header and the signed message** makes the
  skew check trivial without re-parsing the signature value.

## Server-side verification

The extractor [`api::device_auth::verify`](../src/api/device_auth.rs)
runs before each agent handler:

1. **Parse headers.** Both `X-RD-Device-Id` and `X-RD-Signature` must
   be present, or both absent. Mixed states are 401.
2. **Validate the signature envelope.** Version must be `v1`. The
   timestamp must be within ±300 seconds of the server's clock. The
   base64 decode must succeed.
3. **Replay-check.** A keyed-by-`(id, ts, sig-prefix)` LRU cache (size
   16 384, sliding 600-second TTL, sweep-on-insert) rejects exact
   replays inside the window. If the cache is full, we accept and skip
   the cache — DoS-by-cache-exhaustion is uninteresting compared to
   the rest of the surface.
4. **Look up `peer.pk` and `peer.managed`** in one query.
5. **Verify the detached Ed25519 signature** against the canonical
   signed-message bytes (see *Wire format* above).
6. **TOFU promote.** A valid signature on a `managed=0` peer flips the
   flag to 1 in the same request. The promote is best-effort — if the
   DB write fails, the original request is still served, the next
   heartbeat will retry.
7. **Bind the trusted id to the body.** After the handler parses JSON,
   the body's `id` field must match the header's `X-RD-Device-Id`.
   Mismatch is 401 — this is the gate that stops a signed request from
   being repurposed to write to a different peer's row.

If no signature headers are present and the peer is `managed=0`, the
verifier returns `LegacyUnsigned`; the handler then calls
`enforce_managed_for_id(body.id)` after parsing the body, which still
rejects unsigned requests for any *other* peer that has since become
managed.

## Agent-side signing

The signer is one small module: [`vendor/rustdesk/src/hbbs_http/sign.rs`](https://example.invalid/sign.rs)
in the hello-agent vendor tree. It reads the existing
`Config::get_key_pair()` (returns `(sk, pk)` from `hello-agent.toml`)
and the existing `Config::get_id()`, builds the canonical message, and
calls `sodiumoxide::crypto::sign::sign_detached`. Returns the two
header lines joined by `\n`, ready for the multi-header parser in
`common.rs::post_request_`.

The agent always tries to sign. If the keypair hasn't been generated
yet (extremely early boot, before rendezvous has run), the signer
returns `None`, the request goes out unsigned, and:

- If `peer.managed=0`: server accepts it (legacy path).
- If `peer.managed=1`: server returns 401, the agent's next heartbeat
  retries.

This is the only condition under which a hello-agent build sends an
unsigned request, and it self-resolves on the next sync tick.

## Operational gotchas

- **Stock RustDesk clients keep working** because they post unsigned
  and their peer rows stay at `managed=0`. The first time you install
  hello-agent on a device, the existing `peer.pk` row gets reused (the
  agent re-generated a keypair iff `hello-agent.toml` was wiped). The
  first signed heartbeat then promotes the row.
- **`hello-agent --uninstall` preserves the keypair.** A reinstall is
  transparent — signing keeps working.
- **Wiping `hello-agent.toml` between sessions** does mean the next
  boot generates a new keypair. The rendezvous server will treat that
  as a key roll (`register_pk of … due to key not confirmed`) and
  store the new `pk`. The signed HTTP API picks up the new key as soon
  as that rendezvous step completes — usually within a few seconds.
  See [the stale-key recovery note in hello-agent's README](https://example.invalid/README.md)
  for the supporter-side symptoms of a key drift.
- **Clock skew over ±5 minutes** will reject signatures. If your
  fleet shows scattered 401s on heartbeat, check NTP on the affected
  hosts. The server side is the canonical clock.
- **Replay cache survives only inside a single hbbs process.** A
  restart clears it. Combined with the 300-second skew window this
  means a captured signature is replayable across a restart if and
  only if both restarts happen inside that window — an acceptable
  trade-off for keeping the cache in-memory.
- **One server, mixed fleet.** Stock clients and hello-agent clients
  can target the same hbbs without any flag-level config. The gate is
  per-peer.

## Failure modes & log lines

| Symptom                                                                 | Likely cause                                                                                  |
|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Heartbeats from a known peer suddenly return 401                        | Peer was just promoted (TOFU or admin) and the agent build doesn't sign yet → upgrade agent.  |
| Heartbeats fail intermittently with 401                                 | Clock skew > 5 min, or NAT churn replaying a captured request inside the window.              |
| `peer X TOFU-promoted to managed=1` in hbbs log                         | Normal — first valid signature from a previously-unsigned peer.                               |
| `admin <user> set peer X managed=<bool> via dashboard`                  | Normal — operator used the Devices toggle.                                                    |
| `peer_set_managed(X) failed: …`                                         | DB write failed during TOFU promote. The request was still served; next request will retry.   |
| Admin row shows **Unsigned** for a peer running hello-agent             | Agent hasn't completed its first signed POST yet (keypair race), or it's running a build      |
|                                                                         | that pre-dates the signing patch — check `vendor/rustdesk/src/hbbs_http/sign.rs` is present.  |

## Remote PowerShell exec

Layered on top of the signature gate. An admin in the dashboard sends a
script to a peer; the agent runs it as its service account; output and
exit code come back into the dashboard within ~1 s (the heartbeat
interval).

### Three independent gates

A dispatch must pass **all three** server-side checks before a row is
queued — the agent never sees a script it shouldn't have:

1. **`AuthedUser.is_admin`** — only admins can dispatch.
2. **`peer.managed = 1`** — the same flag the signed-API gate uses.
   This means TOFU has already promoted the peer (or an admin explicitly
   flipped it). Stock RustDesk clients are uninvited.
3. **Strategy `enable-remote-exec = "Y"`** — the resolved strategy for
   the peer must explicitly opt in. Defaults to off. Set it on a strategy,
   assign the strategy to the peer (or its group / owner), exec is now
   live for that scope. *Server-side only — the key is never pushed to
   the client.* See [STRATEGIES.md](STRATEGIES.md).

### Wire path

```
Admin UI ──POST /admin/pages/devices/:id/exec──► Server inserts exec_history(status='queued')
                                                              │
                                                              ▼
                              Agent's next heartbeat reply carries `exec: [{cmd_id, script, max_secs, max_bytes}]`;
                              the server flips the row to 'running' atomically (exec_pop_queued_for_peer).
                                                              │
                                                              ▼
                              Agent runs `powershell.exe -NoProfile -NonInteractive -ExecutionPolicy Bypass -Command -`,
                              writes the script to stdin, captures stdout+stderr up to 1 MiB, kills on 5-minute wall clock.
                                                              │
                                                              ▼
Admin UI ◄──poll /admin/pages/devices/:id/exec/:cmd_id/poll── Server ◄──POST /api/agent/exec-result (signed)── Agent
```

### Limits

| Setting        | Default | Where                                                      |
|----------------|--------:|------------------------------------------------------------|
| Script size    | 32 KiB  | `src/api/admin/pages/exec.rs::MAX_SCRIPT_BYTES`            |
| Wall-clock     |   300 s | `src/api/heartbeat.rs::EXEC_MAX_SECS` (sent to agent)      |
| Output capture |  1 MiB  | `src/api/heartbeat.rs::EXEC_MAX_BYTES` (sent to agent)     |
| In-flight/peer |       1 | `exec_in_flight_count > 0` blocks new dispatch             |

The agent enforces wall-clock and output-capture locally — server caps
are advisory unless you also harden the agent. If you don't trust your
own agent build, the server caps still bound storage and replay-cost.

### Result POST authentication

`POST /api/agent/exec-result` is the only agent endpoint that **always**
requires a signature, even when the peer happens to be `managed=0`.
There's no legacy compatibility story for exec — if the agent can't
sign, the result POST is rejected outright and the row sits in `running`
until an admin notices. Reason: an attacker who can spoof `(id, uuid)`
shouldn't be able to forge "I executed your command and here's the
output" for a device they don't actually control.

### Operational notes

- **The dispatch row stays `running` until the agent posts a result.**
  If the agent crashes mid-script there's no automatic timeout cleanup
  yet (planned: a hourly task that flips long-stuck `running` rows to
  `errored`). Admins can dispatch a fresh command after the in-flight
  one ages past 5 minutes by waiting; the in-flight check is wall-clock
  based on `issued_at`.
- **Output may contain secrets.** A `Get-Content` of a credential file
  goes straight into the `exec_history` table and the admin UI. The
  current schema has no per-row access control beyond "is_admin"; if
  you need finer scoping, audit log retention plus your `users` table
  ACL is the only knob.
- **No interactive REPL yet.** Each dispatch is one shot: write script,
  run, read result. Multi-command sessions or interactive prompts
  (Read-Host, sudo-style passwords) will hang and time out. This is by
  design for v1 — Option B in the original architecture discussion.

## File map

Server:

| Path                                      | Purpose                                                          |
|-------------------------------------------|------------------------------------------------------------------|
| `src/api/device_auth.rs`                  | The verifier (extractor + replay cache + TOFU promote).          |
| `src/api/heartbeat.rs`, `src/api/sysinfo.rs`, `src/api/unattended.rs` | Wired to call `verify` then `enforce_managed_for_id`.    |
| `src/api/agent_exec.rs`                   | `POST /api/agent/exec-result` (sig-required, no legacy path).    |
| `src/api/peers.rs::set_managed`           | `PUT /api/peers/:id/managed` admin endpoint.                     |
| `src/api/admin/pages/devices.rs::toggle_managed` | Dashboard action handler.                                 |
| `src/api/admin/pages/exec.rs`             | Per-device exec page (form + history + HTMX poll fragment).      |
| `src/api/strategy/mod.rs::allows_remote_exec` | Resolves the per-peer strategy and reads `enable-remote-exec`. |
| `src/database.rs::M2_SOFT_ALTERS`         | `ALTER TABLE peer ADD COLUMN managed`.                           |
| `src/database.rs::M5_SCHEMA`              | `CREATE TABLE exec_history` + indexes.                           |
| `src/database.rs::peer_get_auth, peer_set_managed` | DB helpers (untyped `sqlx::query` so they survive the no-DB-migrated dev build). |
| `src/database.rs::exec_create, exec_pop_queued_for_peer, exec_finish, exec_get_by_cmd_id, exec_in_flight_count, exec_list_for_peer` | Exec lifecycle helpers. |

Agent — hello-agent vendor tree:

| Path                                                       | Purpose                                                       |
|------------------------------------------------------------|---------------------------------------------------------------|
| `vendor/rustdesk/src/hbbs_http/sign.rs`                    | The signer.                                                   |
| `vendor/rustdesk/src/hbbs_http/sync.rs` (call sites + `EXEC_SENDER`) | Heartbeat + sysinfo POSTs sign; heartbeat reply forwards queued `exec` requests to the broadcast channel. |
| `vendor/rustdesk/src/common.rs::post_request_, parse_simple_header` | Header parser now accepts `\n`-separated `Name: Value` pairs (backward-compatible). |
| `vendor/rustdesk/src/lib.rs`                               | `pub mod hbbs_http` — required so hello-agent can reach both `::sign` and `::sync::exec_signal_receiver`. |

Agent — hello-agent crate (outside the vendor tree):

| Path                                | Purpose                                                                                 |
|-------------------------------------|-----------------------------------------------------------------------------------------|
| `src/unattended_password.rs::try_report` | Reports the per-boot password to `/api/unattended-password`; signs the POST.       |
| `src/exec.rs`                       | PowerShell runner. Subscribes to the sync layer's broadcast channel, spawns `powershell.exe`, captures stdout/stderr with caps, signs and POSTs the result to `/api/agent/exec-result`. Started from `run_server()` in main.rs. |

## Out of scope

Other agent / management endpoints exist on the same server. They are
deliberately *not* covered by this gate because their trust model is
different:

| Endpoint                       | Why it isn't signature-gated                                                                              |
|--------------------------------|-----------------------------------------------------------------------------------------------------------|
| `POST /api/devices/cli`        | Enrollment via `rustdesk --assign --token <T> …`. Already authenticated by a user/admin bearer session; the operator's job is to *supply* an arbitrary `(id, uuid)` for binding. Requiring the device's `sk` would defeat the use case. |
| `GET  /api/sysinfo_ver`        | Returns a single public version string. No body, no DB write — no spoof surface to gate.                  |
| `POST /api/record`             | Session-recording upload. Disabled by default in the OSS uploader; managed builds use it under a separate auth model. Out of scope for the current sweep. |
| `POST /api/login`, `/api/login-options`, `/api/currentUser`, `/api/logout` | User session management — separate auth model (password + TOTP / OIDC). |
| Everything under `/api/ab/*`, `/api/audit/*`, `/api/peers*`, `/api/2fa/*`, `/api/oidc/*`, `/admin/*` | Already gated by `AuthedUser` (cookie or bearer). |