Implement signed API communication to improve security

2026-05-22 12:50:42 +02:00
parent 21b25bcc1b
commit 475da0e950
12 changed files with 906 additions and 24 deletions
@@ -0,0 +1,254 @@
+# Agent API authentication
+
+Reference for the per-device signature gate on the agent-facing HTTP
+API. Three endpoints are gated:
+
+- `POST /api/heartbeat`
+- `POST /api/sysinfo`
+- `POST /api/unattended-password`
+
+For the operator workflow — turning it on, the dashboard toggle, what
+happens when a managed agent is uninstalled — see the matching section
+in [CONFIGURATION.md](CONFIGURATION.md).
+
+## Why this exists
+
+All three endpoints originally accepted any caller who supplied an `id`
+and `uuid` in the JSON body. Knowing those two values (plaintext on the
+device, sent over the rendezvous wire) was enough to inject arbitrary
+inventory or heartbeat state for that device — including BIOS serials,
+BitLocker recovery keys, the active console user, network interfaces,
+connection lists, and the per-boot unattended-access password the
+admin UI surfaces to support staff.
+
+The fix reuses the Ed25519 keypair that the agent **already** generates
+on first run and registers with the rendezvous server via `RegisterPk`.
+Every signed HTTP request is verified against the public key the
+rendezvous handshake stored in `peer.pk`, so the trust root is the same
+one the relay encryption already depends on. No new credential to
+provision, no new secret to leak.
+
+## Trust root
+
+```
+First run                Rendezvous (port 21116, TCP/protobuf)
+  agent generates sk,pk  ───── RegisterPk(id, pk) ─────►  server stores
+  in hello-agent.toml                                      peer.pk
+
+Every subsequent request                HTTP API (port 21114)
+  agent signs body                                         server verifies sig
+  with sk     ───── POST /api/heartbeat            ─────►  against peer.pk
+              ───── POST /api/sysinfo              ─────►  (when peer.managed=1)
+              ───── POST /api/unattended-password  ─────►
+```
+
+The same secret key signs both the rendezvous identity proof and the
+HTTP-API payload — there's only one credential per device.
+
+## Per-peer `managed` flag
+
+The gate is per-device, controlled by the `peer.managed` column
+(`INTEGER NOT NULL DEFAULT 0`, added by a soft `ALTER` at startup).
+
+| `managed` | Server behaviour                                                                 |
+|-----------|----------------------------------------------------------------------------------|
+| `0`       | Legacy path. Signed requests are still verified if present, but absence is OK.   |
+| `1`       | Signature required. Any unsigned request claiming this `id` returns 401.         |
+
+How the flag transitions:
+
+- **TOFU promote (0 → 1).** The first request that arrives with a valid
+  signature flips `managed` to 1. Hello-agent signs from boot one, so
+  the first heartbeat after a hello-agent install transparently locks
+  the peer down. No admin action required.
+- **Admin promote (0 → 1).** `PUT /api/peers/:id/managed {"managed":true}`
+  or the **Require signed API** action in the dashboard's Devices row
+  menu. Useful for pre-enrolling a peer record before the agent has
+  posted anything.
+- **Admin downgrade (1 → 0).** Same endpoint, `{"managed":false}`, or
+  **Allow unsigned API** in the dashboard. Use when the managed agent
+  has been replaced with stock RustDesk on that device. The dashboard
+  toggle requires a confirm because the operation reopens the
+  spoofing surface.
+- **Never auto-downgraded.** A failed signature on a `managed=1` peer
+  is a 401, full stop — there is no "fall back to unsigned" path.
+- **Invalid sig on a `managed=0` peer is also 401**, never silently
+  treated as legacy. This prevents an attacker from probing for the
+  legacy path by deliberately sending a broken signature.
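
The table and transition rules above reduce to a small decision function. The following is a minimal Python sketch for illustration only; the server itself is Rust, and `SigState`, `gate`, and the use of `200` to stand for "request handed to the handler" are hypothetical names and conventions, not taken from the codebase.

```python
from enum import Enum
from typing import Tuple


class SigState(Enum):
    ABSENT = "absent"    # no signature headers on the request
    VALID = "valid"      # headers present and the signature verified
    INVALID = "invalid"  # headers present but verification failed


def gate(managed: int, sig: SigState) -> Tuple[int, int]:
    """Return (http_status, new_managed) for one agent request."""
    if sig is SigState.INVALID:
        # A broken signature is a 401 for both flag values. It is never
        # treated as legacy and never changes the flag.
        return 401, managed
    if sig is SigState.VALID:
        # TOFU promote: a valid signature leaves the peer at managed=1.
        return 200, 1
    # Unsigned request: only the legacy path (managed=0) accepts it.
    if managed == 1:
        return 401, 1
    return 200, 0
```

Note that no branch ever returns a lower `managed` value than it was given; the only downgrade path is the admin endpoint.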
+
+## Wire format
+
+A signed agent request carries two headers in addition to the JSON body:
+
+```
+X-RD-Device-Id: <id>
+X-RD-Signature: v1.<unix_ts>.<base64(ed25519_sig)>
+```
+
+The signed message is the byte concatenation:
+
+```
+"rd-api-v1\n" || METHOD || "\n" || PATH || "\n" || TS || "\n" || sha256(BODY)
+```
+
+Where:
+
+- `METHOD` is the uppercase HTTP method (`POST`).
+- `PATH` is the request path with leading slash and no query string
+  (`/api/heartbeat`, `/api/sysinfo`, `/api/unattended-password`).
+- `TS` is the same decimal Unix timestamp that appears in the header.
+- `sha256(BODY)` is the raw 32-byte SHA-256 of the request body — *not*
+  hex-encoded, *not* base64-encoded. It is concatenated as binary.
+- The signature is detached Ed25519 over that 32-byte-plus-prefix
+  message, base64-encoded with the standard alphabet and no
+  URL-safe substitutions.
+
+The `v1.` prefix on the header value reserves a rotation point. The
+server rejects any other version string.
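
As a rough illustration of the byte layout, here is a Python sketch that builds the signed message and the header value. It uses only the standard library, so the Ed25519 signing step itself is left out and the signature is passed in as bytes. The function names are hypothetical, and keeping the base64 `=` padding is an assumption; the text above only specifies the standard alphabet.

```python
import base64
import hashlib


def canonical_message(method: str, path: str, ts: int, body: bytes) -> bytes:
    """Build the bytes that get signed for one request."""
    digest = hashlib.sha256(body).digest()  # raw 32 bytes, not hex or base64
    prefix = f"rd-api-v1\n{method.upper()}\n{path}\n{ts}\n".encode("ascii")
    return prefix + digest


def signature_header(ts: int, signature: bytes) -> str:
    """Format the X-RD-Signature value from a detached signature."""
    encoded = base64.b64encode(signature).decode("ascii")  # standard alphabet
    return f"v1.{ts}.{encoded}"
```

The message is text up to the last newline and binary after it, so it should be handled as bytes end to end rather than as a string.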
+
+### Why this shape
+
+- **Domain separator (`rd-api-v1\n`)** prevents the same `sk` being
+  tricked into signing data interpretable as another protocol.
+- **Method + path** stop a captured `POST /api/sysinfo` signature from
+  being replayed as some future `POST /api/disconnect`.
+- **`sha256(body)`** lets the verifier check the signature without
+  holding the body twice in memory, and survives any future proxy
+  re-chunking.
+- **Timestamp in both the header and the signed message** makes the
+  skew check trivial without re-parsing the signature value.
+
+## Server-side verification
+
+The extractor [`api::device_auth::verify`](../src/api/device_auth.rs)
+runs before each agent handler:
+
+1. **Parse headers.** Both `X-RD-Device-Id` and `X-RD-Signature` must
+   be present, or both absent. Mixed states are 401.
+2. **Validate the signature envelope.** Version must be `v1`. The
+   timestamp must be within ±300 seconds of the server's clock. The
+   base64 decode must succeed.
+3. **Replay-check.** An LRU cache keyed by `(id, ts, sig-prefix)` (size
+   16 384, sliding 600-second TTL, sweep-on-insert) rejects exact
+   replays inside the window. If the cache is full, the request is
+   accepted without being recorded, because DoS-by-cache-exhaustion is
+   uninteresting compared to the rest of the surface.
+4. **Look up `peer.pk` and `peer.managed`** in one query.
+5. **Verify the detached Ed25519 signature** against the canonical
+   signed-message bytes (see *Wire format* above).
+6. **TOFU promote.** A valid signature on a `managed=0` peer flips the
+   flag to 1 in the same request. The promote is best-effort: if the
+   DB write fails, the original request is still served and the next
+   signed request retries the promote.
+7. **Bind the trusted id to the body.** After the handler parses JSON,
+   the body's `id` field must match the header's `X-RD-Device-Id`.
+   Mismatch is 401 — this is the gate that stops a signed request from
+   being repurposed to write to a different peer's row.
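
Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Rust extractor: the function names are hypothetical, the ±300-second bound is treated as inclusive, and the 64-byte length check relies on Ed25519 signatures always being 64 bytes.

```python
import base64
import binascii
from typing import Optional, Tuple

MAX_SKEW_SECS = 300


def headers_consistent(device_id: Optional[str], signature: Optional[str]) -> bool:
    """Step 1: both headers present or both absent; mixed states are rejected."""
    return (device_id is None) == (signature is None)


def parse_envelope(value: str, now: int) -> Optional[Tuple[int, bytes]]:
    """Step 2: return (ts, signature_bytes), or None if the envelope is invalid."""
    parts = value.split(".")
    if len(parts) != 3 or parts[0] != "v1":
        return None
    ts_text, sig_b64 = parts[1], parts[2]
    if not (ts_text.isascii() and ts_text.isdigit()):
        return None
    ts = int(ts_text)
    if abs(now - ts) > MAX_SKEW_SECS:
        return None
    try:
        sig = base64.b64decode(sig_b64, validate=True)
    except (binascii.Error, ValueError):
        return None
    if len(sig) != 64:
        return None
    return ts, sig
```

Splitting on `.` is safe here because the standard base64 alphabet does not contain a dot.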
+
+If no signature headers are present and the peer is `managed=0`, the
+verifier returns `LegacyUnsigned`; the handler then calls
+`enforce_managed_for_id(body.id)` after parsing the body, which still
+rejects unsigned requests for any *other* peer that has since become
+managed.
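
The replay check in step 3 might look roughly like the Python sketch below. Several details are assumptions: the class and method names are hypothetical, the signature prefix length (16 bytes) is a placeholder since the text does not state it, the TTL is measured from insertion, and `now` is expected to be non-decreasing so that insertion order matches age order.

```python
from collections import OrderedDict


class ReplayCache:
    def __init__(self, capacity: int = 16_384, ttl_secs: int = 600) -> None:
        self.capacity = capacity
        self.ttl_secs = ttl_secs
        self._seen = OrderedDict()  # key -> insertion time, oldest first

    def check_and_insert(self, device_id: str, ts: int, sig: bytes, now: int) -> bool:
        """Return True for a fresh request, False for an exact replay."""
        # Sweep on insert: drop entries whose TTL has elapsed.
        while self._seen:
            oldest_key = next(iter(self._seen))
            if now - self._seen[oldest_key] < self.ttl_secs:
                break
            self._seen.popitem(last=False)
        key = (device_id, ts, sig[:16])
        if key in self._seen:
            return False
        # When the cache is full the request is accepted but not recorded.
        if len(self._seen) < self.capacity:
            self._seen[key] = now
        return True
```

Because the state is a plain in-process map, a restart starts from an empty cache, which matches the restart behaviour described under *Operational gotchas*.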
+
+## Agent-side signing
+
+The signer is one small module: [`vendor/rustdesk/src/hbbs_http/sign.rs`](https://example.invalid/sign.rs)
+in the hello-agent vendor tree. It reads the existing
+`Config::get_key_pair()` (returns `(sk, pk)` from `hello-agent.toml`)
+and the existing `Config::get_id()`, builds the canonical message, and
+calls `sodiumoxide::crypto::sign::sign_detached`. Returns the two
+header lines joined by `\n`, ready for the multi-header parser in
+`common.rs::post_request_`.
+
+The agent always tries to sign. If the keypair hasn't been generated
+yet (extremely early boot, before rendezvous has run), the signer
+returns `None`, the request goes out unsigned, and:
+
+- If `peer.managed=0`: server accepts it (legacy path).
+- If `peer.managed=1`: server returns 401, and the agent retries on
+  its next heartbeat.
+
+This is the only condition under which a hello-agent build sends an
+unsigned request, and it self-resolves on the next sync tick.
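
To make the "two header lines joined by `\n`" shape concrete, here is a small Python sketch of the string the signer hands over and how a multi-header parser could split it. It only illustrates the string format; the function names are hypothetical and this is not a port of the Rust `parse_simple_header`.

```python
from typing import List, Tuple


def join_headers(device_id: str, signature_value: str) -> str:
    """Produce the two header lines as one newline-separated string."""
    return f"X-RD-Device-Id: {device_id}\nX-RD-Signature: {signature_value}"


def split_headers(block: str) -> List[Tuple[str, str]]:
    """Split newline-separated 'Name: Value' pairs; a single pair still works."""
    pairs = []
    for line in block.split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip():
            pairs.append((name.strip(), value.strip()))
    return pairs
```

A caller that passes a single `Name: Value` string gets one pair back, which is the backward-compatible case.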
+
+## Operational gotchas
+
+- **Stock RustDesk clients keep working** because they post unsigned
+  and their peer rows stay at `managed=0`. The first time you install
+  hello-agent on a device, the existing `peer.pk` row gets reused (the
+  agent generates a new keypair only if `hello-agent.toml` was wiped). The
+  first signed heartbeat then promotes the row.
+- **`hello-agent --uninstall` preserves the keypair.** A reinstall is
+  transparent — signing keeps working.
+- **Wiping `hello-agent.toml` between sessions** does mean the next
+  boot generates a new keypair. The rendezvous server will treat that
+  as a key roll (`register_pk of … due to key not confirmed`) and
+  store the new `pk`. The signed HTTP API picks up the new key as soon
+  as that rendezvous step completes — usually within a few seconds.
+  See [the stale-key recovery note in hello-agent's README](https://example.invalid/README.md)
+  for the supporter-side symptoms of a key drift.
+- **Clock skew beyond ±5 minutes** causes signatures to be rejected.
+  If your fleet shows scattered 401s on heartbeat, check NTP on the
+  affected hosts. The server side is the canonical clock.
+- **Replay cache survives only inside a single hbbs process.** A
+  restart clears it. Combined with the 300-second skew window this
+  means a captured signature is replayable across a restart only if
+  the replay arrives while the signature's timestamp is still inside
+  that window, which is an acceptable trade-off for keeping the cache
+  in-memory.
+- **One server, mixed fleet.** Stock clients and hello-agent clients
+  can target the same hbbs without any flag-level config. The gate is
+  per-peer.
+
+## Failure modes & log lines
+
+| Symptom                                                                 | Likely cause                                                                                  |
+|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
+| Heartbeats from a known peer suddenly return 401                        | Peer was just promoted (TOFU or admin) and the agent build doesn't sign yet → upgrade agent.  |
+| Heartbeats fail intermittently with 401                                 | Clock skew > 5 min, or NAT churn replaying a captured request inside the window.              |
+| `peer X TOFU-promoted to managed=1` in hbbs log                         | Normal — first valid signature from a previously-unsigned peer.                               |
+| `admin <user> set peer X managed=<bool> via dashboard`                  | Normal — operator used the Devices toggle.                                                    |
+| `peer_set_managed(X) failed: …`                                         | DB write failed during TOFU promote. The request was still served; next request will retry.   |
+| Admin row shows **Unsigned** for a peer running hello-agent             | Agent hasn't completed its first signed POST yet (keypair race), or it's running a build      |
+|                                                                         | that pre-dates the signing patch — check `vendor/rustdesk/src/hbbs_http/sign.rs` is present.  |
+
+## File map
+
+Server:
+
+| Path                                      | Purpose                                                          |
+|-------------------------------------------|------------------------------------------------------------------|
+| `src/api/device_auth.rs`                  | The verifier (extractor + replay cache + TOFU promote).          |
+| `src/api/heartbeat.rs`, `src/api/sysinfo.rs`, `src/api/unattended.rs` | Wired to call `verify` then `enforce_managed_for_id`.    |
+| `src/api/peers.rs::set_managed`           | `PUT /api/peers/:id/managed` admin endpoint.                     |
+| `src/api/admin/pages/devices.rs::toggle_managed` | Dashboard action handler.                                 |
+| `src/database.rs::M2_SOFT_ALTERS`         | `ALTER TABLE peer ADD COLUMN managed`.                           |
+| `src/database.rs::peer_get_auth, peer_set_managed` | DB helpers (untyped `sqlx::query` so they survive the no-DB-migrated dev build). |
+
+Agent — hello-agent vendor tree:
+
+| Path                                                       | Purpose                                                       |
+|------------------------------------------------------------|---------------------------------------------------------------|
+| `vendor/rustdesk/src/hbbs_http/sign.rs`                    | The signer.                                                   |
+| `vendor/rustdesk/src/hbbs_http/sync.rs` (call sites)       | Heartbeat + sysinfo POSTs now sign.                           |
+| `vendor/rustdesk/src/common.rs::post_request_, parse_simple_header` | Header parser now accepts `\n`-separated `Name: Value` pairs (backward-compatible). |
+
+Agent — hello-agent crate (outside the vendor tree):
+
+| Path                                | Purpose                                                                                 |
+|-------------------------------------|-----------------------------------------------------------------------------------------|
+| `src/unattended_password.rs::try_report` | Reports the per-boot password to `/api/unattended-password`; now signs the POST.   |
+
+## Out of scope
+
+Other agent / management endpoints exist on the same server. They are
+deliberately *not* covered by this gate because their trust model is
+different:
+
+| Endpoint                       | Why it isn't signature-gated                                                                              |
+|--------------------------------|-----------------------------------------------------------------------------------------------------------|
+| `POST /api/devices/cli`        | Enrollment via `rustdesk --assign --token <T> …`. Already authenticated by a user/admin bearer session; the operator's job is to *supply* an arbitrary `(id, uuid)` for binding. Requiring the device's `sk` would defeat the use case. |
+| `GET  /api/sysinfo_ver`        | Returns a single public version string. No body, no DB write — no spoof surface to gate.                  |
+| `POST /api/record`             | Session-recording upload. Disabled by default in the OSS uploader; managed builds use it under a separate auth model. Out of scope for the current sweep. |
+| `POST /api/login`, `/api/login-options`, `/api/currentUser`, `/api/logout` | User session management — separate auth model (password + TOTP / OIDC). |
+| Everything under `/api/ab/*`, `/api/audit/*`, `/api/peers*`, `/api/2fa/*`, `/api/oidc/*`, `/admin/*` | Already gated by `AuthedUser` (cookie or bearer). |
@@ -303,6 +303,85 @@ keys and what each one does.

 ---

+## Agent API signing (per-peer)
+
+`POST /api/heartbeat`, `POST /api/sysinfo`, and
+`POST /api/unattended-password` are the three agent-facing endpoints
+that write per-device state. Stock RustDesk and managed builds
+(hello-agent) both call the first two; only managed builds use the
+third. Each peer row has a `managed` flag that gates whether the
+server requires a per-request Ed25519 signature on these endpoints;
+everything else (`/api/peers`, `/api/ab/*`, audit, recordings, OIDC,
+etc.) is unaffected. See [AGENT-API-AUTH.md](AGENT-API-AUTH.md) for
+the full out-of-scope list.
+
+| `peer.managed` | Heartbeat / sysinfo behaviour                                                          |
+|----------------|----------------------------------------------------------------------------------------|
+| `0` (default)  | Unsigned posts accepted (stock-client compatible). Signed posts still verified.        |
+| `1`            | Signature required; unsigned posts return 401. The first valid signature promotes a peer to this state. |
+
+Default is `0` after the migration, so **stock RustDesk clients are not
+affected by the rollout**: they keep posting unsigned, the server keeps
+accepting. The first valid signature the server sees from a peer is the
+TOFU promote: that peer's `managed` flips to `1` and stays there until
+an admin downgrades it, and unsigned requests claiming that `id` are
+rejected while the flag is set.
+
+The wire format and verification details live in
+[AGENT-API-AUTH.md](AGENT-API-AUTH.md). What you need to know to operate:
+
+### Dashboard
+
+The Devices page has a per-row **Auth** column:
+
+- *Signed* (emerald badge) — `peer.managed = 1`. The peer's heartbeat
+  and sysinfo posts must carry a valid signature; spoofed unsigned
+  requests are rejected.
+- *Unsigned* (slate badge) — `peer.managed = 0`. Legacy path. Anyone
+  who knows the id+uuid can post inventory and heartbeats as this
+  device.
+
+The row's action menu has two new entries (mutually exclusive based on
+current state):
+
+- **Require signed API** — flips `managed` to 1 (no confirm — it
+  strengthens security). Useful for pre-enrolling a peer record
+  before the agent has booted, or for force-locking a peer if you
+  want to fail fast when an agent is not signing yet.
+- **Allow unsigned API** — flips `managed` to 0 (confirm dialog,
+  because this reopens the spoofing surface). Use when a managed
+  agent has been uninstalled and replaced with stock RustDesk on the
+  same hardware.
+
+### API
+
+`PUT /api/peers/:id/managed` with body `{"managed": true|false}`, gated
+on the `is_admin` flag of the calling session, returns
+`{"ok":true,"managed":<bool>}`. Same effect as the dashboard toggle —
+the dashboard handler just calls this internally after reading the
+current value to avoid stale-toggle races.
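
A scripted call to this endpoint could be assembled as in the Python sketch below. It only builds the request and does not send it. The host name, the helper name, and the `Authorization: Bearer` header format are assumptions for illustration; port 21114 is the HTTP API port used elsewhere in these docs.

```python
import json
import urllib.parse
import urllib.request


def build_set_managed_request(
    base_url: str, peer_id: str, managed: bool, bearer_token: str
) -> urllib.request.Request:
    """Build (but do not send) PUT /api/peers/:id/managed."""
    body = json.dumps({"managed": managed}).encode("utf-8")
    quoted_id = urllib.parse.quote(peer_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/peers/{quoted_id}/managed",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumed header format for an admin bearer session.
            "Authorization": f"Bearer {bearer_token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; on success the response body is the `{"ok":true,"managed":<bool>}` object described above.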
+
+### Operational notes
+
+- **Mixed fleets are fine.** Stock and hello-agent clients can target
+  the same hbbs. The gate is per-peer, not per-deployment.
+- **Replacing hello-agent with stock RustDesk on a device.** The
+  device's `peer.managed` is stuck at 1; the stock client doesn't
+  sign and will start getting 401s. Either re-deploy a signing build
+  *or* flip the peer back to Unsigned in the dashboard.
+- **TLS still recommended.** Signing protects against id+uuid spoof,
+  not against the unsigned-by-default endpoint surface elsewhere
+  (`/api/login`, `/api/record`, dashboard) — those still rely on
+  whatever TLS termination is in front of hbbs. See *TLS deployment*
+  earlier in this doc.
+- **Clock skew tolerance is ±5 minutes.** If a host's clock drifts
+  past that, heartbeat starts failing 401. Keep NTP healthy on
+  managed peers; the server's clock is the canonical one.
+- **The replay cache lives in-memory only.** A hbbs restart clears
+  it. The 5-minute timestamp window bounds the worst-case replay
+  exposure across restarts.
+
+---
+
 ## Address books

 - **Personal books** are owned per-user and managed from the user's desktop client. The dashboard surfaces them read-only.
@@ -326,7 +405,7 @@ If you set `--ab-legacy-mode=on`, `/api/ab/personal` 404s and clients fall back
 | `/admin/oidc/providers` | none | JSON list of enabled providers, used by login.html |
 | `/admin/login/oidc/:name` | none | Starts admin OIDC flow (302s to IdP) |
 | `/admin/pages/users` | cookie + admin | Users page fragment (incl. inline edit-profile / password-reset / TOTP-disable per row) |
-| `/admin/pages/devices` | cookie + admin | Devices (incl. delete) |
+| `/admin/pages/devices` | cookie + admin | Devices (incl. delete, force-disconnect, force-sysinfo, toggle managed-auth — see [AGENT-API-AUTH.md](AGENT-API-AUTH.md)) |
 | `/admin/pages/groups` | cookie + admin | Device groups |
 | `/admin/pages/strategies` | cookie + admin | Strategy management |
 | `/admin/pages/address-books` | cookie + admin | Personal + shared books |