Akamai Cloud Queues + Notifications — architecture
Audience: Akamai cloud architects + Fermyon SMEs. For customer-facing materials see the customer brief. For one-pager wiring + sequence flows, see docs/architecture.md. For full ADRs see DECISIONS.md.
1. TL;DR
Akamai Cloud Queues + Notifications is a globally-distributed queue + topic service
on the same 27-region NATS JetStream substrate that backs Akamai Cloud KV (codename
nats-kv). Customer-facing HTTP API matches AWS SQS+SNS shape, plus:
- Subject-pattern subscribe (orders.*.created): first-class, not message-attribute filter.
- In-browser direct subscribe via SSE without API-Gateway+Lambda+WebSocket glue.
- Replay from any sequence: JetStream stream behind every topic.
- Native KV cross-product: a single change_stream_topic flag on bucket-create turns every KV write into a topic event. Unlocks CQRS / real-time UI / audit-by-default in one config line.
- No per-message billing: flat infra. SQS+SNS at scale is $0.40-$0.50/M plus data-transfer.
- Geographic placement control per resource (R1/R3/R5, geo:auto/na/eu/ap).
Designed-for; not a managed offering. Substrate is proven. The customer-facing surface is the unknown to validate.
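To make the subject-pattern bullet concrete, here is a minimal sketch of NATS-style wildcard matching in plain JavaScript: "*" matches exactly one dot-separated token and ">" matches one or more trailing tokens. The function name subjectMatches is hypothetical. Matching in the service is NATS-native (see §8), so this only illustrates the semantics and is not the adapter's code.

```javascript
// Sketch of NATS-style subject matching (illustrative, not the adapter's code).
// "*" matches exactly one token; ">" matches one or more trailing tokens.
function subjectMatches(pattern, subject) {
  const p = pattern.split('.');
  const s = subject.split('.');
  for (let i = 0; i < p.length; i++) {
    if (p[i] === '>') {
      // ">" must be the last pattern token and needs at least one token left.
      return i === p.length - 1 && s.length > i;
    }
    if (i >= s.length) return false;                   // subject ran out of tokens
    if (p[i] !== '*' && p[i] !== s[i]) return false;   // literal token mismatch
  }
  return p.length === s.length;                        // no unmatched subject tokens
}

console.log(subjectMatches('orders.*.created', 'orders.eu.created'));      // true
console.log(subjectMatches('orders.*.created', 'orders.eu.west.created')); // false
console.log(subjectMatches('orders.>', 'orders.eu.west.created'));         // true
```

An SQS/SNS attribute filter would need the region copied into a message attribute; here the region is simply the second subject token.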
2. One-page wiring
3. Components
| Component | What | Where |
|---|---|---|
| mq-adapter | Stateless data plane: REST in / JetStream out; KeyCache (bucket-watched bearer validation); PushSubManager (per-sub delivery loop); CircuitBreaker; SSE handler; /metrics | per region (DaemonSet on leaves, Deployment on LZ) |
| mq-control | Admin plane: tenant CRUD, key issue/revoke, invite mint, claim flow, audit log. Embeds KV→MQ change-stream bridge runtime per ADR-018. | LZ only, single replica, SQLite on PVC |
| FWF Spin apps (user + admin) | Browser-facing UIs. User app serves /, /play, /verify, /loadtest, /topology, /api-explorer, /guide (open), /docs (open). Admin app lives behind its own gate on a separate hostname. Both proxy /api/* inward. | Akamai Functions runtime (every Akamai POP) |
| NATS cluster | 27-region JetStream cluster; shared with nats-kv via separate account. | LKE-Enterprise clusters (1 LZ + 26 leaves) |
| Akamai property | Single hostname; per-path origin override routes streams to LKE direct (bypassing FWF wall-clock). CPS cert from shared enrollment 293468. | edge |
4. Auth model
Three-token gate (inherited from ADR-024 in nats-kv):
| Token | Where lives | Validated by |
|---|---|---|
| UI gate token | Browser cookie (set via ?access=<t>) | FWF Spin app (page access only) |
| Admin gate token | Admin Spin app cookie | Admin Spin app (separate from this app) |
| Tenant bearer | Browser localStorage / SDK env | mq-adapter via KeyCache (sha256(bearer) looked up in mq-admin-keys-v1 bucket) |
| Admin bearer | Vault path api/nats-mq/admin rendered to /vault/secrets/admin-token | both adapter (cross-tenant bypass) and control (admin endpoints) |
Adapter watches mq-admin-keys-v1 via JetStream KV watch — sub-second revocation
propagation across every region. No HTTP polling (nats-kv landed on polling after an
R1-cross-cluster-leader edge case; our prototype runs single-cluster so the watch is reliable).
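The watch-driven validation described above can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not the adapter's implementation: the class name mirrors KeyCache from §3, but the event shape ({ op, key, value }), the hex encoding of the hash, and the record fields are all hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory KeyCache: maps sha256(bearer) to a key record.
class KeyCache {
  constructor() {
    this.keys = new Map();
  }

  // Assumption: bucket keys are the hex-encoded SHA-256 of the bearer.
  static hash(bearer) {
    return crypto.createHash('sha256').update(bearer).digest('hex');
  }

  // Apply one KV watch event. The { op, key, value } shape is an assumption.
  apply(event) {
    if (event.op === 'PUT') {
      this.keys.set(event.key, event.value);
    } else {
      this.keys.delete(event.key);   // any non-PUT op is treated as a revocation
    }
  }

  // Returns the key record for a known bearer, or null.
  validate(bearer) {
    return this.keys.get(KeyCache.hash(bearer)) ?? null;
  }
}

const cache = new KeyCache();
cache.apply({ op: 'PUT', key: KeyCache.hash('demo-bearer'), value: { tenant: 'acme' } });
console.log(cache.validate('demo-bearer'));   // { tenant: 'acme' }
cache.apply({ op: 'DEL', key: KeyCache.hash('demo-bearer') });
console.log(cache.validate('demo-bearer'));   // null
```

Because validation is a local map lookup, the request path never waits on the bucket; revocation latency is whatever the watch delivers.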
5. HTTP push delivery contract
For HTTP-push subscriptions, the adapter holds the JetStream pull consumer and POSTs each message to the customer URL. Per-message headers:
| Header | Value |
|---|---|
| X-MQ-Topic | source topic (un-scoped) |
| X-MQ-Subject | full NATS subject; useful with subject-pattern subscribes |
| X-MQ-Id | JetStream sequence; use it as your idempotency key |
| X-MQ-Attempt | 1..max_deliver |
| X-MQ-Timestamp | unix-millis of publish |
| X-MQ-Signature | sha256=hex(HMAC-SHA256(secret, ts + "." + body)) (when secret set) |
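Since deliveries can be retried (X-MQ-Attempt goes up to max_deliver), a subscriber will usually want to dedupe on X-MQ-Id. The sketch below shows one way to do that. The handler shape, the lower-cased header keys, and the in-memory Set are assumptions for illustration; it also keys on topic plus id on the assumption that sequences are only unique within one topic's stream.

```javascript
// Hypothetical subscriber-side dedupe keyed on X-MQ-Topic + X-MQ-Id.
// A Set keeps the sketch short; a real subscriber would need a durable, bounded store.
const seen = new Set();

function handleDelivery(headers, body, process) {
  const key = headers['x-mq-topic'] + ':' + headers['x-mq-id'];
  if (seen.has(key)) {
    return 200;                              // redelivery of handled work: ack again
  }
  process(headers['x-mq-subject'], body);    // a throw here propagates, so nothing is marked
  seen.add(key);                             // mark as done only after processing succeeded
  return 200;
}
```

Marking the id after processing (not before) means a crash mid-handler leads to a retry instead of a silently dropped message.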
Subscriber response semantics (ADR-017):
| Status | Adapter action |
|---|---|
| 2xx | Ack. Breaker counter resets. |
| 408 / 429 | Nak + backoff (1s, 2s, 4s, 8s, 16s, cap 60s). Counts toward breaker. |
| Other 4xx | Term → DLQ on FIRST delivery. Does not trip the breaker — assumed misconfig. |
| 5xx / network / timeout | Same as 408/429. |
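The table above reduces to a small decision function plus a doubling backoff. The sketch below is one reading of it, not the adapter's code. The function names are hypothetical, null stands in for a network error or timeout, statuses the table does not list (1xx, 3xx) are treated as transient by assumption, and the 32s step between 16s and the 60s cap is inferred from the doubling pattern.

```javascript
// Map a subscriber HTTP status (null = network error or timeout) to an adapter action.
function classify(status) {
  if (status !== null && status >= 200 && status < 300) return 'ack';
  const is4xx = status !== null && status >= 400 && status < 500;
  if (is4xx && status !== 408 && status !== 429) return 'term';   // straight to DLQ
  return 'nak';   // 408, 429, 5xx, network errors, timeouts (and unlisted codes, by assumption)
}

// Only transient failures count toward the circuit breaker.
function countsTowardBreaker(status) {
  return classify(status) === 'nak';
}

// Delay before the next attempt: 1s, 2s, 4s, 8s, 16s, ... capped at 60s.
function backoffSeconds(attempt) {
  return Math.min(2 ** (attempt - 1), 60);
}

console.log(classify(204), classify(404), classify(429), classify(null));   // ack term nak nak
console.log([1, 2, 3, 4, 5, 6, 7].map(backoffSeconds));   // [1, 2, 4, 8, 16, 32, 60]
```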
Per-adapter, per-subscription circuit breaker (in-memory): 10 consecutive transient failures trip; 30s initial cooldown, exponential to 5min cap; half-open probe on next pull.
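A minimal sketch of a breaker with those parameters follows. It is an illustration, not the adapter's CircuitBreaker: the method names are hypothetical, time is passed in as epoch milliseconds to keep it testable, and two details are assumptions (the cooldown doubles on each consecutive trip, and a success resets both the failure count and the cooldown).

```javascript
// Hypothetical per-subscription breaker: 10 consecutive transient failures trip it,
// cooldown starts at 30s and doubles per consecutive trip up to a 5min cap.
class CircuitBreaker {
  constructor({ threshold = 10, baseCooldownMs = 30_000, maxCooldownMs = 300_000 } = {}) {
    this.threshold = threshold;
    this.baseCooldownMs = baseCooldownMs;
    this.maxCooldownMs = maxCooldownMs;
    this.failures = 0;     // consecutive transient failures
    this.trips = 0;        // consecutive trips without a success in between
    this.openUntil = 0;    // epoch-ms until which pulls are skipped
    this.state = 'closed';
  }

  // Called before each pull: false while cooling down, true otherwise.
  allowPull(now) {
    if (this.state === 'open') {
      if (now < this.openUntil) return false;
      this.state = 'half-open';   // cooldown elapsed: let a probe through
    }
    return true;
  }

  onSuccess() {
    this.failures = 0;
    this.trips = 0;
    this.state = 'closed';
  }

  onTransientFailure(now) {
    this.failures += 1;
    // A failed half-open probe re-trips immediately with a longer cooldown.
    if (this.state === 'half-open' || this.failures >= this.threshold) this.trip(now);
  }

  trip(now) {
    const cooldown = Math.min(this.baseCooldownMs * 2 ** this.trips, this.maxCooldownMs);
    this.trips += 1;
    this.failures = 0;
    this.openUntil = now + cooldown;
    this.state = 'open';
  }
}
```

Term outcomes (other 4xx) would never reach onTransientFailure, which is how a misconfigured subscriber ends up in the DLQ without tripping the breaker.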
6. SSE / WebSocket — the FWF bypass
Long-lived connections cannot terminate in a Spin function (FWF wall-clock 30s, soon 35s).
The Akamai property has a per-path origin rule that routes /v1/subscriptions/*/stream and
/v1/ws/* straight from the CDN to the LKE adapter (NO_STORE, 1h read timeout).
The path also carries Akamai SSE pass-through metadata so the edge doesn't buffer the
stream (recipe inherited from sse-cdn): http.buffer-response-v2 off,
modify-outgoing-response.chunk-to-client on,
http.forward-chunk-boundary-alignment on. Without this, the edge holds frames
until the response completes — fatal for live streams.
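The effect of the per-path origin rule can be pictured as a predicate on the request path. The real rule lives in the Akamai property configuration, not in code, so the sketch below is purely illustrative: the origin labels are hypothetical, and it assumes the * in /v1/subscriptions/*/stream stands for a single path segment.

```javascript
// Illustrative only: models which paths bypass FWF and go straight to the LKE adapter.
const STREAM_PATH = /^\/v1\/subscriptions\/[^/]+\/stream$/;   // assumes * is one segment
const WS_PATH = /^\/v1\/ws\/.+/;

function originFor(path) {
  return STREAM_PATH.test(path) || WS_PATH.test(path) ? 'lke-adapter' : 'fwf';
}

console.log(originFor('/v1/subscriptions/sub-123/stream'));   // lke-adapter
console.log(originFor('/v1/ws/sub-123'));                     // lke-adapter
console.log(originFor('/api/topics'));                        // fwf
```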
Customer SDKs see one hostname — the path-based routing is invisible to them:
const es = new EventSource(
  'https://mq.connected-cloud.io/v1/subscriptions/' + encodeURIComponent(subId) +
    '/stream?bearer=' + encodeURIComponent(bearer)
);
es.addEventListener('message', e => {
  const frame = JSON.parse(e.data);
  console.log(frame.subject, atob(frame.body_b64));
});
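One caveat with the handler above: atob returns one character per byte, so non-ASCII payloads come out garbled. Assuming the message body is UTF-8 text (the source does not say), a decode helper along these lines may be safer. The name decodeBody is hypothetical.

```javascript
// Decode a base64 body as UTF-8 text (assumes the publisher sent UTF-8).
function decodeBody(bodyB64) {
  // atob yields a binary string; turn each char code back into a byte first.
  const bytes = Uint8Array.from(atob(bodyB64), c => c.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(decodeBody('aGVsbG8='));   // hello
```

Binary payloads should skip the TextDecoder step and work with the Uint8Array directly.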
Auth on the SSE path: EventSource can't set the Authorization header (browser
limitation), so the adapter accepts the tenant bearer in a ?bearer= query
parameter as a fallback. Same KeyCache validation as the header path — no separate token type.
The adapter sets Access-Control-Allow-Origin: * on every response so third-party
sites can subscribe to a tenant's stream directly without a proxy. FWF features (auth pre-check,
rate limit) don't apply on this path since the CDN bypasses FWF entirely.
7. Cross-product (KV → MQ)
Per ADR-007: POST /v1/buckets in nats-kv accepts an optional change_stream_topic.
From the next KV write onward, a bridge in mq-control (ADR-018) applies the
bucket → MQ topic mapping and republishes each write into MQ's account.
The bridge sits on the MQ side per ADR-018. KV exports $KV.> read-only;
MQ imports it. No cross-account write credential needed. The bridge is dormant when no
mappings exist.
POST https://nats-kv.connected-cloud.io/v1/buckets
Authorization: Bearer <kv-tenant>
Content-Type: application/json
{ "name": "cart", "replicas": 3, "geo": "auto",
"change_stream_topic": "cart-events" }
# subsequent: each KV write to bucket "cart" appears on
# t.<tenant>__cart-events.<key> in the MQ account.
# Customer's HTTP push subscriber sees it as a normal MQ topic message.
This single flag unlocks: CQRS (source-of-truth bucket + projection buckets), real-time UI (browser EventSource on a bucket's change stream), audit-by-default (every write also goes to an audit topic for a compliance archiver function), saga / workflow orchestration. What costs days on AWS (DynamoDB Streams → Lambda → EventBridge → SNS) is one config line here.
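The subject layout in the example above (t.<tenant>__<topic>.<key>) can be sketched as a pure function, together with a toy version of the bridge's lookup step. This is not the ADR-018 bridge: the function names, the Map-based mapping, and the shape of the write object are assumptions for illustration.

```javascript
// Build the MQ subject for a KV write, following t.<tenant>__<topic>.<key>.
function changeStreamSubject(tenant, topic, key) {
  return `t.${tenant}__${topic}.${key}`;
}

// Hypothetical bridge step: turn one KV write into an MQ publish, or null when unmapped.
function toTopicEvent(mapping, tenant, write) {
  const topic = mapping.get(write.bucket);
  if (!topic) return null;   // no mapping for this bucket: nothing to republish
  return { subject: changeStreamSubject(tenant, topic, write.key), body: write.value };
}

const mapping = new Map([['cart', 'cart-events']]);
console.log(toTopicEvent(mapping, 'acme', { bucket: 'cart', key: 'user42', value: '{"items":3}' }));
// { subject: 't.acme__cart-events.user42', body: '{"items":3}' }
```

Because the key is the last subject token, a subscriber can combine this with a subject pattern to follow a subset of keys.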
8. vs SQS+SNS
| Capability | SQS / SNS | nats-mq |
|---|---|---|
| Global by default | ✗ regional | ✓ 27-region single namespace |
| Subject-pattern subscribe (orders.*.created) | ✗ msg-attribute filter | ✓ NATS native |
| Replay from sequence | ✗ | ✓ JetStream stream |
| In-browser direct subscribe | ✗ (API GW + Lambda + WS bridge) | ✓ SSE / WS |
| Cross-product with KV | ~ DynamoDB Streams + Lambda + EventBridge | ✓ one flag |
| Per-message billing | $0.40-0.50/M | $0 (flat infra) |
| Geo placement control | ✗ account-level | ✓ per-queue R1/R3/R5 + geo |
| HMAC webhook signing standard | ~ optional | ✓ default |
| Native SDKs in 30+ languages | ✓ | ✗ HTTP API only (NATS SDK for native) |
| Built-in mobile push / email / SMS | ✓ SNS | ✗ function-as-deliverer recipes |
| Production SLA | ✓ | ✗ demo-grade |
9. Spin function consumer pattern
The function is a normal HTTP-triggered Spin handler — no special trigger needed (no wasi:messaging today; see §10). The adapter holds the NATS subscription and POSTs to the function URL.
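use hmac::{Hmac, Mac};
use sha2::Sha256;
use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::{http_component, variables};

// Helper: read a header as &str (None when absent or not valid UTF-8).
fn header<'a>(req: &'a Request, name: &str) -> Option<&'a str> {
    req.header(name).and_then(|v| v.as_str())
}

// Placeholder for your business logic; an Err surfaces as a 5xx, so the adapter retries.
async fn process(_subject: &str, _body: &[u8]) -> anyhow::Result<()> {
    Ok(())
}

#[http_component]
async fn handle(req: Request) -> anyhow::Result<impl IntoResponse> {
    // 1. Verify HMAC over ts + "." + body. Return 400 on mismatch: per §5, any 4xx
    //    other than 408/429 is Term-to-DLQ and does not trip the breaker.
    let secret = variables::get("mq_secret")?;
    let ts = header(&req, "x-mq-timestamp").unwrap_or_default();
    let sig = header(&req, "x-mq-signature").unwrap_or_default();
    let mut mac = Hmac::<Sha256>::new_from_slice(secret.as_bytes())?;
    mac.update(format!("{}.", ts).as_bytes());
    mac.update(req.body());
    let expected = format!("sha256={}", hex::encode(mac.finalize().into_bytes()));
    // Constant-time comparison via the constant_time_eq crate.
    if !constant_time_eq::constant_time_eq(sig.as_bytes(), expected.as_bytes()) {
        return Ok(Response::builder().status(400).body("bad signature").build());
    }
    // 2. Your business logic: return 2xx to ack, 5xx to retry, other 4xx to DLQ.
    let subject = header(&req, "x-mq-subject").unwrap_or_default();
    process(subject, req.body()).await?;
    Ok(Response::builder().status(200).body("ok").build())
}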
Three full recipes (Slack, Telegram fallback, APNS) live at docs/recipes/.
10. wasi:messaging?
Not present in FWF as of Spin 3.6.3 (verified). HTTP push is the v1 function-as-subscriber
shape because it works today with zero platform changes. When Fermyon ships the WIT, our
subscription registry's delivery_type field is forward-compatible (currently
http_push | sse; would gain wasi_messaging as another value).
11. Operational gotchas (worth knowing before you build on this)
| Gotcha | Mitigation |
|---|---|
| ack_wait default is 25s (not 30s) | 5s buffer under FWF's 30s wall-clock. Bumps to 30s when FWF moves to 35s. |
| 4xx from a webhook (other than 408/429) is treated as misconfiguration | Goes to DLQ immediately. Doesn't trip the breaker. Operator signal: DLQ growth with attempt_count=1. |
| R3 is the default replica count, not R1 | With R1, a single-node loss causes data loss; we learned this on nats-kv's geocode-r1 incident. R1 is explicit opt-in. |
| Cross-product loop (function subscribes to bucket change-stream and writes back to same bucket) | Documentation-only gate. "Source-of-truth bucket + projection bucket; never write back to the source you're consuming." |
| Per-tenant MLT metric cardinality capped to curated set | Prevents Prometheus blow-up across N tenants × M metrics. Logs + traces stay per-tenant. |
| NATS errors → HTTP status codes propagate explicitly | No silent message-eating. nats: maximum messages exceeded → 507, no responders → 502, etc. |
| Push delivery uses queue-group across adapters (ADR-014) | Free load-balancing + auto-failover via JetStream pull-consumer semantics. Customer sees variable egress IPs across Linode AS63949 — CIDR-friendly stable egress is a v2 path. |
| SQLite-on-PVC for invites + audit is single-writer, no replication | PVC snapshot + sqlite3 .backup to object storage is required before going beyond demo. Tenants + keys are in NATS admin buckets (R5/NA) so the data plane survives PVC loss. |
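The explicit error propagation in the table above could look roughly like the lookup below. Only the two mappings the table names are included; the remaining ones are elided in the source and deliberately not guessed here. The function name and the substring-matching approach are assumptions for illustration.

```javascript
// Only the two mappings named in the table; the rest are elided in the source.
const NATS_ERROR_STATUS = new Map([
  ['maximum messages exceeded', 507],
  ['no responders', 502],
]);

// Returns the HTTP status for a NATS error message, or undefined if this sketch
// does not cover it.
function statusForNatsError(message) {
  for (const [needle, status] of NATS_ERROR_STATUS) {
    if (message.includes(needle)) return status;
  }
  return undefined;
}

console.log(statusForNatsError('nats: maximum messages exceeded'));   // 507
```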
12. Repos & ADRs
- github.com/ccie7599/nats-mq — this repo (Akamai Cloud Queues + Notifications)
- github.com/ccie7599/nats-kv — sibling (Akamai Cloud KV)
- SCOPE.md — scope contract
- DECISIONS.md — 18 ADRs (006, 007, 012, 013, 014, 015, 016, 017, 018 have full text)
- docs/architecture.md — wiring + sequence flows for all 5 request paths
- docs/api.md — HTTP API reference
- docs/runbook.md — operator runbook
- HANDOFF.md — delivery / new-engineer handoff
- docs/recipes/ — APNS, Slack, Telegram
Questions or PRs welcome — owner: brian@apley.net. Demo-grade research POC; not production.