RFC: Hosted HTTP/SSE API Contract¶

Date: 2026-05-14
Status: draft
Owner: Jess Sullivan
Linear: TIN-531

Implementation status: GET /v1/health, POST /v1/transcript/section, and the first single-process stream session endpoints are implemented in the tubebrain-hosted binary. Hosted auth supports account-scoped API key records, and protected-preview metering can persist JSONL usage events for restart-safe quota windows. Durable hosted account storage, database-backed billing, and multi-worker session routing remain follow-up work.

Summary¶

Define the first hosted tubebrain.ai API surface around the already-shipped local TubeBrain semantics.

The hosted layer is a convenience wrapper over the FOSS core. It must preserve the same data model and tool behavior as the local MCP server while adding:

HTTP request/response access for VOD transcripts and timestamped sections
polling and SSE delivery for live stream sessions
API-key auth, metering, rate limits, and abuse controls
explicit privacy and retention boundaries

This RFC is an implementation contract for the first hosted service slice. It does not make hosted execution the canonical path; the local MCP binary remains first-class.

Goals¶

Make the hosted API semantics match local MCP tools closely enough that an agent harness can switch between local and hosted modes with minimal logic.
Support the GStack research demo data flow: timestamped YouTube URL -> transcript section -> agent summary/link extraction -> browser actions.
Keep auth, rate limits, retention, and cost controls visible from day one.
Avoid exposing low-level PoToken or resolver internals as public API fields.

Non-Goals¶

Build billing UI or subscription management in this RFC.
Expose a public managed PoToken minting endpoint.
Store raw audio by default.
Replace local MCP stdio with hosted-only behavior.
Promise the audio fingerprinting endpoint in the first deployed MVP.

Versioning¶

All endpoints live under /v1.

Breaking changes require /v2. Additive fields are allowed in /v1; clients must ignore unknown response fields.

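All timestamps are Unix epoch milliseconds unless a field name explicitly ends in _s, in which case the value is in seconds (for example created_at_unix_s). Durations and offsets follow the same suffix convention: _ms for milliseconds (duration_ms) and _s for seconds (at_s, ttl_s).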

Auth¶

Request Authentication¶

Clients authenticate with an API key:

Authorization: Bearer tb_sk_live_...

API keys are opaque. The protected-preview implementation stores only a SHA-256 hash of each API key in memory; future durable key storage must store only hashes or deployment-secret references, never raw keys.

Key Shape¶

Recommended key prefix:

tb_sk_test_ for non-billable development keys
tb_sk_live_ for billable production keys

The prefix is informational. Authorization must rely on server-side key records, not prefix parsing alone.

Scopes¶

Initial scopes:

transcript:read - VOD transcript, metadata, language, and section endpoints
stream:write - start and stop stream sessions
stream:read - poll or subscribe to stream sessions
recognize:write - future audio recognition endpoint
admin:read - account and usage inspection

The GStack demo requires only transcript:read for the hosted path.
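
The key-hashing and scope rules above can be sketched as follows. This is a minimal illustration, not the tubebrain-hosted implementation: the KeyStore class, the record layout, and the authorize helper are hypothetical names introduced here. It shows the store holding only SHA-256 digests and the split between unauthorized (no matching key record) and forbidden (valid key, missing scope).

```python
import hashlib


def hash_api_key(raw_key):
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class KeyStore:
    """In-memory key records indexed by key hash (hypothetical structure)."""

    def __init__(self):
        self._records = {}  # key hash -> {"account_id", "scopes"}

    def add(self, raw_key, account_id, scopes):
        # The raw key is hashed immediately and never stored.
        self._records[hash_api_key(raw_key)] = {
            "account_id": account_id,
            "scopes": frozenset(scopes),
        }

    def authenticate(self, authorization_header):
        """Return the key record for an 'Authorization: Bearer ...' value, or None."""
        scheme, _, token = (authorization_header or "").partition(" ")
        if scheme.lower() != "bearer" or not token.strip():
            return None
        # Lookup is by hash of the full key; the tb_sk_ prefix is never parsed.
        return self._records.get(hash_api_key(token.strip()))


def authorize(store, authorization_header, required_scope):
    """Return (error_code, record); error_code is None on success."""
    record = store.authenticate(authorization_header)
    if record is None:
        return "unauthorized", None  # HTTP 401
    if required_scope not in record["scopes"]:
        return "forbidden", None  # HTTP 403
    return None, record
```

A key registered with only transcript:read passes a transcript:read check and yields forbidden for stream:write.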

Headers¶

Successful responses include:

X-Request-Id: req_...
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 119
X-RateLimit-Reset: 1760000000

Clients may send:

Idempotency-Key: user-generated-key

Idempotency-Key is honored for mutating endpoints such as stream start and stop. It is ignored for pure reads.

Error Envelope¶

Errors use a stable JSON envelope:

{
  "error": {
    "code": "invalid_request",
    "message": "url is required",
    "request_id": "req_01h..."
  }
}

Initial error codes:

| Code | HTTP | Meaning |
| --- | --- | --- |
| `invalid_request` | 400 | malformed JSON, missing fields, invalid cursor |
| `unauthorized` | 401 | missing or invalid API key |
| `forbidden` | 403 | valid key lacks required scope |
| `not_found` | 404 | unknown session, video, or route |
| `conflict` | 409 | idempotency conflict or terminal session state |
| `rate_limited` | 429 | per-key or per-IP limit exceeded |
| `source_unavailable` | 502 | upstream media source failed |
| `transcription_unavailable` | 503 | STT backend unavailable |
| `internal_error` | 500 | unexpected service error |
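
As an illustration of the envelope and code table above, the following sketch maps each documented code to its HTTP status and builds the JSON body. The function name error_response and the fallback to internal_error for unknown codes are assumptions of this sketch, not behavior specified by the RFC.

```python
import json

# Code -> HTTP status, copied from the error code table.
ERROR_STATUS = {
    "invalid_request": 400,
    "unauthorized": 401,
    "forbidden": 403,
    "not_found": 404,
    "conflict": 409,
    "rate_limited": 429,
    "source_unavailable": 502,
    "transcription_unavailable": 503,
    "internal_error": 500,
}


def error_response(code, message, request_id):
    """Return (http_status, json_body) using the stable error envelope."""
    if code not in ERROR_STATUS:
        # Assumption: undocumented codes are never sent to clients.
        code, message = "internal_error", "unexpected service error"
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    return ERROR_STATUS[code], json.dumps(body)
```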

Endpoint Summary¶

| Endpoint | Auth scope | Local MCP equivalent | MVP |
| --- | --- | --- | --- |
| `GET /v1/health` | none | none | yes |
| `POST /v1/transcript` | `transcript:read` | `get_transcript` | reserved |
| `POST /v1/transcript/section` | `transcript:read` | `get_transcript_section` | yes |
| `POST /v1/languages` | `transcript:read` | `list_languages` | reserved |
| `POST /v1/metadata` | `transcript:read` | `get_metadata` | reserved |
| `POST /v1/stream/start` | `stream:write` | `start_stream` | yes |
| `GET /v1/stream/{session_id}/poll` | `stream:read` | `poll_stream` | yes |
| `POST /v1/stream/{session_id}/stop` | `stream:write` | `stop_stream` | yes |
| `GET /v1/stream/{session_id}/events` | `stream:read` | push form of `poll_stream` | yes |
| `GET /v1/stream` | `stream:read` | `list_streams` | yes |
| `POST /v1/recognize` | `recognize:write` | `recognize_audio` future tool | deferred |

The current hosted implementation intentionally ships /v1/transcript/section and the stream session endpoints before the broader VOD transcript, metadata, and language endpoints, because the go-to-market wedge is live, radio, and YouTube source monitoring. Stream sessions stay single-process and in-memory until Redis/Postgres-backed account storage and worker routing are added.

Common Request Fields¶

Transcript endpoints accept these common fields where relevant:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac&t=2449s",
  "lang": "en",
  "format": "json"
}

format may be json, markdown, srt, vtt, or text. Hosted responses should default to structured JSON and return rendered text only when a non-JSON format is explicitly requested.

`GET /v1/health`¶

Readiness endpoint for load balancers and canaries.

Response:

{
  "status": "ok",
  "service": "tubebrain-hosted",
  "version": "0.1.0",
  "core_version": "0.1.9"
}

status values:

ok - service can accept traffic
degraded - service can answer some requests but one dependency is impaired
unavailable - service should not receive traffic

`POST /v1/transcript`¶

Fetch a full structured transcript for a supported VOD URL.

Request:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac",
  "lang": "en",
  "format": "json"
}

JSON response:

{
  "request_id": "req_01h...",
  "transcript": {
    "video_id": "Rzi7oFTzjac",
    "title": "Example title",
    "channel": "Example channel",
    "duration_ms": 4200000,
    "language": "en",
    "source": "caption_auto_generated",
    "segments": [
      {
        "text": "example text",
        "start_ms": 2449000,
        "end_ms": 2453000
      }
    ]
  },
  "cache": {
    "hit": false,
    "ttl_s": 3600
  }
}

The transcript object is the same shape as the local Transcript type.

`POST /v1/transcript/section`¶

Fetch a timestamp-windowed transcript section. This is the primary hosted MVP endpoint for agent workflows and the GStack research demo.

Request:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac&t=2449s",
  "lang": "en",
  "at_s": 2449,
  "before_s": 120,
  "after_s": 600
}

at_s may be omitted when the URL contains a parseable YouTube timestamp.

Response:

{
  "request_id": "req_01h...",
  "section": {
    "video_id": "Rzi7oFTzjac",
    "title": "Example title",
    "channel": "Example channel",
    "duration_ms": 4200000,
    "language": "en",
    "source": "caption_auto_generated",
    "anchor_ms": 2449000,
    "window_start_ms": 2329000,
    "window_end_ms": 3049000,
    "segments": [
      {
        "text": "example text",
        "start_ms": 2449000,
        "end_ms": 2453000
      }
    ]
  },
  "agent_contract": {
    "suggested_task": "summarize_section_and_extract_links",
    "source_url": "https://www.youtube.com/watch?v=Rzi7oFTzjac&t=2449s"
  }
}

The section object is the same shape as the local TranscriptSection type. Default windows match the local server: 120 seconds before and 600 seconds after the anchor.
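
The window arithmetic can be sketched as below. This is an illustration under stated assumptions rather than the core library's logic: the function names are hypothetical, only plain-second timestamps such as t=2449 or t=2449s are parsed, and clamping the window start at zero is an assumption the RFC does not specify.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_BEFORE_S = 120  # documented default window before the anchor
DEFAULT_AFTER_S = 600   # documented default window after the anchor


def anchor_seconds_from_url(url):
    """Return the t= offset in seconds, or None if absent or not plain seconds."""
    values = parse_qs(urlparse(url).query).get("t")
    if not values:
        return None
    raw = values[0]
    if raw.endswith("s"):
        raw = raw[:-1]
    return int(raw) if raw.isdigit() else None


def section_window(url, at_s=None, before_s=DEFAULT_BEFORE_S, after_s=DEFAULT_AFTER_S):
    """Compute anchor_ms, window_start_ms, and window_end_ms for a section request."""
    anchor_s = at_s if at_s is not None else anchor_seconds_from_url(url)
    if anchor_s is None:
        raise ValueError("at_s is required when the URL has no parseable timestamp")
    anchor_ms = anchor_s * 1000
    return {
        "anchor_ms": anchor_ms,
        # Assumption: the window never starts before the beginning of the video.
        "window_start_ms": max(0, anchor_ms - before_s * 1000),
        "window_end_ms": anchor_ms + after_s * 1000,
    }
```

For the example URL ending in t=2449s with default windows, this yields anchor_ms 2449000, window_start_ms 2329000, and window_end_ms 3049000, matching the response shown above.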

`POST /v1/languages`¶

Request:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac"
}

Response:

{
  "request_id": "req_01h...",
  "video_id": "Rzi7oFTzjac",
  "languages": [
    {
      "code": "en",
      "name": "English",
      "is_auto_generated": true,
      "is_translatable": true
    }
  ]
}

`POST /v1/metadata`¶

Request:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac"
}

Response:

{
  "request_id": "req_01h...",
  "metadata": {
    "video_id": "Rzi7oFTzjac",
    "title": "Example title",
    "channel": "Example channel",
    "duration_ms": 4200000,
    "has_captions": true,
    "caption_languages": ["en"]
  }
}

`POST /v1/stream/start`¶

Start a live stream transcription session.

Request:

{
  "url": "https://www.youtube.com/watch?v=jfKfPfyJRdk",
  "lang": "en"
}

Response:

{
  "request_id": "req_01h...",
  "session": {
    "session_id": "sess-1",
    "platform": "youtube",
    "title": "Live stream title",
    "channel": "Live channel",
    "started_at": 1760000000000,
    "language": "en",
    "source": "youtube_live_hls"
  }
}

The session object is the same shape as the local StreamSession type.

`GET /v1/stream/{session_id}/poll`¶

Poll a live stream session for transcript segments after a cursor.

Request:

GET /v1/stream/sess-1/poll?cursor=42

Response:

{
  "request_id": "req_01h...",
  "chunk": {
    "session_id": "sess-1",
    "segments": [
      {
        "text": "live words",
        "start_ms": 15000,
        "end_ms": 18000
      }
    ],
    "cursor": 43,
    "is_final": false,
    "buffer_depth_ms": 3000,
    "session_duration_ms": 60000,
    "health": "active",
    "last_diagnostic": null,
    "last_error": null
  }
}

The chunk object is the same shape as the local StreamChunk type.
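
A client-side polling loop might look like the following sketch. The drain_session helper and its fetch_chunk callback are hypothetical: fetch_chunk stands in for whatever performs GET /v1/stream/{session_id}/poll?cursor=... and returns the chunk object. The starting cursor of 0 and the max_polls guard are assumptions; a real client would also wait between polls.

```python
def drain_session(fetch_chunk, cursor=0, max_polls=100):
    """Collect segments by polling until is_final is true or max_polls is reached.

    fetch_chunk(cursor) must return a dict shaped like the "chunk" object.
    """
    segments = []
    for _ in range(max_polls):
        chunk = fetch_chunk(cursor)
        segments.extend(chunk["segments"])
        # Always resume from the cursor the server returned.
        cursor = chunk["cursor"]
        if chunk["is_final"]:
            break
    return segments, cursor
```

Injecting fetch_chunk keeps the loop testable without network access and lets the same loop drive either the hosted endpoint or the local poll_stream tool.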

`POST /v1/stream/{session_id}/stop`¶

Stop a live stream session and return the final buffered chunk.

Request:

POST /v1/stream/sess-1/stop

Response:

{
  "request_id": "req_01h...",
  "chunk": {
    "session_id": "sess-1",
    "segments": [],
    "cursor": 43,
    "is_final": true,
    "buffer_depth_ms": 0,
    "session_duration_ms": 61000,
    "health": "stopped",
    "last_diagnostic": null,
    "last_error": null
  }
}

`GET /v1/stream/{session_id}/events`¶

SSE form of poll_stream.

Request:

GET /v1/stream/sess-1/events?cursor=42
Accept: text/event-stream

Events:

event: chunk
data: {"request_id":"req_01h...","chunk":{"session_id":"sess-1","segments":[{"text":"live words","start_ms":15000,"end_ms":18000}],"cursor":43,"is_final":false,"buffer_depth_ms":3000,"session_duration_ms":60000,"health":"active","last_diagnostic":null,"last_error":null}}

The first implementation emits one chunk event per request using the same cursor semantics as poll. Long-lived heartbeat/final event streams and Last-Event-ID reconnection are reserved for the durable session store work.
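
On the client side, the event framing shown above can be decoded with a small parser. The sketch below handles only the event and data fields and ignores comments, id, and retry; parse_sse_events is a hypothetical helper name, and the default event name "message" follows the Server-Sent Events convention.

```python
import json


def parse_sse_events(text):
    """Parse SSE text into a list of (event_name, decoded_json_data) tuples."""
    events = []
    # Events are separated by a blank line.
    for block in text.replace("\r\n", "\n").split("\n\n"):
        name, data_lines = "message", []
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # empty line or comment
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                name = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events
```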

`GET /v1/stream`¶

List active sessions for the current API key.

Response:

{
  "request_id": "req_01h...",
  "sessions": [
    {
      "session_id": "sess-1",
      "platform": "youtube",
      "title": "Live stream title",
      "channel": "Live channel",
      "started_at": 1760000000000,
      "language": "en",
      "source": "youtube_live_hls"
    }
  ]
}

Hosted session IDs are scoped to the account that created them. Wrong-account access returns 404 not_found to avoid leaking whether a session exists.
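
The account-scoping rule can be illustrated with a minimal owner registry. SessionRegistry and its methods are hypothetical names for this sketch; the point is that an unknown session and a session owned by another account produce the same 404 not_found result.

```python
class SessionRegistry:
    """Maps session IDs to the owning account (illustrative only)."""

    def __init__(self):
        self._owners = {}  # session_id -> account_id

    def register(self, session_id, account_id):
        self._owners[session_id] = account_id

    def lookup(self, session_id, account_id):
        """Return (http_status, error_code) for an access attempt."""
        owner = self._owners.get(session_id)
        # Missing and foreign sessions are deliberately indistinguishable.
        if owner is None or owner != account_id:
            return 404, "not_found"
        return 200, None

    def list_for(self, account_id):
        """Return the session IDs visible to one account."""
        return sorted(s for s, owner in self._owners.items() if owner == account_id)
```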

`POST /v1/recognize`¶

Reserved for the Phase F audio recognition surface. Do not implement in the first hosted MVP unless TIN-528/TIN-529/TIN-530 have landed.

Data Flow¶

VOD Section MVP¶

HTTP request
  -> auth and rate-limit check
  -> parse URL/timestamp
  -> local core get_transcript_section semantics
  -> transcript cache write-through
  -> structured JSON response
  -> usage event

The hosted service should call the same Rust library boundary that powers the MCP tool rather than maintaining a separate transcript implementation.

Live Stream¶

start request
  -> auth and session quota check
  -> MediaResolver
  -> SessionManager or hosted session store
  -> background ingestion worker
  -> poll/SSE delivery

For the protected-preview MVP, the accepted model is sticky routing to one active worker plus an in-memory session-owner registry. poll, events, stop, and list are account-scoped. A worker restart or wrong-worker route returns 404 not_found for the old session because raw audio and stream buffers are not durably stored. See Hosted Stream Session Routing.

Before broad paid or multi-replica traffic, sessions need the Redis-backed model reserved by the routing RFC:

Redis for session cursors, active-session indexes, and short-lived buffers
worker ownership metadata and leases so polls route to the right worker
explicit session timeout and cleanup jobs

Persistence Model¶

Current protected-preview implementation:

account/key records are loaded from environment configuration
usage events can be appended to TUBEBRAIN_USAGE_EVENT_LOG as JSONL
tubebrain-hosted rebuilds the current rolling quota window from that JSONL file on restart and ignores duplicate event_id records
stream session state remains in-memory and single-process
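
The restart-time rebuild of the quota window might be sketched as follows. This is not the tubebrain-hosted code: rebuild_window is a hypothetical name, and the sketch assumes each JSONL record carries the event_id and created_at_unix_s fields listed in the metering storage table.

```python
import json


def rebuild_window(jsonl_lines, now_unix_s, window_secs):
    """Return usage events inside the rolling window, skipping duplicate event_ids."""
    seen, events = set(), []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        if event["event_id"] in seen:
            continue  # duplicate record, already counted
        seen.add(event["event_id"])
        if event["created_at_unix_s"] > now_unix_s - window_secs:
            events.append(event)
    return events
```

Passing an open file handle for the log as jsonl_lines works because file objects iterate line by line.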

Preferred paid-pilot database shape:

PostgreSQL tables:

accounts
api_keys
usage_events
billing_customers
idempotency_keys

Redis keys:

session:{account_id}:{session_id}:metadata
session:{account_id}:{session_id}:segments
session:{account_id}:{session_id}:diagnostics
rate:{account_id}:{window}
rate:ip:{ip}:{window}

Raw transcript segments may be cached for performance. Raw audio must not be persisted by default.

Metering¶

Minimum usage dimensions:

transcript requests
transcript section requests
upstream media fetch attempts
live session starts
live session active seconds
live audio seconds decoded
live STT seconds processed
egress bytes
source failures and retry counts

Usage events should include request_id, account_id, endpoint, outcome, duration, and cost dimensions. They must not include API key material or raw audio bytes.

Minimum storage fields for the first paid-pilot implementation:

| Field | Type | Notes |
| --- | --- | --- |
| `event_id` | string | Unique usage event ID |
| `request_id` | string | Matches the public response header/body request ID |
| `account_id` | string | Customer/account owner |
| `api_key_id` | string | Stable key ID only, never the raw key |
| `endpoint` | string | Hosted route or MCP-equivalent operation |
| `source_kind` | string | `youtube_vod`, `youtube_live`, `http_audio`, or future adapter |
| `session_id` | string? | Present for stream events |
| `outcome` | string | `ok`, `client_error`, `source_error`, `transcription_error`, `rate_limited`, `internal_error` |
| `status_code` | integer? | Hosted HTTP status when applicable |
| `duration_ms` | integer | Server-side wall-clock duration |
| `stream_active_ms` | integer? | Active session time |
| `audio_decoded_ms` | integer? | Decoded media duration |
| `stt_processed_ms` | integer? | Audio duration submitted to STT |
| `stt_backend` | string? | Primary STT backend when available |
| `stt_fallback_mode` | string? | Managed fallback mode when available |
| `stt_provider` | string? | Managed provider name when available |
| `estimated_cost_micro_usd` | integer? | Optional cost estimate |
| `egress_bytes` | integer? | Response/SSE egress estimate |
| `retry_count` | integer | Source, network, or resolver retries |
| `error_code` | string? | Stable public error code only |
| `created_at_unix_s` | integer | Event timestamp |

Forbidden storage fields:

raw API keys or bearer token strings
cookies
signed media URL path or query values
PoToken values
BotGuard worker internals
raw audio bytes
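
A pre-write guard for usage events could look like the sketch below. The required field set is taken from the non-optional rows of the storage table; the forbidden key names and the tb_sk_ value check are hypothetical, since the RFC lists forbidden categories rather than concrete field names.

```python
# Non-optional fields from the storage table.
REQUIRED_FIELDS = {
    "event_id", "request_id", "account_id", "api_key_id", "endpoint",
    "source_kind", "outcome", "duration_ms", "retry_count", "created_at_unix_s",
}

# Hypothetical key names standing in for the forbidden categories.
FORBIDDEN_FIELDS = {
    "api_key", "authorization", "cookie", "cookies",
    "signed_media_url", "po_token", "raw_audio",
}


def validate_usage_event(event):
    """Return a list of problems; an empty list means the event may be stored."""
    present = set(event)
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - present)]
    problems += [f"forbidden field: {name}" for name in sorted(FORBIDDEN_FIELDS & present)]
    for value in event.values():
        # Catch raw keys that slipped into an otherwise allowed field.
        if isinstance(value, str) and value.startswith("tb_sk_"):
            problems.append("value looks like a raw API key")
    return problems
```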

Rate Limits¶

Initial conservative defaults:

| Limit | Test | Free/design partner | Paid |
| --- | --- | --- | --- |
| VOD transcript requests | 30/hour | 120/hour | tiered |
| Section requests | 60/hour | 300/hour | tiered |
| Concurrent live sessions | 1 | 2 | tiered |
| Live session duration | 10 min | 30 min | tiered |
| SSE connections | 1/key | 4/key | tiered |

Rate limits should be enforced per API key and backed by a coarse per-IP abuse limit for unauthenticated or invalid-key traffic.

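Current protected-preview quotas are per account and use TUBEBRAIN_USAGE_WINDOW_SECS as a rolling window. In this mode X-RateLimit-Reset is a relative value: the number of seconds until the oldest counted event or in-flight reservation exits that window. Note that the Headers example above shows an epoch-style value (1760000000), whereas the protected-preview value is a duration.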
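
The rolling-window reset value can be computed as in the sketch below. It is a simplified illustration: seconds_until_reset is a hypothetical name, in-flight reservations are not modeled, and returning 0 when nothing is counted is an assumption.

```python
def seconds_until_reset(event_times_unix_s, now_unix_s, window_secs):
    """Seconds until the oldest counted event leaves the rolling window."""
    # Only events still inside the window count against the quota.
    counted = [t for t in event_times_unix_s if t > now_unix_s - window_secs]
    if not counted:
        return 0
    return max(0, min(counted) + window_secs - now_unix_s)
```

With events at 1000 and 1100, a current time of 1200, and a 600-second window, the oldest counted event exits at 1600, so the function returns 400.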

Privacy And Retention¶

Default retention:

API request metadata: 30 days
usage events: billing/audit retention
transcript cache: short TTL, initially 1 hour
live segment buffers: session lifetime plus a short cleanup window
raw audio: not persisted
PoToken material: not exposed and not stored beyond operational need
cookies and signed media URLs: not stored as customer-visible records

The service should expose these boundaries in public docs before charging.

Compliance Boundaries¶

Hosted source resolution has a higher risk profile than local execution. Keep these boundaries explicit:

Layer 1 media resolution remains isolated from Layer 2 transcription.
Public API responses must not include resolved signed media URLs, cookies, PoTokens, or BotGuard internals.
Managed PoToken minting is not a public endpoint in v1.
Error messages should be useful but should not leak credential-bearing URLs.
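
One way to keep error messages useful without leaking credential-bearing URLs is to redact everything after the host before a message leaves the service. The sketch below is one possible approach, not the service's actual redaction logic; redact_urls and the "/[redacted]" placeholder are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Rough URL matcher: stops at whitespace, quotes, and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def redact_urls(message):
    """Replace the path, query, and fragment of every URL in a message."""
    def _redact(match):
        parts = urlsplit(match.group(0))
        # hostname drops any user:password@ prefix and the port.
        return urlunsplit((parts.scheme, parts.hostname or "", "/[redacted]", "", ""))
    return URL_PATTERN.sub(_redact, message)
```

Keeping the scheme and host preserves enough context to tell which upstream failed while removing signed path and query values.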

Deployment Shape¶

Recommended first implementation:

crates or workspace members
  tubebrain-core      existing library boundary
  tubebrain           local MCP binary
  tubebrain-hosted    axum HTTP/SSE binary

Preferred stack:

axum for HTTP and SSE
tower middleware for request IDs, auth, tracing, compression, and limits
PostgreSQL for accounts, API keys, usage, and idempotency
Redis for rate limiting and live-session state
background workers in the same binary for the first MVP, split later when live stream load requires it

The hosted service must write logs to stdout or stderr as the deployment platform expects. The local MCP binary still reserves stdout for protocol traffic, so its logs must stay off stdout.

GStack Demo Contract¶

The hosted demo should use:

POST /v1/transcript/section

with:

{
  "url": "https://www.youtube.com/watch?v=Rzi7oFTzjac&t=2449s",
  "lang": "en",
  "before_s": 120,
  "after_s": 600
}

The calling harness receives the section packet and runs:

summarize the section about gstack and open all the articles described in my browser to read.

Browser-opening actions are outside TubeBrain's API boundary. TubeBrain provides the timestamped transcript context; the harness extracts links and executes browser actions.
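
A harness could assemble the demo request as sketched below. The base URL is a placeholder (the RFC does not name an API host), build_section_request is a hypothetical helper, and the Content-Type header is an assumption. The function only builds the request; sending it and handling the response are left to the harness.

```python
import json
import urllib.request


def build_section_request(base_url, api_key, url, lang="en", before_s=120, after_s=600):
    """Build (but do not send) a POST /v1/transcript/section request."""
    body = json.dumps(
        {"url": url, "lang": lang, "before_s": before_s, "after_s": after_s}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/transcript/section",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated in the RFC
        },
    )


# Placeholder host and key for illustration only.
request = build_section_request(
    "https://api.example.invalid",
    "tb_sk_test_example",
    "https://www.youtube.com/watch?v=Rzi7oFTzjac&t=2449s",
)
```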

Acceptance Criteria¶

TIN-531 is complete when:

this API contract is published in the repo docs
the public hosted RFC points at this concrete contract
the roadmap describes /v1/transcript/section as the first hosted MVP slice
the GStack demo plan maps to the hosted endpoint and local MCP tool
Linear records that implementation should start from POST /v1/transcript/section

Implementation is a follow-up issue, not part of TIN-531.
