tubebrain

tubebrain is a Rust MCP stdio server for structured YouTube transcript extraction, metadata lookup, and caption-language discovery.

It is designed for AI agents that need typed transcript data instead of raw page text or one-off scraping output.

It exposes four stable VOD MCP tools:

  • get_transcript
  • get_transcript_section
  • list_languages
  • get_metadata
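As a rough illustration of how an agent might invoke one of these tools over stdio, the sketch below builds a JSON-RPC 2.0 tools/call request, the message shape MCP uses for tool invocation. The helper names (json_escape, tools_call_request) and the video_id argument are hypothetical; the real argument schema for each tool should be read from the server's tools/list response rather than assumed from this example.

```rust
/// Escape a string for embedding in a JSON string literal (minimal subset).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters become \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a single-line JSON-RPC 2.0 `tools/call` request.
/// `args` holds (name, string value) pairs; the argument names passed in
/// are assumptions for illustration, not tubebrain's documented schema.
fn tools_call_request(id: u64, tool: &str, args: &[(&str, &str)]) -> String {
    let fields: Vec<String> = args
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{{}}}}}}}",
        id,
        json_escape(tool),
        fields.join(",")
    )
}
```

Calling tools_call_request(1, "get_transcript", &[("video_id", "abc123")]) yields one line of JSON that a client could write to the server's stdin, followed by a newline. A real client would normally complete the MCP initialize handshake before sending any tools/call message, and would likely use a JSON library instead of hand-built strings.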

It also includes experimental TubeBrain live-session tools for resolving YouTube Live and HTTP audio streams:

  • start_stream
  • poll_stream
  • stop_stream
  • list_streams

Live audio byte ingestion is implemented, and fetched chunks are handed to the stream transcription boundary. HLS fMP4 init maps are handled before transcription, inherited HLS byte-range offsets are normalized before fetch, and direct HTTP audio streams are buffered into bounded chunks with MP3 frame-aligned flushes and MP3/AAC/MP4 decode hints.

Default builds do not include live STT (speech-to-text); builds with the whisper feature attempt local Whisper transcription over overlapping live windows. Whisper builds default to the base.en model, with operator overrides for the local model and live window timing.

YouTube Live HLS fetches also solve player n challenges with Node and pinned yt-dlp EJS scripts before requesting manifest or segment bytes.

Treat the stream tools as an active product surface rather than a finished transcription path. Poll responses include stream health, the last non-error diagnostic, and the last ingestion or transcription error.
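To make the idea of overlapping live windows concrete, here is a minimal generic sketch of how a stream of audio samples can be cut into fixed-length windows that share samples with their neighbours. It is not taken from the tubebrain source: the function name overlapping_windows and its parameters (total, window, hop) are hypothetical, and the actual window length and timing are operator-configurable as noted above.

```rust
/// Return half-open sample ranges [start, end) for fixed-length windows
/// that advance by `hop` samples. Adjacent windows share `window - hop`
/// samples. A trailing partial window is not emitted; a live pipeline
/// would typically wait for more audio before processing it.
fn overlapping_windows(total: usize, window: usize, hop: usize) -> Vec<(usize, usize)> {
    assert!(window > 0 && hop > 0 && hop <= window, "need 0 < hop <= window");
    let mut out = Vec::new();
    let mut start = 0;
    while start + window <= total {
        out.push((start, start + window));
        start += hop;
    }
    out
}
```

For example, with total = 10, window = 4, and hop = 2, the ranges are (0, 4), (2, 6), (4, 8), and (6, 10), so each window overlaps the previous one by two samples. Overlap of this kind is commonly used so that words falling on a window boundary appear whole in at least one window, at the cost of needing to deduplicate repeated text afterwards.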

Why this project exists:

  • structured typed outputs instead of raw text dumps
  • optional PoToken support for YouTube BotGuard rollout cases
  • optional local Whisper fallback for captionless videos
  • single compiled binary with published release tarballs for the stable VOD surface

Start here:

Repository links: