tubebrain
tubebrain is a Rust MCP stdio server for structured YouTube transcript
extraction, metadata lookup, and caption-language discovery.
It is designed for AI agents that need typed transcript data instead of raw page text or one-off scraping output.
It exposes four stable VOD MCP tools:
- get_transcript
- get_transcript_section
- list_languages
- get_metadata
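As a rough illustration of how an agent might address one of these tools over stdio, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the standard MCP method for invoking a tool. The `url` argument name and the placeholder video URL are assumptions made for illustration only; the real argument schema is whatever the server reports via `tools/list`. The helper names `json_escape` and `tools_call_request` are hypothetical and not part of tubebrain.

```rust
/// Minimal JSON string escaping, enough for this sketch.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a single-line `tools/call` request for an MCP stdio server.
/// ASSUMPTION: the tool takes one string argument named `url`; check the
/// schema returned by `tools/list` before relying on this.
fn tools_call_request(id: u64, tool: &str, url: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"url":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(url)
    )
}
```

A client would write the returned string followed by a newline to the server's stdin, after completing the MCP `initialize` handshake. The stdio transport delimits messages by newlines, which is why the request is kept on a single line.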
It also includes experimental TubeBrain live-session tools for resolving YouTube Live and HTTP audio streams:
- start_stream
- poll_stream
- stop_stream
- list_streams
Live audio byte ingestion is implemented, and fetched chunks are handed to the stream transcription boundary. HLS fMP4 init maps are handled before transcription, inherited HLS byte-range offsets are normalized before fetch, and direct HTTP audio streams are buffered into bounded chunks with MP3 frame-aligned flushes and MP3/AAC/MP4 decode hints.
Default builds do not include live speech-to-text (STT); builds with the whisper feature attempt local Whisper transcription over overlapping live windows. Whisper builds default to the base.en model, with operator overrides for the local model and live window timing.
YouTube Live HLS fetches also solve player n challenges with Node and pinned yt-dlp EJS scripts before requesting manifest or segment bytes. Treat the stream tools as an active product surface rather than a finished transcription path.
Poll responses include stream health, the last non-error diagnostic, and the last ingestion or transcription error.
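To make the byte-range normalization mentioned above more concrete, here is a minimal sketch of one plausible reading: resolving an HLS `EXT-X-BYTERANGE` value of the form `<length>[@<offset>]` into an absolute range. Per the HLS specification, a missing offset means the sub-range starts right after the previous segment's sub-range of the same resource. This is an assumption about what "inherited" refers to, not a description of tubebrain's actual code; `ByteRange` and `resolve_byterange` are hypothetical names.

```rust
/// An absolute byte range within a media resource.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteRange {
    offset: u64,
    length: u64,
}

impl ByteRange {
    /// Offset of the first byte after this range.
    fn end(&self) -> u64 {
        self.offset + self.length
    }

    /// Value for an HTTP `Range` header (the last byte position is inclusive).
    /// Assumes `length >= 1`, which `resolve_byterange` guarantees.
    fn http_range(&self) -> String {
        format!("bytes={}-{}", self.offset, self.offset + self.length - 1)
    }
}

/// Resolve an `EXT-X-BYTERANGE` attribute value (`<length>[@<offset>]`).
/// `prev_end` is the end of the previous segment's range on the same
/// resource; it supplies the offset when the attribute omits one.
/// Returns `None` for malformed input, a zero length, or a missing
/// offset with no previous range to inherit from.
fn resolve_byterange(attr: &str, prev_end: Option<u64>) -> Option<ByteRange> {
    let (len_str, off_str) = match attr.split_once('@') {
        Some((len, off)) => (len, Some(off)),
        None => (attr, None),
    };
    let length: u64 = len_str.trim().parse().ok()?;
    if length == 0 {
        return None;
    }
    let offset = match off_str {
        Some(off) => off.trim().parse().ok()?,
        None => prev_end?,
    };
    Some(ByteRange { offset, length })
}
```

Resolving every range to an explicit offset before fetching means each segment request can be issued, and retried, independently of the playlist position it came from.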
Why this project exists:
- structured typed outputs instead of raw text dumps
- optional PoToken support for YouTube BotGuard rollout cases
- optional local Whisper fallback for captionless videos
- single compiled binary with published release tarballs for the stable VOD surface
Start here:
- Install
- Agent Guide
- Integrations
- Quickstarts
- Sample Outputs
- Troubleshooting
- Comparison
- Releases
- Hosted API
- Hosted Pilot Policy
- Hosted Preview Runbook
- RFCs
- Roadmap
- Tools
- Development
Repository links: