Triage Videos Before Spending Tokens
Use this workflow when you have a candidate video list and need to decide which videos are worth deeper transcript extraction or downstream summarization.
Goal
Use cheap metadata and caption-surface checks to route videos before calling
get_transcript.
Recommended Tool Sequence
1. get_metadata
2. list_languages
3. get_transcript only for the videos that pass your filter
Why This Order
get_metadata and list_languages are the lightest way to answer:
- is the video public enough to inspect?
- does it have captions at all?
- which languages are available?
That lets you avoid spending time on videos that are private, age-restricted, captionless, or only available in the wrong language.
Walkthrough
1. Pull metadata
get_metadata(url: "https://youtube.com/watch?v=VIDEO_ID")
Common triage checks:
- has_captions
- duration_ms
- caption_languages
Examples:
- skip long videos when you only want short explainers
- skip captionless videos if you are not using the whisper feature
- route only English-captioned videos into an English summarization queue
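The checks above can be sketched as a small filter. This is a minimal sketch, assuming the get_metadata result has already been parsed into a dict carrying the three fields listed; the function name passes_triage, the max_duration_ms cutoff, and the whisper_enabled flag are hypothetical and not part of the tool.

```python
from typing import Any


def passes_triage(
    meta: dict[str, Any],
    target_lang: str = "en",
    max_duration_ms: int = 15 * 60 * 1000,  # hypothetical cutoff: 15 minutes
    whisper_enabled: bool = False,  # assumption: True if the whisper feature is available
) -> bool:
    """Return True if a video looks worth a get_transcript call."""
    # Skip long videos when only short explainers are wanted.
    if meta.get("duration_ms", 0) > max_duration_ms:
        return False
    # Captionless videos only pass when a whisper fallback exists.
    if not meta.get("has_captions", False):
        return whisper_enabled
    # Captioned videos must offer the target language.
    return target_lang in meta.get("caption_languages", [])
```

Adjust the defaults to match your own policy; the point is that every check runs on metadata alone, before any transcript is fetched.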
2. Confirm language details
list_languages(url: "https://youtube.com/watch?v=VIDEO_ID")
Why it matters:
- get_metadata only gives language codes
- list_languages tells you whether the track is manual or auto-generated
That is useful when you want to prioritize:
- authored captions over auto-generated captions
- translatable tracks over fixed-language tracks
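One way to encode that priority is a simple ranking over the tracks. This is a sketch only: the keys "code", "auto_generated", and "translatable" are assumed names for illustration, so check them against the actual list_languages output before relying on them.

```python
from typing import Optional


def pick_track(tracks: list[dict], target_lang: str = "en") -> Optional[dict]:
    """Pick the best caption track for target_lang, or None if none is usable."""

    # NOTE: "code", "auto_generated", and "translatable" are hypothetical keys.
    def rank(track: dict) -> int:
        same_lang = track.get("code") == target_lang
        manual = not track.get("auto_generated", False)
        if same_lang and manual:
            return 0  # authored captions in the target language
        if same_lang:
            return 1  # auto-generated captions in the target language
        if track.get("translatable", False):
            return 2  # another language that can be translated
        return 3  # fixed-language track in the wrong language

    usable = [t for t in tracks if rank(t) < 3]
    return min(usable, key=rank) if usable else None
```

A None result means the video has no track worth extracting for the target language.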
3. Extract only the videos that pass
After triage, fetch transcripts only for the remaining videos:
get_transcript(
  url: "https://youtube.com/watch?v=VIDEO_ID",
  lang: "en",
  format: "json"
)
json is a strong default here because it is easy to feed into routing,
chunking, or later summarization steps.
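As an illustration of the chunking step, the sketch below groups transcript segments into size-bounded chunks. It assumes the JSON output has been parsed into a list of segment dicts that each carry a "text" key; that shape, the function name chunk_segments, and the max_chars budget are assumptions, not documented behavior.

```python
def chunk_segments(segments: list[dict], max_chars: int = 2000) -> list[str]:
    """Group consecutive segments into chunks of roughly max_chars or less.

    A single segment longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for seg in segments:
        text = seg.get("text", "").strip()  # "text" is an assumed key name
        if not text:
            continue
        # Start a new chunk when adding this segment would exceed the budget.
        if current and size + len(text) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text) + 1  # +1 accounts for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarized independently or routed further downstream.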
Example Decision Rules
- If has_captions is false, skip unless you built with --features whisper.
- If only auto-generated captions exist, mark the result lower confidence.
- If caption_languages does not include your target language, skip or route to translation.
- If metadata succeeds but transcript fetch fails, classify it as a potential po-token candidate instead of a normal parse failure.
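These rules can be collapsed into one routing function. The sketch below is one possible encoding, not the tool's own logic: the Route names, the decide function, and its keyword flags are all hypothetical, and meta is assumed to be the parsed get_metadata result.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    SKIP = "skip"
    TRANSLATE = "translate"
    EXTRACT = "extract"
    EXTRACT_LOW_CONFIDENCE = "extract_low_confidence"
    PO_TOKEN_CANDIDATE = "po_token_candidate"


def decide(
    meta: dict,
    target_lang: str = "en",
    only_auto_generated: bool = False,  # derived from list_languages
    whisper_enabled: bool = False,  # built with --features whisper
    allow_translation: bool = False,  # hypothetical policy switch
    transcript_ok: Optional[bool] = None,  # None until a fetch was attempted
) -> Route:
    """Map triage signals to a routing decision."""
    if transcript_ok is False:
        # Metadata worked but the transcript fetch failed.
        return Route.PO_TOKEN_CANDIDATE
    if not meta.get("has_captions", False):
        return Route.EXTRACT if whisper_enabled else Route.SKIP
    if target_lang not in meta.get("caption_languages", []):
        return Route.TRANSLATE if allow_translation else Route.SKIP
    if only_auto_generated:
        return Route.EXTRACT_LOW_CONFIDENCE
    return Route.EXTRACT
```

Keeping the decision in one pure function makes the policy easy to test and to change without touching the code that calls the tools.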
Tips
- get_metadata is the safest first touch for a new batch of URLs.
- Keep get_transcript for the videos that actually pass your policy.
- Use Troubleshooting if metadata works but transcript extraction fails later.