Triage Videos Before Spending Tokens

Use this workflow when you have a candidate video list and need to decide which videos are worth deeper transcript extraction or downstream summarization.

Goal

Use cheap metadata and caption-availability checks to route videos before calling get_transcript. Call the tools in this order:

  1. get_metadata
  2. list_languages
  3. get_transcript only for the videos that pass your filter

Why This Order

get_metadata and list_languages are the lightest way to answer:

  • is the video accessible (not private or age-restricted)?
  • does it have captions at all?
  • which languages are available?

That lets you avoid spending time or tokens on videos that are private, age-restricted, captionless, or only available in the wrong language.
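
The ordering above can be sketched as a small driver loop. This is a minimal illustration, not the tool's own client: the three tool calls are passed in as plain callables, and the result shapes (a metadata dict with has_captions, a list of track dicts with a code field) are assumptions made for the example.

```python
from typing import Any, Callable, Iterable


def triage(
    urls: Iterable[str],
    get_metadata: Callable[[str], dict[str, Any]],
    list_languages: Callable[[str], list[dict[str, Any]]],
    get_transcript: Callable[[str], Any],
    target_lang: str = "en",
) -> dict[str, Any]:
    """Run the cheap checks first; fetch transcripts only for survivors."""
    transcripts: dict[str, Any] = {}
    for url in urls:
        meta = get_metadata(url)  # step 1: cheapest call
        if not meta.get("has_captions"):
            continue  # captionless: never reaches the later steps
        tracks = list_languages(url)  # step 2: per-track details
        # "code" is an assumed field name for the track's language code.
        if not any(t.get("code") == target_lang for t in tracks):
            continue  # wrong language: skip the expensive call
        transcripts[url] = get_transcript(url)  # step 3: expensive call
    return transcripts
```

Passing the calls in as arguments keeps the sketch independent of whichever client you actually use, and makes it easy to count how many transcript fetches a batch really triggers.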

Walkthrough

1. Pull metadata

get_metadata(url: "https://youtube.com/watch?v=VIDEO_ID")

Common triage checks:

  • has_captions
  • duration_ms
  • caption_languages

Examples:

  • skip long videos when you only want short explainers
  • skip captionless videos if you are not using the whisper feature
  • route only English-captioned videos into an English summarization queue
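
As a rough sketch, the three checks above can be combined into one predicate over a metadata result. The field names has_captions, duration_ms, and caption_languages come from this page; the dict shape, the 10-minute cutoff, and the function name are assumptions chosen for illustration.

```python
from typing import Any

# Hypothetical policy: keep only videos up to 10 minutes long.
MAX_DURATION_MS = 10 * 60 * 1000


def passes_metadata_filter(
    meta: dict[str, Any],
    target_lang: str = "en",
    whisper_enabled: bool = False,
) -> bool:
    """Return True if a metadata result looks worth a transcript fetch."""
    if meta.get("duration_ms", 0) > MAX_DURATION_MS:
        return False  # too long for a short-explainer queue
    if not meta.get("has_captions", False):
        # Captionless videos only make sense with the whisper feature.
        return whisper_enabled
    # Exact match on the language code; regional variants such as
    # "en-US" would need their own handling.
    return target_lang in meta.get("caption_languages", [])
```

Adjust the cutoff and the language match to your own policy; the point is that all three decisions are made from a single metadata call.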

2. Confirm language details

list_languages(url: "https://youtube.com/watch?v=VIDEO_ID")

Why it matters:

  • get_metadata returns only language codes
  • list_languages tells you whether each track is manual or auto-generated

That is useful when you want to prioritize:

  • authored captions over auto-generated captions
  • translatable tracks over fixed-language tracks
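
One way to express that priority is a sort key over the track list. The sketch below assumes each track is a dict with code, is_generated, and is_translatable fields; those names are hypothetical stand-ins, so map them to whatever list_languages actually returns.

```python
from typing import Any, Optional


def pick_track(
    tracks: list[dict[str, Any]], target_lang: str = "en"
) -> Optional[dict[str, Any]]:
    """Pick the preferred track for target_lang, or None if there is none."""
    candidates = [t for t in tracks if t["code"] == target_lang]
    if not candidates:
        return None
    # False sorts before True, so manual tracks come first and, among
    # equals, translatable tracks come first.
    return min(
        candidates,
        key=lambda t: (t["is_generated"], not t["is_translatable"]),
    )
```

The manual-versus-generated check is placed first in the key, so an authored but fixed-language track still beats an auto-generated translatable one; swap the tuple order if your pipeline values translation more.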

3. Extract only the videos that pass

After triage, fetch transcripts only for the remaining videos:

get_transcript(
  url: "https://youtube.com/watch?v=VIDEO_ID",
  lang: "en",
  format: "json"
)

json is a strong default here because it is easy to feed into routing, chunking, or later summarization steps.
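
To illustrate the chunking case, here is a minimal sketch that groups transcript segments into size-bounded chunks. It assumes the JSON output is a list of objects with a text field; that shape is a guess for the example, not a documented schema, so check it against a real response first.

```python
import json


def chunk_segments(transcript_json: str, max_chars: int = 1000) -> list[str]:
    """Join consecutive segment texts into chunks of at most max_chars.

    A single segment longer than max_chars becomes its own oversized chunk.
    """
    segments = json.loads(transcript_json)
    chunks: list[str] = []
    current: list[str] = []
    size = 0  # length of the joined chunk, counting one separator per text
    for seg in segments:
        text = seg["text"].strip()  # "text" is an assumed field name
        if current and size + len(text) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because segments are never split, chunk boundaries always fall between caption lines, which tends to keep each chunk readable for a later summarization step.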

Example Decision Rules

  • If has_captions is false, skip unless you built with --features whisper.
  • If only auto-generated captions exist, mark the result lower confidence.
  • If caption_languages does not include your target language, skip or route to translation.
  • If metadata succeeds but the transcript fetch fails, classify the video as a po-token candidate rather than a normal parse failure.
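
These rules can be written down as two small functions, one for routing and one for failure classification. This is a sketch under stated assumptions: the Decision type, the action and label strings, and the choice to route to translation rather than skip are all illustrative, not part of the tool.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str      # "extract", "translate", or "skip"
    confidence: str  # "normal", "low", or "n/a"


def decide(
    has_captions: bool,
    caption_languages: list[str],
    only_auto_generated: bool,
    target_lang: str = "en",
    whisper_enabled: bool = False,
) -> Decision:
    """Apply the example decision rules to one video."""
    if not has_captions:
        if whisper_enabled:
            # Assumption: treat whisper output like auto-generated captions.
            return Decision("extract", "low")
        return Decision("skip", "n/a")
    if target_lang not in caption_languages:
        # The rules allow "skip" here too; this sketch picks translation.
        return Decision("translate", "low")
    return Decision("extract", "low" if only_auto_generated else "normal")


def classify_failure(metadata_ok: bool, transcript_ok: bool) -> str:
    """Label the outcome of a metadata call followed by a transcript fetch."""
    if not metadata_ok:
        return "metadata-failure"
    if not transcript_ok:
        return "po-token-candidate"  # metadata worked, transcript did not
    return "ok"
```

Keeping the failure label separate from the routing decision means a retry queue can pick up po-token candidates without re-running triage.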

Tips

  • get_metadata is the safest first touch for a new batch of URLs.
  • Keep get_transcript for the videos that actually pass your policy.
  • See Troubleshooting if metadata works but transcript extraction fails later.