Transcribing audio

Remotion provides several built-in options for transcribing audio to generate captions: @remotion/install-whisper-cpp, @remotion/whisper-web and @remotion/openai-whisper.

Comparison

| | @remotion/install-whisper-cpp | @remotion/whisper-web | @remotion/openai-whisper |
| --- | --- | --- | --- |
| Environment | Server (Node.js) | Client (Browser) | Cloud (API) |
| Speed | Fast (depends on hardware) | Slow (WASM overhead) | Fast |
| Cost | Free | Free | Paid (OpenAI API pricing) |
| Offline support | | | |
| No server needed | | | |
| Convert function | convertToCaptions() | toCaptions() | openaiWhisperApiToCaptions() |

The Caption type

All of these options can output captions in the Caption type format, which is the recommended format for use with Remotion.
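To make the idea of a shared caption format concrete, here is a minimal sketch. It assumes the Caption type from @remotion/captions has the fields `text`, `startMs`, `endMs`, `timestampMs` and `confidence`; the local `CaptionLike` type mirrors that assumed shape so the snippet runs without any package installed. The `captionAt` helper and the sample data are hypothetical and not part of Remotion.

```typescript
// Local stand-in for the Caption type (assumed shape, see lead-in).
type CaptionLike = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: returns the caption that is active at `timeMs`,
// treating each caption as the half-open interval [startMs, endMs).
function captionAt(
  captions: CaptionLike[],
  timeMs: number,
): CaptionLike | null {
  return (
    captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null
  );
}

// Made-up sample data for illustration only.
const sample: CaptionLike[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: null},
];
```

Because every built-in option can produce the same shape, a helper like this would not need to know which transcription backend generated the captions.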

Alternatives

You can use other ways of transcribing audio, such as ElevenLabs.
You can also define your own caption format instead of relying on the Caption type. This page covers only the built-in options.
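If you transcribe with another service but still want to reuse caption-based tooling, one option is to map that output into a Caption-like shape. The sketch below assumes a hypothetical word-level format with timestamps in seconds (`CustomWord`); it does not represent the actual response format of ElevenLabs or any other provider. `CaptionShape` again mirrors the assumed fields of the Caption type.

```typescript
// Hypothetical third-party word format with timestamps in seconds.
type CustomWord = {
  word: string;
  startSec: number;
  endSec: number;
};

// Local stand-in for the Caption type (assumed shape).
type CaptionShape = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical converter: seconds become milliseconds; fields the
// custom format does not provide are set to null.
function customToCaptions(words: CustomWord[]): CaptionShape[] {
  return words.map((w) => ({
    text: w.word,
    startMs: Math.round(w.startSec * 1000),
    endMs: Math.round(w.endSec * 1000),
    timestampMs: null,
    confidence: null,
  }));
}
```

Rounding to whole milliseconds is a choice made here to avoid floating-point noise in the timestamps.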
