Transcribing audio

Remotion provides several built-in options for transcribing audio to generate captions:

@remotion/install-whisper-cpp - Transcribe audio locally on a server using Whisper.cpp
@remotion/whisper-web - Transcribe audio in the browser using WebAssembly
@remotion/openai-whisper - Use the OpenAI Whisper API for cloud-based transcription

Comparison

| | `@remotion/install-whisper-cpp` | `@remotion/whisper-web` | `@remotion/openai-whisper` |
| --- | --- | --- | --- |
| Environment | Server (Node.js) | Client (Browser) | Cloud (API) |
| Speed | Fast (depends on hardware) | Slow (WASM overhead) | Fast |
| Cost | Free | Free | Paid (OpenAI API pricing) |
| Offline support | ✅ | ✅ | ❌ |
| No server needed | ❌ | ✅ | ✅ |
| Convert function | `toCaptions()` | `toCaptions()` | `openaiWhisperApiToCaptions()` |

The `Caption` type

Each of these options can output captions as `Caption` objects, which is the recommended format for use with Remotion. This format:

Enables usage of the APIs in @remotion/captions, such as createTikTokStyleCaptions()
Matches the format used in the Remotion Editor Starter
Is compatible with the Animated Captions package
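
To make the format concrete, here is a minimal sketch of what a list of captions might look like. The `Caption` type below is a local mirror of the shape as I understand it from `@remotion/captions`; treat the field list as an assumption and check it against the package's own type. `captionAt()` is a hypothetical helper written for this illustration, not part of any Remotion package.

```typescript
// Local mirror of the Caption shape (assumed to match @remotion/captions).
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Sample data, invented for illustration only.
const captions: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: 0.95},
];

// Hypothetical helper: return the caption whose time range contains timeMs
// (start inclusive, end exclusive), or null if none does.
const captionAt = (items: Caption[], timeMs: number): Caption | null =>
  items.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;
```

Because every entry carries its own start and end time in milliseconds, the same array can be consumed by any component that knows the current playback time.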

Alternatives

You can use other ways of transcribing audio, such as ElevenLabs.
You can also define your own caption format instead of relying on the `Caption` type. This page covers only the built-in options.
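
If you transcribe with another service but still want `Caption`-shaped data, a small mapping function is usually enough. The sketch below assumes a hypothetical `ExternalWord` shape (word-level timestamps in seconds); real services differ, so adjust the field names and units to whatever your provider returns. The `Caption` type is again a local mirror of the assumed shape, and `externalWordsToCaptions()` is an illustrative name, not a Remotion API.

```typescript
// Local mirror of the Caption shape (assumed to match @remotion/captions).
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical word-level result from an external transcription service.
// Times are assumed to be in seconds.
type ExternalWord = {word: string; start: number; end: number};

// Convert external words to Caption-shaped objects.
// Assumption: every word after the first gets a leading space so that
// concatenating the `text` fields yields a readable sentence.
const externalWordsToCaptions = (words: ExternalWord[]): Caption[] =>
  words.map((w, i) => ({
    text: i === 0 ? w.word : ` ${w.word}`,
    startMs: Math.round(w.start * 1000),
    endMs: Math.round(w.end * 1000),
    // Midpoint of the word, used here as a stand-in for a precise timestamp.
    timestampMs: Math.round(((w.start + w.end) / 2) * 1000),
    // The hypothetical service reports no confidence score.
    confidence: null,
  }));
```

Setting `confidence` to `null` rather than inventing a value keeps the data honest when the source does not provide one.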
