Transcribing audio
Remotion provides several built-in options for transcribing audio to generate captions:
- @remotion/install-whisper-cpp - Transcribe audio locally on a server using Whisper.cpp
- @remotion/whisper-web - Transcribe audio in the browser using WebAssembly
- @remotion/openai-whisper - Use the OpenAI Whisper API for cloud-based transcription
Comparison
| | @remotion/install-whisper-cpp | @remotion/whisper-web | @remotion/openai-whisper |
|---|---|---|---|
| Environment | Server (Node.js) | Client (Browser) | Cloud (API) |
| Speed | Fast (depends on hardware) | Slow (WASM overhead) | Fast |
| Cost | Free | Free | Paid (OpenAI API pricing) |
| Offline support | ✅ | ✅ | ❌ |
| No server needed | ❌ | ✅ | ✅ |
| Convert function | convertToCaptions() | toCaptions() | openaiWhisperApiToCaptions() |
The Caption type
All of these options can output captions in the Caption type format, which is recommended for use with Remotion. This format:
- Enables usage of the APIs in @remotion/captions, such as createTikTokStyleCaptions()
- Matches the format used in the Remotion Editor Starter
- Is compatible with the Animated Captions package
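To make the data structure concrete, here is a minimal sketch of what working with Caption-shaped data might look like. The local `Caption` type below is a hand-written mirror of the shape (text plus millisecond timings), included only so the snippet is self-contained; in a real project you would import the type from @remotion/captions. The `captionsAtTime` helper and the sample data are hypothetical and not part of any Remotion package.

```typescript
// Local mirror of the Caption shape, written out for illustration only.
// Assumption: in a real project, import the type from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: return the captions visible at a given time.
// A caption counts as visible from startMs (inclusive) to endMs (exclusive).
const captionsAtTime = (captions: Caption[], timeMs: number): Caption[] =>
  captions.filter((c) => c.startMs <= timeMs && timeMs < c.endMs);

// Made-up sample data, one caption per word.
const sample: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: null},
];

const visible = captionsAtTime(sample, 500);
console.log(visible.map((c) => c.text).join(''));
```

Because every built-in option can produce this same shape, a helper like this would not need to know which transcription backend generated the captions.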
Alternatives
You can use other ways of transcribing audio, such as ElevenLabs.
You can also define your own caption format instead of relying on the Caption type. This page covers only the built-in options.
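If you transcribe with another service but still want to use the Caption-based APIs, one approach is to map the service's output into Caption-shaped objects. The sketch below assumes a hypothetical input format, `ExternalWord`, with word timings in seconds; it is not the response format of ElevenLabs or any other specific provider. The `Caption` type is again a local mirror for self-containment, and `externalWordsToCaptions` is an illustrative name, not a Remotion API.

```typescript
// Local mirror of the Caption shape, for illustration only.
// Assumption: in a real project, import the type from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical third-party word timing, expressed in seconds.
type ExternalWord = {word: string; start: number; end: number};

// Convert word timings in seconds into Caption-shaped objects in milliseconds.
// Every word after the first gets a leading space so the texts can be
// concatenated directly; this spacing convention is an assumption of the sketch.
const externalWordsToCaptions = (words: ExternalWord[]): Caption[] =>
  words.map((w, i) => ({
    text: i === 0 ? w.word : ` ${w.word}`,
    startMs: Math.round(w.start * 1000),
    endMs: Math.round(w.end * 1000),
    // No per-word timestamp or confidence in the hypothetical input.
    timestampMs: null,
    confidence: null,
  }));

const converted = externalWordsToCaptions([
  {word: 'Hello', start: 0, end: 0.4},
  {word: 'world', start: 0.4, end: 0.9},
]);
console.log(converted);
```

Fields the source does not provide are set to `null` rather than guessed, which the Caption shape allows for `timestampMs` and `confidence`.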
See also
- Caption - The caption data structure
- @remotion/captions - Caption manipulation utilities