
Generate captions

To generate captions, the Remotion Recorder uses Whisper.cpp for fast and accurate transcriptions. Each time you record a clip with the Remotion Recorder, captions are automatically generated and saved to the same folder as your recordings.

Installing Whisper.cpp

The very first time you finish recording a clip, Whisper.cpp and a 1.5GB model will be installed on your computer. This may take a few minutes. Once installed, captions for the webcam clip will be generated.


Captions are only generated for files with the webcam prefix.

Make corrections

If the AI has made a mistake, there are several ways to correct the transcription manually. See here for how to do this.

Generate captions via CLI

For external recordings, you can also generate captions via the CLI.

bun sub.ts

Note that the names of the files you want to transcribe must start with the prefix webcam; all other files will be ignored. The JSON files containing the captions are generated and saved under public/<composition-id>/sub[timestamp].json.
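
The prefix rule can be pictured with a small sketch. This is not the Recorder's actual implementation: the helper name isTranscribable and the sample file names are hypothetical, and the sketch assumes the check is applied to the base name of each file.

```typescript
// Hypothetical helper, not part of the Remotion Recorder source.
// Assumption: only the file's base name is compared against the prefix.
const WEBCAM_PREFIX = "webcam";

function isTranscribable(filePath: string): boolean {
  // Take the last path segment, accepting both "/" and "\" separators.
  const baseName = filePath.split(/[\\/]/).pop() ?? "";
  return baseName.startsWith(WEBCAM_PREFIX);
}

// Sample paths invented for illustration.
const files = [
  "public/my-video/webcam1700000000000.mp4",
  "public/my-video/display1700000000000.mp4",
];

// Only the file whose name starts with "webcam" is kept.
const selected = files.filter(isTranscribable);
console.log(selected);
```

Before running bun sub.ts on external recordings, renaming them so that they start with webcam ensures they pass this kind of check.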

Non-English languages

If you do not record in English, edit the config/whisper.ts file.

Set the language to a supported value and change the model to a supported value that does not end in .en.
It is advised to choose a larger model if you are transcribing in a non-English language.
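
The rule about the .en suffix can be expressed as a small sanity check. This is a sketch under stated assumptions, not the contents of config/whisper.ts: the function names are hypothetical, and the model names "medium.en" and "medium" are used only as examples of a name with and without the suffix.

```typescript
// Hypothetical helpers, not part of the Remotion Recorder source.
// Models whose name ends in ".en" are treated as English-only.
function isEnglishOnlyModel(model: string): boolean {
  return model.endsWith(".en");
}

// Returns a warning message if the language and model do not fit together,
// or null if the combination looks consistent.
function checkWhisperSettings(language: string, model: string): string | null {
  if (language !== "en" && isEnglishOnlyModel(model)) {
    return `Model "${model}" is English-only, but the language is "${language}".`;
  }
  return null;
}

// Example values chosen for illustration.
console.log(checkWhisperSettings("de", "medium.en")); // a warning string
console.log(checkWhisperSettings("de", "medium")); // null
```

A check like this only catches the suffix mismatch; whether a given language or model value is actually supported still depends on the values accepted in config/whisper.ts.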