elevenLabsTranscriptToCaptions()
v4.0.443
Turns the output from the ElevenLabs Speech to Text API into an array of Caption objects.
This function can be used in any JavaScript environment. However, you should not call the ElevenLabs API from the browser, because doing so exposes your API key.
Example
When calling the ElevenLabs Speech to Text API, you must set timestamps_granularity to "word" to include word-level timing in the response.
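Example usage
```ts
import fs from 'fs';
import {elevenLabsTranscriptToCaptions} from '@remotion/elevenlabs';

// Run this on the server (e.g. in Node.js) so the API key is never sent to a browser.
const form = new FormData();
form.append('file', new Blob([fs.readFileSync('audio.mp3')]));
form.append('model_id', 'scribe_v2');
// Request word-level timestamps, which elevenLabsTranscriptToCaptions() needs.
form.append('timestamps_granularity', 'word');

const response = await fetch('https://api.elevenlabs.io/v1/speech-to-text', {
  method: 'POST',
  headers: {
    'xi-api-key': process.env.ELEVENLABS_API_KEY!,
  },
  body: form,
});

// Fail early instead of passing an error body to the converter.
if (!response.ok) {
  throw new Error(`ElevenLabs request failed with status ${response.status}`);
}

const transcript = await response.json();
const {captions} = elevenLabsTranscriptToCaptions({transcript});
```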
API
Arguments
An object with the following property:
transcript
The response from the ElevenLabs Speech to Text API.
It must include a words array with word-level timing, so make sure the API is called with timestamps_granularity set to "word".
The words array should contain objects with the following fields:
- text: The word text
- type: "word", "spacing", or "audio_event"; only "word" entries are used
- start: Start time in seconds
- end: End time in seconds
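To make the field list above concrete, here is a minimal sketch of the kind of conversion involved, assuming only the four fields listed. The names ElevenLabsWord, TranscriptLike, SimpleCaption and toSimpleCaptions are hypothetical and not part of @remotion/elevenlabs; the real function returns full Caption objects and may handle whitespace and rounding differently.

```typescript
// Hypothetical local types mirroring the fields listed above.
// The real ElevenLabs response contains more fields than these.
type ElevenLabsWord = {
  text: string;
  type: 'word' | 'spacing' | 'audio_event';
  start: number; // seconds
  end: number; // seconds
};

type TranscriptLike = {
  words?: ElevenLabsWord[];
};

// Simplified caption shape for illustration only (not Remotion's Caption type).
type SimpleCaption = {
  text: string;
  startMs: number;
  endMs: number;
};

// Hypothetical helper: keeps only "word" entries and converts seconds to milliseconds.
function toSimpleCaptions(transcript: TranscriptLike): SimpleCaption[] {
  if (!Array.isArray(transcript.words)) {
    throw new Error(
      'Transcript has no words array; was timestamps_granularity set to "word"?',
    );
  }
  return transcript.words
    .filter((w) => w.type === 'word')
    .map((w) => ({
      text: w.text,
      startMs: Math.round(w.start * 1000),
      endMs: Math.round(w.end * 1000),
    }));
}
```

This sketch simply drops "spacing" and "audio_event" entries and rounds to whole milliseconds; both are choices made here for illustration.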
Return value
An object with the following property:
captions
An array of Caption objects.
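As an illustration of consuming the returned array, the following hypothetical helper (captionAt, not part of Remotion) picks the caption that is active at a given time. It assumes only that each caption carries startMs and endMs in milliseconds; CaptionLike is a local stand-in type, not an import.

```typescript
// Local stand-in for the fields used here; real Caption objects have more fields.
type CaptionLike = {
  text: string;
  startMs: number;
  endMs: number;
};

// Hypothetical helper: returns the caption whose [startMs, endMs) interval
// contains timeMs, or null if no caption is active at that time.
function captionAt(captions: CaptionLike[], timeMs: number): CaptionLike | null {
  return captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;
}
```

Inside a Remotion component, the time in milliseconds could be derived from the current frame, for example as frame / fps * 1000 using the values from useCurrentFrame() and useVideoConfig().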