elevenLabsTranscriptToCaptions() (v4.0.443)

Turns the output from the ElevenLabs Speech to Text API into an array of Caption objects.

This function can be used in any JavaScript environment, but you should not use the ElevenLabs API in the browser because your API key will be exposed.

Example

When calling the ElevenLabs Speech to Text API, you must set timestamps_granularity to "word" to include word-level timing in the response.

Example usage
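import fs from 'fs';
import {elevenLabsTranscriptToCaptions} from '@remotion/elevenlabs';

// Build the multipart request body: the audio file plus transcription options
const form = new FormData();
form.append('file', new Blob([fs.readFileSync('audio.mp3')]));
form.append('model_id', 'scribe_v2');
// Required so that the response contains word-level timing
form.append('timestamps_granularity', 'word');

const response = await fetch('https://api.elevenlabs.io/v1/speech-to-text', {
	method: 'POST',
	headers: {
		'xi-api-key': process.env.ELEVENLABS_API_KEY!,
	},
	body: form,
});

// Fail early instead of passing an error payload to the converter
if (!response.ok) {
	throw new Error(`ElevenLabs request failed with status ${response.status}`);
}

const transcript = await response.json();
// Convert the ElevenLabs response into an array of Caption objects
const {captions} = elevenLabsTranscriptToCaptions({transcript});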
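
Before converting, it can be useful to confirm that the parsed JSON actually carries word-level timing, for example when timestamps_granularity was forgotten or the API returned an error payload. The following is a minimal sketch of such a check. The helper hasWordTimings and the sample objects are hypothetical and not part of @remotion/elevenlabs.

```typescript
// Hypothetical helper, not part of @remotion/elevenlabs.
// Returns true if the value has a `words` array whose entries
// all carry numeric `start` and `end` times.
const hasWordTimings = (transcript: unknown): boolean => {
	if (typeof transcript !== 'object' || transcript === null) {
		return false;
	}

	const words = (transcript as {words?: unknown}).words;
	if (!Array.isArray(words)) {
		return false;
	}

	return words.every((w) => {
		if (typeof w !== 'object' || w === null) {
			return false;
		}

		const entry = w as Record<string, unknown>;
		return typeof entry.start === 'number' && typeof entry.end === 'number';
	});
};

// Made-up sample inputs for illustration only
const okTranscript = {
	words: [{text: 'Hi', type: 'word', start: 0, end: 0.3}],
};
// Placeholder error payload; the real error shape may differ
const errorBody = {detail: 'example error payload'};

const okResult = hasWordTimings(okTranscript);
const errorResult = hasWordTimings(errorBody);
```

Note that an empty words array passes this check, which seems reasonable for silent audio. Tighten the condition if your use case requires at least one word.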

API

Arguments

An object with the following property:

`transcript`

The response from the ElevenLabs Speech to Text API. Must include a words array with word-level timing — ensure the API is called with timestamps_granularity set to "word".

The words array should contain objects with the following fields:

text: The word text
type: "word", "spacing", or "audio_event" — only "word" entries are used
start: Start time in seconds
end: End time in seconds
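
To make the field list above concrete, here is a rough sketch of the input shape and of how "word" entries might be selected and converted from seconds to milliseconds. The names TranscriptWord, SketchCaption and sketchWordsToCaptions are hypothetical. This is not the library's implementation, and the real function may treat spacing and other details differently.

```typescript
// Hypothetical type mirroring the fields described above
type TranscriptWord = {
	text: string;
	type: 'word' | 'spacing' | 'audio_event';
	start: number; // seconds
	end: number; // seconds
};

// Simplified stand-in for a caption, used only in this sketch
type SketchCaption = {
	text: string;
	startMs: number;
	endMs: number;
};

// Keep only "word" entries and convert seconds to milliseconds
const sketchWordsToCaptions = (words: TranscriptWord[]): SketchCaption[] => {
	return words
		.filter((w) => w.type === 'word')
		.map((w) => ({
			text: w.text,
			startMs: Math.round(w.start * 1000),
			endMs: Math.round(w.end * 1000),
		}));
};

// Made-up sample data for illustration only
const sampleWords: TranscriptWord[] = [
	{text: 'Hello', type: 'word', start: 0, end: 0.42},
	{text: ' ', type: 'spacing', start: 0.42, end: 0.5},
	{text: 'world', type: 'word', start: 0.5, end: 0.93},
	{text: '(laughter)', type: 'audio_event', start: 1.0, end: 1.8},
];

const sketched = sketchWordsToCaptions(sampleWords);
```

With this sample input, the "spacing" and "audio_event" entries are dropped and two entries remain.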

Return value

An object with the following property:

`captions`

An array of Caption objects.
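
As an illustration of how the returned array might be consumed, the sketch below looks up the caption that is active at a given time. It assumes that each caption exposes text, startMs and endMs. CaptionLike and findActiveCaption are hypothetical names, so check the Caption type for the full set of fields.

```typescript
// Assumed subset of the caption shape, for this sketch only
type CaptionLike = {
	text: string;
	startMs: number;
	endMs: number;
};

// Return the caption whose time range contains timeMs, or null if none does
const findActiveCaption = (
	captions: CaptionLike[],
	timeMs: number,
): CaptionLike | null => {
	return captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;
};

// Made-up sample data for illustration only
const exampleCaptions: CaptionLike[] = [
	{text: 'Hello', startMs: 0, endMs: 420},
	{text: 'world', startMs: 500, endMs: 930},
];

const activeAt600 = findActiveCaption(exampleCaptions, 600);
const activeAt450 = findActiveCaption(exampleCaptions, 450);
```

Here the end time is treated as exclusive, which is a design choice of this sketch so that two back-to-back captions never match the same timestamp.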
