elevenLabsTranscriptToCaptions() v4.0.443

Turns the output from the ElevenLabs Speech to Text API into an array of Caption objects.

This function can be used in any JavaScript environment, but you should not use the ElevenLabs API in the browser because your API key will be exposed.

Example

When calling the ElevenLabs Speech to Text API, you must set timestamps_granularity to "word" to include word-level timing in the response.

Example usage
import fs from 'fs';
import {elevenLabsTranscriptToCaptions} from '@remotion/elevenlabs';

const form = new FormData();
form.append('file', new Blob([fs.readFileSync('audio.mp3')]));
form.append('model_id', 'scribe_v2');
form.append('timestamps_granularity', 'word');

const response = await fetch('https://api.elevenlabs.io/v1/speech-to-text', {
  method: 'POST',
  headers: {
    'xi-api-key': process.env.ELEVENLABS_API_KEY!,
  },
  body: form,
});

const transcript = await response.json();
const {captions} = elevenLabsTranscriptToCaptions({transcript});

API

Arguments

An object with the following property:

transcript

The response from the ElevenLabs Speech to Text API. Must include a words array with word-level timing — ensure the API is called with timestamps_granularity set to "word".

The words array should contain objects with the following fields:

  • text: The word text
  • type: "word", "spacing", or "audio_event" — only "word" entries are used
  • start: Start time in seconds
  • end: End time in seconds
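
The field list above can be written down as a TypeScript type. This is a minimal sketch for illustration only: the names ElevenLabsWord, onlyWords and sampleWords are hypothetical and are not exported by @remotion/elevenlabs, and the sample values are made up. It shows which entries count as "word" entries.

```typescript
// Hypothetical names; only the four fields come from the list above.
type ElevenLabsWord = {
  text: string;
  type: 'word' | 'spacing' | 'audio_event';
  start: number; // seconds
  end: number; // seconds
};

// Keep only the entries of type "word", dropping spacing and audio events.
const onlyWords = (words: ElevenLabsWord[]): ElevenLabsWord[] =>
  words.filter((w) => w.type === 'word');

// Made-up sample data in the shape described above.
const sampleWords: ElevenLabsWord[] = [
  {text: 'Hello', type: 'word', start: 0.0, end: 0.4},
  {text: ' ', type: 'spacing', start: 0.4, end: 0.45},
  {text: 'world', type: 'word', start: 0.45, end: 0.9},
  {text: '(laughter)', type: 'audio_event', start: 0.9, end: 1.5},
];

const spoken = onlyWords(sampleWords);
```

Here spoken holds just the "Hello" and "world" entries. The filtering inside elevenLabsTranscriptToCaptions() may differ in detail; the sketch only mirrors the documented rule that non-word entries are ignored.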

Return value

An object with the following property:

captions

An array of Caption objects.
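
As a rough sketch of how the returned array might be consumed, the snippet below looks up the caption that is active at a given time. It assumes the Caption shape from @remotion/captions (text, startMs, endMs, timestampMs, confidence) and mirrors it in a local CaptionLike type so the example is self-contained. captionAt and sampleCaptions are hypothetical names, and the values are made up.

```typescript
// Local mirror of the Caption shape, assumed from @remotion/captions.
type CaptionLike = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Returns the caption whose [startMs, endMs) range contains timeMs, or null.
const captionAt = (
  captions: CaptionLike[],
  timeMs: number,
): CaptionLike | null =>
  captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;

// Made-up sample data standing in for the returned captions array.
const sampleCaptions: CaptionLike[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: null, confidence: null},
  {text: 'world', startMs: 450, endMs: 900, timestampMs: null, confidence: null},
];

const active = captionAt(sampleCaptions, 500);
```

In a Remotion composition, timeMs would typically be derived from the current frame and the composition's frame rate.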

See also