
How client-side rendering works

Warning

This feature is very experimental. Expect bugs and breaking changes at any time.
Track progress on GitHub and join the discussion in the #web-renderer channel on Discord.

The biggest challenge of client-side rendering is that a web page cannot take a screenshot of the browser viewport.
Only the pixels of certain HTML elements, such as <canvas>, <img>, <video> and <svg>, can be captured natively.

Server-side rendering takes a pixel-perfect screenshot of each frame. Client-side rendering cannot do this, so Remotion instead draws every element onto a canvas based on its own calculation of where the element is positioned and how it appears in the DOM.

For this, Remotion has developed a dedicated algorithm that calculates the placement of each element on the canvas.
Not every web feature can be reproduced this way, so only a specific subset of elements and styles is supported.

Rendering process

Initialization

First, the component is mounted in the DOM in a place where it is not visible to the user.
Simultaneously, an empty canvas is initialized.

Frame capture process

For each frame that needs to be rendered, the renderer uses document.createTreeWalker() to find all elements and text nodes in the DOM. Elements with display: none are skipped, along with all of their descendants.
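
The traversal rule can be illustrated without a browser. The sketch below uses a hypothetical SketchNode type instead of real DOM nodes and mirrors what happens when a TreeWalker filter returns NodeFilter.FILTER_REJECT: the rejected element and its entire subtree are skipped. In a browser, the equivalent setup would be document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, filter), with a filter that checks getComputedStyle(element).display for element nodes. This illustrates the rule only; it is not Remotion's code.

```typescript
// Hypothetical stand-in for a DOM node: an element with its computed
// `display` value and children, or a text node with its content.
type SketchNode =
  | { kind: "element"; tag: string; display: string; children: SketchNode[] }
  | { kind: "text"; text: string };

// Collects element and text nodes in document order (the root is included
// here for simplicity). A `display: none` element is rejected together with
// its whole subtree, like FILTER_REJECT in a TreeWalker.
function collectRenderableNodes(root: SketchNode): SketchNode[] {
  const result: SketchNode[] = [];
  const visit = (node: SketchNode): void => {
    if (node.kind === "element" && node.display === "none") {
      return; // Neither this element nor its descendants are visited.
    }
    result.push(node);
    if (node.kind === "element") {
      node.children.forEach(visit);
    }
  };
  visit(root);
  return result;
}

const tree: SketchNode = {
  kind: "element",
  tag: "div",
  display: "block",
  children: [
    { kind: "text", text: "Hello" },
    {
      kind: "element",
      tag: "span",
      display: "none",
      children: [{ kind: "text", text: "hidden" }],
    },
    { kind: "element", tag: "img", display: "inline", children: [] },
  ],
};

const visited = collectRenderableNodes(tree).map((n) =>
  n.kind === "element" ? n.tag : n.text,
);
// visited is ["div", "Hello", "img"]: the hidden <span> and its text are skipped.
```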

For each capturable element, the renderer:

  1. Walks up the DOM tree and temporarily sets the transform CSS property of the element and all of its ancestors to none.
  2. Gets the element's bounding box using .getBoundingClientRect(), as well as the bounding boxes of its parent elements.
  3. Adds up the transforms and positions to determine where the element was originally placed, with its transforms applied.
  4. Gets the pixels of the element: for <svg>, <canvas> and <img> elements, they can be captured directly. For text nodes, the layout is reconstructed manually.
  5. Draws the pixels to the canvas at the calculated placement.
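
Steps 1 to 3 can be illustrated with plain 2D affine matrices. The sketch below is an assumption about how such a composition could work, not Remotion's implementation: every box is measured with transforms reset, each element's transform is expanded around its transform-origin, the matrices are multiplied from the outermost ancestor inward, and the result is applied to the corners of the untransformed box. Matrix, Layer and the helper functions are hypothetical, and only 2D transforms without perspective are handled.

```typescript
// CSS-style 2D affine matrix [a, b, c, d, e, f]:
// x' = a*x + c*y + e,  y' = b*x + d*y + f
type Matrix = [number, number, number, number, number, number];
type Point = { x: number; y: number };
type Rect = { x: number; y: number; width: number; height: number };

// Hypothetical per-element record: the box measured with transforms reset to
// `none`, the element's own transform, and its transform-origin relative to
// the box's top-left corner.
type Layer = { box: Rect; transform: Matrix; origin: Point };

const identity: Matrix = [1, 0, 0, 1, 0, 0];
const translate = (tx: number, ty: number): Matrix => [1, 0, 0, 1, tx, ty];

// Returns m · n, i.e. n is applied first, then m.
function multiply(m: Matrix, n: Matrix): Matrix {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2,
    b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2,
    b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1,
    b1 * e2 + d1 * f2 + f1,
  ];
}

function applyMatrix(m: Matrix, p: Point): Point {
  const [a, b, c, d, e, f] = m;
  return { x: a * p.x + c * p.y + e, y: b * p.x + d * p.y + f };
}

// Expresses a layer's transform in page coordinates: move the
// transform-origin to (0, 0), apply the transform, move it back.
function layerMatrix(layer: Layer): Matrix {
  const ox = layer.box.x + layer.origin.x;
  const oy = layer.box.y + layer.origin.y;
  return multiply(translate(ox, oy), multiply(layer.transform, translate(-ox, -oy)));
}

// `chain` lists the outermost ancestor first and the element itself last,
// so the element's own transform is applied first, then each ancestor's.
function transformedCorners(chain: Layer[]): Point[] {
  const total = chain.reduce((acc, layer) => multiply(acc, layerMatrix(layer)), identity);
  const { x, y, width, height } = chain[chain.length - 1].box;
  const corners: Point[] = [
    { x, y },
    { x: x + width, y },
    { x: x + width, y: y + height },
    { x, y: y + height },
  ];
  return corners.map((p) => applyMatrix(total, p));
}

// Parent: translateX(5px). Child: scale(2) around its center.
const corners = transformedCorners([
  { box: { x: 0, y: 0, width: 200, height: 200 }, transform: [1, 0, 0, 1, 5, 0], origin: { x: 100, y: 100 } },
  { box: { x: 10, y: 10, width: 100, height: 50 }, transform: [2, 0, 0, 2, 0, 0], origin: { x: 50, y: 25 } },
]);
// corners: (-35, -15), (165, -15), (165, 85), (-35, 85)
```

Multiplying the matrices once per element keeps the per-corner work to a single matrix application, and the same approach extends to rotations and skews because they are also expressible as [a, b, c, d, e, f].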

Audio capture

Audio from mounted <Audio> and <Video> elements is captured, mixed together and added to the audio track of the video.
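
At its simplest, mixing several captured sources comes down to summing their samples. The sketch below assumes each source has already been decoded to mono Float32Array PCM at a common sample rate, then sums the tracks and clips the result to [-1, 1]. The function name and the hard-clipping strategy are illustrative assumptions; Remotion's actual mixing may differ, for example by also applying each element's volume.

```typescript
// Mixes mono PCM tracks (samples in [-1, 1], same sample rate) into one track.
// A track shorter than the longest one contributes silence after it ends.
function mixTracks(tracks: Float32Array[]): Float32Array {
  const totalLength = Math.max(0, ...tracks.map((t) => t.length));
  const mixed = new Float32Array(totalLength);
  for (const track of tracks) {
    for (let i = 0; i < track.length; i++) {
      mixed[i] += track[i];
    }
  }
  // Hard-clip to [-1, 1], the range expected for floating-point PCM samples.
  for (let i = 0; i < totalLength; i++) {
    mixed[i] = Math.min(1, Math.max(-1, mixed[i]));
  }
  return mixed;
}

const mixed = mixTracks([
  new Float32Array([0.5, 0.5, 0.75]),
  new Float32Array([0.25, 0.75]),
]);
// mixed is [0.75, 1, 0.75]: the second sample is clipped from 1.25 to 1.
```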

Encoding

Mediabunny is used to encode the frames and processed audio into a video file.

Capturing pixels

For <svg>, <canvas> and <img> elements, the pixels can be captured natively using well-documented techniques.
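
One widely used technique for SVG is to serialize the element, load the markup as an image and draw that image onto the canvas. The sketch below covers only the browser-independent step of turning serialized markup into a data: URL. In a browser, the markup would typically come from new XMLSerializer().serializeToString(svgElement), and the URL would be assigned to an Image's src and drawn with CanvasRenderingContext2D.drawImage() once the image has loaded. This is a generic approach and not necessarily the one Remotion uses; the function name is hypothetical.

```typescript
const SVG_NS = "http://www.w3.org/2000/svg";

// Turns serialized SVG markup into a data: URL that an Image can load.
// An SVG loaded as a standalone image needs the SVG namespace on its root
// element, so it is added if the opening <svg> tag does not declare it.
function svgMarkupToDataUrl(markup: string): string {
  const withNamespace = markup.replace(/^<svg(?![^>]*\sxmlns=)/, `<svg xmlns="${SVG_NS}"`);
  return "data:image/svg+xml;charset=utf-8," + encodeURIComponent(withNamespace);
}

const url = svgMarkupToDataUrl('<svg width="10" height="10"><rect width="10" height="10"/></svg>');
// url is "data:image/svg+xml;charset=utf-8," followed by the percent-encoded
// markup, which now includes the xmlns attribute.
```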

For other types of elements, only a subset of CSS properties is supported, such as background, border and border-radius. These styles are drawn to the canvas manually with the Canvas 2D API.
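
Of these, border-radius needs the most care, because CSS shrinks radii that would otherwise overlap. The CSS Backgrounds and Borders specification computes f = min(L / S) over the four sides, where L is the length of a side and S is the sum of the two radii on it, and multiplies every radius by f when f < 1. The sketch below applies that rule to circular corners only and traces the resulting shape through PathSink, a hypothetical subset of CanvasRenderingContext2D. It is not taken from Remotion's source.

```typescript
// Corner radii in px. Only circular corners are modelled; CSS also allows
// elliptical corners with separate horizontal and vertical radii.
type Radii = { topLeft: number; topRight: number; bottomRight: number; bottomLeft: number };

// The subset of CanvasRenderingContext2D used below. A real 2D context
// satisfies this interface structurally.
interface PathSink {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  arcTo(x1: number, y1: number, x2: number, y2: number, radius: number): void;
  closePath(): void;
}

// Side length divided by the sum of the two radii on that side.
const ratio = (sideLength: number, radiusSum: number): number =>
  radiusSum > 0 ? sideLength / radiusSum : Infinity;

// CSS corner-overlap rule: if the radii on any side add up to more than the
// side's length, every radius is scaled down by the same factor f.
function clampRadii(width: number, height: number, r: Radii): Radii {
  const f = Math.min(
    1,
    ratio(width, r.topLeft + r.topRight),
    ratio(height, r.topRight + r.bottomRight),
    ratio(width, r.bottomRight + r.bottomLeft),
    ratio(height, r.bottomLeft + r.topLeft),
  );
  return {
    topLeft: r.topLeft * f,
    topRight: r.topRight * f,
    bottomRight: r.bottomRight * f,
    bottomLeft: r.bottomLeft * f,
  };
}

// Traces the rounded box as a closed path, clockwise from the top edge.
// Each arcTo() draws the straight edge and then the corner arc.
function traceRoundedRect(ctx: PathSink, x: number, y: number, width: number, height: number, radii: Radii): void {
  const r = clampRadii(width, height, radii);
  ctx.beginPath();
  ctx.moveTo(x + r.topLeft, y);
  ctx.arcTo(x + width, y, x + width, y + height, r.topRight);
  ctx.arcTo(x + width, y + height, x, y + height, r.bottomRight);
  ctx.arcTo(x, y + height, x, y, r.bottomLeft);
  ctx.arcTo(x, y, x + width, y, r.topLeft);
  ctx.closePath();
}

const clamped = clampRadii(100, 40, { topLeft: 30, topRight: 30, bottomRight: 30, bottomLeft: 30 });
// The left and right sides need 60px but are only 40px tall, so f = 2/3
// and every radius becomes 20 (up to floating-point rounding).
```

In a browser, one would pass a real 2D context as ctx, set its fillStyle to the element's background color, and call ctx.fill() after tracing the path.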

Capturing text nodes

For text nodes, more layout calculations need to be made.

Text nodes do not have a .getBoundingClientRect() method. However, by wrapping a text node in a <span> element, the renderer can call .getBoundingClientRect() on the span to get the bounding box and resolve the transforms as described above.

Text is drawn to the canvas using the Canvas 2D API, and styles such as font-weight and font-size are applied by reading the computed style of the <span>.
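
The Canvas 2D font property accepts a value in CSS font shorthand syntax, so the relevant computed values of the <span> can be joined into a single string. The sketch below assumes a plain object with the same property names as CSSStyleDeclaration (fontStyle, fontWeight, fontSize, fontFamily, color). The TextStyle type and the canvasTextSettings function are hypothetical, and Remotion may read and apply more properties than these.

```typescript
// Hypothetical subset of the <span>'s computed style. The property names
// match CSSStyleDeclaration, so the result of getComputedStyle() fits.
interface TextStyle {
  fontStyle: string; // e.g. "normal" or "italic"
  fontWeight: string; // computed as a number, e.g. "700"
  fontSize: string; // computed in px, e.g. "24px"
  fontFamily: string; // e.g. '"Inter", sans-serif'
  color: string; // e.g. "rgb(255, 0, 0)"
}

// Settings for a 2D context. `font` follows the CSS `font` shorthand order:
// style, weight, size, family.
function canvasTextSettings(style: TextStyle): { font: string; fillStyle: string } {
  return {
    font: [style.fontStyle, style.fontWeight, style.fontSize, style.fontFamily].join(" "),
    fillStyle: style.color,
  };
}

const settings = canvasTextSettings({
  fontStyle: "italic",
  fontWeight: "700",
  fontSize: "24px",
  fontFamily: '"Inter", sans-serif',
  color: "rgb(255, 0, 0)",
});
// settings.font is 'italic 700 24px "Inter", sans-serif'
```

In a browser, the settings could be applied with Object.assign(ctx, canvasTextSettings(getComputedStyle(span))) before calling ctx.fillText().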

Several edge cases require special handling:

White space collapsing: When a text node starts with whitespace and lands at the beginning of a line, or contains multiple consecutive spaces, the browser may collapse the whitespace. To detect this, the whitespace is temporarily removed from the string, then added back to the DOM, and the renderer checks whether the layout changes. If it does not, the whitespace was collapsed.
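
The detection boils down to comparing two measurements. In the sketch below, measureWidth is a hypothetical callback standing in for writing a string into the <span> and reading the width of its bounding box. Only leading whitespace is checked, and the two fake layouts in the usage lines are invented for illustration.

```typescript
// Hypothetical: writes `text` into the measuring <span> and returns the
// width of its bounding box.
type MeasureWidth = (text: string) => number;

// Collapsible whitespace in CSS (spaces, tabs, line breaks). Unlike \s,
// this excludes non-breaking spaces, which the browser does not collapse.
const LEADING_WHITESPACE = /^[ \t\n\r]+/;

// True if the browser collapsed the leading whitespace of `text`,
// i.e. removing it does not change the measured layout.
function leadingWhitespaceCollapsed(text: string, measureWidth: MeasureWidth): boolean {
  const trimmed = text.replace(LEADING_WHITESPACE, "");
  if (trimmed === text) {
    return false; // Nothing to collapse.
  }
  return measureWidth(trimmed) === measureWidth(text);
}

// Fake layouts at 10px per character, for illustration only.
const keepsSpaces: MeasureWidth = (t) => t.length * 10;
const dropsLeadingSpaces: MeasureWidth = (t) => t.replace(LEADING_WHITESPACE, "").length * 10;

const midLine = leadingWhitespaceCollapsed("  hello", keepsSpaces); // false
const lineStart = leadingWhitespaceCollapsed("  hello", dropsLeadingSpaces); // true
```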

Line wrapping: The exact point where text wraps to a new line cannot be detected directly. To find it, the <span> is cleared and then filled word by word. When the height of its bounding box grows, the word that was just added has wrapped to a new line.

Non-Latin characters: Languages like Chinese do not put spaces between words, and a line may break between characters. Instead of splitting the text with .split(' '), the renderer uses the Intl.Segmenter API to identify word boundaries for proper line breaking.
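
The line-wrapping and segmentation techniques can be combined into one loop: Intl.Segmenter splits the text into word-level segments (including the spaces between words), and the segments are appended one at a time while the height is watched. In this sketch, measureHeight is a hypothetical callback standing in for setting the <span>'s content and reading .getBoundingClientRect().height. The fake layout in the usage lines (greedy wrapping at 10 characters per line, 20px per line) is invented for illustration and is not how a browser measures text.

```typescript
// Hypothetical: writes `text` into the measuring <span> and returns its height.
type MeasureHeight = (text: string) => number;

// Recovers the lines the browser produced by appending word segments one by
// one and starting a new line whenever the measured height grows.
function splitIntoLines(text: string, locale: string, measureHeight: MeasureHeight): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const result: string[] = [];
  let current = "";
  let rendered = "";
  let lastHeight = measureHeight("");
  for (const { segment } of segmenter.segment(text)) {
    rendered += segment;
    const height = measureHeight(rendered);
    if (height > lastHeight && current.length > 0) {
      result.push(current); // The segment just added wrapped onto a new line.
      current = segment;
    } else {
      current += segment;
    }
    lastHeight = height;
  }
  if (current.length > 0) {
    result.push(current);
  }
  return result;
}

// Fake layout: greedy wrapping at 10 characters per line, 20px per line.
const fakeHeight: MeasureHeight = (t) => {
  let lineCount = 1;
  let width = 0;
  for (const word of t.split(" ").filter((w) => w.length > 0)) {
    const needed = width === 0 ? word.length : width + 1 + word.length;
    if (needed > 10 && width > 0) {
      lineCount++;
      width = word.length;
    } else {
      width = needed;
    }
  }
  return lineCount * 20;
};

const wrapped = splitIntoLines("Hello big world again", "en", fakeHeight);
// wrapped is ["Hello big ", "world ", "again"]
```

Because Intl.Segmenter understands word boundaries in scripts without spaces, the same loop can handle Chinese text, where .split(' ') would return the whole string as a single "word".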

RTL support: Right-to-left text (e.g., Arabic or Persian) inside a left-to-right string that wraps across multiple lines requires special handling. For example, "من فارسی صحبت میکنم" should render as "من فارسی صحبت" on the first line and "میکنم" on the second line, so the word order changes when the text is split across lines. When .getClientRects() returns multiple rectangles for the same node (indicating that its layout is not contiguous), the second line must be drawn with a large horizontal offset from the right edge. This is not obvious, since an RTL line break normally resets the position to the right edge.
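
A first step in handling this case is to turn the rectangles from .getClientRects() into one box per visual line. The sketch below is a hypothetical helper, not Remotion's code: it merges rectangles whose vertical ranges overlap and reports each line's left and right edges. One way to draw RTL text at the result, assumed here for illustration, is to set ctx.direction = "rtl" and ctx.textAlign = "right" and call fillText() at each line's right edge, which naturally produces the large offset for the second line.

```typescript
// The fields of a DOMRect used here (e.g. from Element.getClientRects()).
type RectLike = { left: number; right: number; top: number; bottom: number };

type LineBox = { left: number; right: number; top: number; bottom: number };

// Groups the rectangles of one inline element into visual lines. Rectangles
// whose vertical ranges overlap are treated as fragments of the same line,
// since bidirectional text can produce several fragments on one line.
function groupIntoLines(rects: ArrayLike<RectLike>): LineBox[] {
  const sorted = Array.from(rects).sort((a, b) => a.top - b.top || a.left - b.left);
  const result: LineBox[] = [];
  for (const r of sorted) {
    const line = result.find((l) => r.top < l.bottom && r.bottom > l.top);
    if (line) {
      line.left = Math.min(line.left, r.left);
      line.right = Math.max(line.right, r.right);
      line.top = Math.min(line.top, r.top);
      line.bottom = Math.max(line.bottom, r.bottom);
    } else {
      // Copy fields explicitly: DOMRect values are getters, so spreading would lose them.
      result.push({ left: r.left, right: r.right, top: r.top, bottom: r.bottom });
    }
  }
  return result;
}

const lineBoxes = groupIntoLines([
  { left: 300, right: 500, top: 0, bottom: 20 },
  { left: 0, right: 60, top: 20, bottom: 40 },
]);
// lineBoxes[0].right is 500 and lineBoxes[1].right is 60, so the second
// line is anchored 440px to the left of the first line's right edge.
```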

Vertical text: Vertical text layout is not supported by the Canvas API and is out of scope for now.

Finally, the DOM is restored to its original state.
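
Because the transform properties changed in step 1 of the frame capture process have to be restored afterwards, a try/finally block is a natural way to guarantee the reset even if measuring throws. The sketch below assumes a hypothetical Styled type exposing only style.transform, which real HTMLElements satisfy. It records each inline value, sets it to none while the callback runs, and then writes the original values back.

```typescript
// Anything with an inline `style.transform`, such as an HTMLElement.
type Styled = { style: { transform: string } };

// Temporarily sets `transform: none` on every given element, runs `measure`,
// and restores the original inline values afterwards, even if `measure` throws.
function withTransformsReset<T>(elements: Styled[], measure: () => T): T {
  const saved = elements.map((el) => el.style.transform);
  try {
    for (const el of elements) {
      el.style.transform = "none";
    }
    return measure();
  } finally {
    elements.forEach((el, i) => {
      el.style.transform = saved[i];
    });
  }
}

const box = { style: { transform: "rotate(45deg)" } };
const seen = withTransformsReset([box], () => box.style.transform);
// seen is "none"; afterwards box.style.transform is "rotate(45deg)" again.
```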

Context isolation

Renders happen in the same browser environment as your app. This means CSS and Tailwind variables will automatically work, but you run the risk of conflicts with the host page.

See Limitations for details on how to make sure your code works with client-side rendering.

Contributing

If you are interested in improving the web renderer, for example by adding new styles, see Contributing to client-side rendering.

See also