Media Prep

How to select and configure your shots for processing with DeepEditor.

General Content Specifications

Your audio and video must have the same number of frames so DeepEditor can accurately determine where to make changes to the image.

DeepEditor doesn't decide the placement of the new audio; that's up to you! Ensuring the audio and video match in length keeps you in charge.
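
One way to sanity-check this before uploading is to compare the source video's frame count against the driving audio's duration. The sketch below is a minimal example using ffprobe (part of FFmpeg) from Python; the file names and the half-frame tolerance are placeholders, not DeepEditor requirements.

```python
# Minimal pre-upload length check, assuming ffprobe (FFmpeg) is on PATH.
# File names and the half-frame tolerance are placeholders, not DeepEditor rules.
import subprocess
from fractions import Fraction

def probe(path, args):
    """Run ffprobe and return a single field as a stripped string."""
    cmd = ["ffprobe", "-v", "error", *args,
           "-of", "default=nokey=1:noprint_wrappers=1", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

def video_frames(path):
    # -count_frames decodes the stream, so the count is exact (if slow on long clips).
    return int(probe(path, ["-select_streams", "v:0", "-count_frames",
                            "-show_entries", "stream=nb_read_frames"]))

def video_fps(path):
    # Reported as a fraction such as "24000/1001".
    return Fraction(probe(path, ["-select_streams", "v:0",
                                 "-show_entries", "stream=r_frame_rate"]))

def audio_duration(path):
    # Some containers report "N/A" here; a plain WAV normally gives seconds.
    return float(probe(path, ["-select_streams", "a:0",
                              "-show_entries", "stream=duration"]))

source = "source_video.mov"    # hypothetical file names
driving = "driving_audio.wav"

fps = float(video_fps(source))
frames = video_frames(source)
drift = abs(audio_duration(driving) - frames / fps)

if drift > 0.5 / fps:          # allow up to half a frame of drift
    raise SystemExit(f"Audio/video length mismatch: {drift:.3f} s of drift")
print(f"OK: {frames} frames at {fps:.3f} fps, audio within tolerance")
```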

🎞️ Video Content Specifications

The Source Media and Driving Data assets must be a single video from a single continuous clip.

Driving Data can also be an audio clip - see the required Asset Specifications for audio files.

NOTE: DeepEditor currently supports vubbing one person per shot. If there are multiple faces in the shot, obscure any faces that will not be vubbed before uploading your media (one approach is sketched below).

We are actively working on support for vubbing multiple faces - watch this space!
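
If the face to hide stays inside a fixed region for the whole shot, one simple way to obscure it is to blur that region with FFmpeg before upload, as in the minimal sketch below. It assumes ffmpeg is installed; the rectangle, file names, and default output codec are placeholders, and a tracked matte in your NLE or compositor is an equally valid approach.

```python
# Minimal sketch: blur a fixed rectangle to hide a non-vubbed face.
# Assumes ffmpeg is installed; region, file names, and codec are placeholders.
import subprocess

src = "two_faces.mov"            # hypothetical input
out = "two_faces_obscured.mov"   # hypothetical output
x, y, w, h = 960, 120, 320, 360  # pixel rectangle covering the face to hide

# Crop the rectangle, blur it heavily, then overlay it back in place.
filter_graph = (
    f"[0:v]crop={w}:{h}:{x}:{y},boxblur=20[blurred];"
    f"[0:v][blurred]overlay={x}:{y}"
)

subprocess.run(
    ["ffmpeg", "-y", "-i", src,
     "-filter_complex", filter_graph,
     "-c:a", "copy",             # pass any audio through untouched
     out],
    check=True,
)
```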


👤 Video Content Best Practices

While most shots can be vubbed, specific factors can enhance or diminish the outcome.

  • Jaw in shot: The whole face is visible. Avoid extreme closeups.
  • Head rotation: Both eyes of each speaking character should be visible at all times.
  • Head tilt: The whole face should be visible. Avoid extreme angles. Shots are most effective when both eyes are clearly seen.
✅ EASY - Head rotation is restricted so that both eyes remain visible at all times.

🔶 MEDIUM - Head rotation reaches a full profile (around 90 degrees, with only one eye visible) or slightly beyond.

🚨 CHALLENGING - Head rotation exceeds 100 degrees while the shot still requires vubbing.

Facial hair: Smooth face or minimal facial hair. Avoid full beards.

Occlusions: Ensure that nothing obstructs the lower half of the face.

Lighting: All facial features are lit enough to be distinguishable.

Distortion: Avoid atypical appearances (for example, excessive makeup or non-human features).

The source clips must include some view of the mouth interior, particularly when no extra training data is provided. This helps DeepEditor learn the appearance of the actor's teeth and keeps the output accurate.

DeepEditor currently accommodates video and audio assets with a maximum duration of 1,440 frames or roughly 50 seconds.
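
If you want to check a clip against that cap before uploading, a minimal sketch (assuming ffprobe from FFmpeg is installed, with a placeholder file name) could look like this:

```python
# Minimal check against the 1,440-frame cap; ffprobe must be installed and
# the clip name is a placeholder.
import subprocess

MAX_FRAMES = 1440

def frame_count(path):
    # Decode the first video stream and count its frames exactly.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0", "-count_frames",
         "-show_entries", "stream=nb_read_frames",
         "-of", "default=nokey=1:noprint_wrappers=1", path],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

clip = "source_video.mov"
frames = frame_count(clip)
if frames > MAX_FRAMES:
    raise SystemExit(f"{clip} is {frames} frames; trim or split it before uploading.")
print(f"{clip}: {frames} frames, within the limit")
```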

🎧 Audio Content

The audio should feature only the actor whose voice you intend to use as Driving Data. Ensure that no other actors are speaking in the clip.

The audio source must be clean and free of M&E (Music and Effects).

✂️ Assembly

Now that you know what creates the best result, decide what and when you want to Vub in your NLE of choice (for example, DaVinci Resolve Studio).

Place the preferred audio beneath the desired video clip.

Note: Movements of the head, neck, and body can contribute to a seamless Vub. If possible, align the dialogue with the original spoken words.

Create in and out points in your timeline. 

Export the Source and the Driving Data according to the supported Asset Specifications.
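
If you would rather trim on the command line than export directly from your NLE, the sketch below cuts a conformed timeline export at the marked in and out points and writes the video-only Source Media and audio-only Driving Data as separate, equal-length files. It assumes ffmpeg is installed; the timecodes, file names, and codec choices are placeholders, so match whatever the Asset Specifications actually require.

```python
# Minimal export sketch, assuming ffmpeg is installed. Timecodes, file names,
# and codec settings are placeholders; follow the Asset Specifications for the
# required codecs and containers.
import subprocess

timeline = "conformed_timeline.mov"                   # hypothetical NLE export
in_point, out_point = "00:00:12.000", "00:00:27.500"  # hypothetical in/out points

def ffmpeg(args):
    subprocess.run(["ffmpeg", "-y", *args], check=True)

# Source Media: video only, trimmed to the marked range.
ffmpeg(["-i", timeline, "-ss", in_point, "-to", out_point,
        "-an", "-c:v", "libx264", "-crf", "16", "source_media.mp4"])

# Driving Data: the replacement audio over the same range, clean of M&E.
ffmpeg(["-i", timeline, "-ss", in_point, "-to", out_point,
        "-vn", "-c:a", "pcm_s16le", "driving_audio.wav"])
```

Placing -ss and -to after the input keeps the trim frame-accurate (FFmpeg decodes up to the in point rather than seeking by keyframe), so both exports cover exactly the same range.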