Media Prep
How to select and configure your shots for processing with DeepEditor.
General Content Specifications
Your audio and video must have the same number of frames so DeepEditor can accurately determine where to make changes to the image.
DeepEditor doesn't decide the placement of the new audio; that's up to you! Ensuring the audio and video match in length keeps you in charge.
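One way to sanity-check this before export is to compute how much audio exactly spans your video. The function below is a minimal sketch; the frame rate, sample rate, and frame count are illustrative assumptions, not DeepEditor requirements.

```python
# Sketch: how many audio samples are needed to cover exactly the same
# duration as a video clip. Values below are illustrative assumptions.

def audio_samples_for_frames(frame_count: int, fps: float, sample_rate: int) -> int:
    """Audio samples needed to span exactly `frame_count` video frames."""
    duration_s = frame_count / fps
    return round(duration_s * sample_rate)

# Example: a 24 fps clip of 1,200 frames (50 s) needs 2,400,000 samples at 48 kHz.
print(audio_samples_for_frames(1200, fps=24.0, sample_rate=48_000))  # 2400000
```

If the sample count of your exported audio differs from this figure, the two assets will not match frame-for-frame.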
🎞️ Video Content Specifications
The Source Media and Driving Data assets must each be a single, continuous video clip.
Driving Data can also be an audio clip - see the required Asset Specifications for audio files.
NOTE: DeepEditor currently supports vubbing one person per shot. If there are multiple faces in the shot, obscure any faces that will not be vubbed before uploading your media.
We are actively working on support for vubbing multiple faces - watch this space!

👤 Video Content Best Practices
While most shots can be vubbed, specific factors can enhance or diminish the outcome.
- Jaw in shot: The whole face is visible. Avoid extreme closeups.
- Head rotation: Both eyes of each speaking character should be visible at all times.
- Head tilt: The whole face should be visible. Avoid extreme angles. Shots are most effective when both eyes are clearly seen.
🚨 CHALLENGING - Head rotation beyond 100 degrees while still requiring vubbing.
- Facial hair: Smooth face or minimal facial hair. Avoid full beards.
- Occlusions: Ensure that nothing obstructs the lower half of the face.
- Lighting: All facial features are lit well enough to be distinguishable.
- Distortion: Avoid atypical appearances (for example, excessive makeup or non-human features).
It's crucial that the source clips include some view of the mouth interior, particularly when no extra training data is provided. This helps DeepEditor learn the appearance of the actor's teeth and ensures the output looks accurate.
DeepEditor currently supports video and audio assets up to a maximum duration of 1,440 frames, or roughly 50 seconds.
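For planning, the 1,440-frame cap can be converted to a wall-clock limit at a given frame rate. The rates below are common examples, not an official list; the rough 50-second figure corresponds to material near 30 fps.

```python
# Sketch: convert the 1,440-frame cap into seconds at common frame rates.
# The frame rates listed are illustrative, not an official supported list.

MAX_FRAMES = 1440

def max_duration_seconds(fps: float) -> float:
    """Longest clip duration, in seconds, that fits under the frame cap."""
    return MAX_FRAMES / fps

for fps in (23.976, 24.0, 25.0, 29.97, 30.0):
    print(f"{fps} fps -> {max_duration_seconds(fps):.2f} s")
```

At 30 fps the cap works out to 48 seconds; at 24 fps it is a full minute.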
🎧 Audio Content
The audio should solely feature the actor whose voice you intend to use as driving data. Ensure that no other actors are speaking in the clip.
The audio source must be clean and free of M&E (Music and Effects).
✂️ Assembly
Now that you know what creates the best result, decide what and when you want to vub in your NLE of choice.
1. Place the preferred audio beneath the desired video clip.
   Note: Movements of the head, neck, and body can contribute to a seamless vub. If possible, align the dialogue with the original spoken words.
2. Create in and out points in your timeline.
3. Export the Source and the Driving Data according to the supported Asset Specifications.
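If you prefer to trim and export outside your NLE, the steps above can also be sketched with ffmpeg. The snippet below only builds the command lines; the file names, in/out points, and container choices are illustrative assumptions, so check the Asset Specifications for the formats DeepEditor actually requires.

```python
# Sketch: build ffmpeg commands that export a trimmed, video-only Source
# clip and an audio-only Driving Data clip from one timeline export.
# Paths and in/out points are hypothetical placeholders.

def export_commands(src: str, in_point: str, out_point: str) -> list[list[str]]:
    trim = ["-ss", in_point, "-to", out_point]
    video_cmd = ["ffmpeg", "-i", src, *trim, "-an", "source.mov"]         # strip audio
    audio_cmd = ["ffmpeg", "-i", src, *trim, "-vn", "driving_audio.wav"]  # strip video
    return [video_cmd, audio_cmd]

for cmd in export_commands("timeline_export.mov", "00:00:05", "00:00:45"):
    print(" ".join(cmd))
```

Trimming both assets with identical in and out points keeps the frame counts matched, as required above.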