
Jul 2025

1h 11m

MLA 027 AI Video End-to-End Workflow

About this episode

How to maintain character consistency, style consistency, and more in AI video. Prosumers can use Google Veo 3's "High-Quality Chaining" for fast social media content. Indie filmmakers can achieve narrative consistency by combining Midjourney V7 for style, Kling for lip-synced dialogue, and Runway Gen-4 for camera control. Professional studios gain full control with a layered ComfyUI pipeline that outputs multi-layer EXR files for standard VFX compositing.

Links

Notes and resources at ocdevel.com/mlg/mla-27
Try a walking desk - stay healthy & sharp while you learn & code
Descript - my favorite AI audio/video editor

AI Audio Tool Selection

Music: Use Suno for complete songs or Udio for high-quality components for professional editing.
Sound Effects: Use ElevenLabs' SFX for integrated podcast production or SFX Engine for large, licensed asset libraries for games and film.
Voice: ElevenLabs gives the most realistic voice output. Murf.ai offers an all-in-one studio for marketing, and Play.ht has a low-latency API for developers.
Open-Source TTS: For local use, StyleTTS 2 generates human-level speech, Coqui's XTTS-v2 is best for voice cloning from minimal input, and Piper TTS is a fast, CPU-friendly option.
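For the local route, Piper runs as a command-line program that reads text on stdin and writes a WAV file. The following is a minimal Python sketch. It assumes the piper executable is on your PATH and that a voice model such as en_US-lessac-medium.onnx has already been downloaded. Both are assumptions, not details from the episode.

```python
import shutil
import subprocess


def piper_command(model_path: str, output_wav: str) -> list[str]:
    """Build the argv for a Piper TTS run; the text itself is supplied on stdin."""
    return ["piper", "--model", model_path, "--output_file", output_wav]


def synthesize(text: str, model_path: str, output_wav: str) -> bool:
    """Run Piper if it is installed; return False instead of failing when it is not."""
    if shutil.which("piper") is None:
        return False
    subprocess.run(
        piper_command(model_path, output_wav),
        input=text,  # Piper reads the text to speak from stdin
        text=True,
        check=True,
    )
    return True
```

A call like synthesize("Welcome back.", "en_US-lessac-medium.onnx", "intro.wav") returns False rather than raising when Piper is missing. This lets a script fall back to a hosted voice.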

I. Prosumer Workflow: Viral Video

Goal: Rapidly produce branded, short-form video for social media. This method bypasses Veo 3's weaker native "Extend" feature.

Toolchain
- Image Concept: GPT-4o (API: GPT-Image-1) for its strong prompt adherence, text rendering, and conversational refinement.
- Video Generation: Google Veo 3 for high single-shot quality and integrated ambient audio.
- Soundtrack: Udio for creating unique, "viral-style" music.
- Assembly: CapCut for its standard short-form editing features.
Workflow
1. Create Character Sheet (GPT-4o): Generate a primary character image with a detailed "locking" prompt, then use conversational follow-ups to create variations (poses, expressions) for visual consistency.
2. Generate Video (Veo 3): Use "High-Quality Chaining."
  - Clip 1: Generate an 8s clip from a character sheet image.
  - Extract Final Frame: Save the last frame of Clip 1.
  - Clip 2: Use the extracted frame as the image input for the next clip, using a "this then that" prompt to continue the action. Repeat as needed.
3. Create Music (Udio): Use Manual Mode with structured prompts ([Genre: ...], [Mood: ...]) to generate and extend a music track.
4. Final Edit (CapCut): Assemble clips, layer the Udio track over Veo's ambient audio, add text, and use "Auto Captions." Export in 9:16.
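The chaining loop in step 2 is mechanical enough to script. This sketch assumes ffmpeg is installed for frame extraction. It also uses a hypothetical generate_clip(image_path, prompt) callable that stands in for whatever Veo 3 interface you use; it is not a real Veo API. The ffmpeg recipe seeks close to the end of the clip with -sseof and keeps overwriting one image with -update 1, so the file left on disk is the final frame.

```python
import subprocess
from typing import Callable


def last_frame_command(clip_path: str, frame_path: str) -> list[str]:
    """ffmpeg argv that saves the final frame of a clip as an image.

    -sseof -1 starts decoding one second before the end of the input, and
    -update 1 keeps overwriting the same output image, so what remains is
    the last decoded frame.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", clip_path, "-update", "1", frame_path]


def ffmpeg_extract(clip_path: str, frame_path: str) -> None:
    """Run the extraction for real (requires ffmpeg on PATH)."""
    subprocess.run(last_frame_command(clip_path, frame_path), check=True)


def chain_clips(
    first_image: str,
    prompts: list[str],
    generate_clip: Callable[[str, str], str],   # hypothetical: (image, prompt) -> clip path
    extract_frame: Callable[[str, str], None],  # (clip path, frame path) -> None
) -> list[str]:
    """High-Quality Chaining: each clip is seeded with the last frame of the previous one."""
    clips: list[str] = []
    seed_image = first_image  # clip 1 starts from the character sheet image
    for i, prompt in enumerate(prompts):
        clip = generate_clip(seed_image, prompt)
        clips.append(clip)
        if i < len(prompts) - 1:  # no extraction needed after the final clip
            seed_image = f"frame_{i:02d}.png"
            extract_frame(clip, seed_image)
    return clips
```

Keeping the generator and extractor as parameters lets the same loop run against a stub while testing and against real tools in production.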
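If a clip comes out landscape (16:9), it has to be cropped or reframed for the 9:16 export. CapCut handles this interactively. The center-crop arithmetic behind it is shown below as an illustrative sketch, not CapCut's own logic.

```python
def center_crop_9x16(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (x, y, crop_w, crop_h) for the largest centered 9:16 window.

    Sizes are rounded down to even numbers because common delivery formats
    (e.g. H.264 with 4:2:0 chroma) need even frame dimensions.
    """
    if width * 16 >= height * 9:
        # Source is at least as wide as 9:16: keep full height, trim the sides.
        crop_h = height - height % 2
        crop_w = (crop_h * 9 // 16) // 2 * 2
    else:
        # Source is taller than 9:16: keep full width, trim top and bottom.
        crop_w = width - width % 2
        crop_h = (crop_w * 16 // 9) // 2 * 2
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h
```

For a 1920x1080 source, this returns (657, 0, 606, 1080): a 606-pixel-wide vertical slice from the middle of the frame. The same four numbers could be passed to ffmpeg's crop filter as crop=606:1080:657:0.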

II. Indie Filmmaker Workflow: Narrative Shorts

Goal: Create cinematic short films with consistent characters and storytelling focus, using a hybrid of specialized tools.

Toolchain
- Visual Foundation: Midjourney V7 to establish character and style with --cref and --sref parameters.
- Dialogue Scenes: Kling for its superior lip-sync and character realism.
- B-Roll/Action: Runway Gen-4 for its Director Mode camera controls and Multi-Motion Brush.
- Voice Generation: ElevenLabs for emotive, high-fidelity voices.
- Edit & Color: DaVinci Resolve for its integrated edit, color, and VFX suite and favorable cost model.
Workflow
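1. Create Visual Foundation (Midjourney V7): Generate a "hero" character image. Use its URL with --cref --cw 100 to create consistent character poses, and with --sref to replicate its visual style in other shots. (In V7, character referencing is handled by Omni Reference, --oref with --ow, because --cref is a V6 parameter.) Assemble the results into a reference set.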
2. Create Dialogue Scenes (ElevenLabs -> Kling):
  - Generate the dialogue track in ElevenLabs and download the audio.
  - In Kling, generate a video of the character from a reference image with their mouth closed.
  - Use Kling's "Lip Sync" feature to apply the ElevenLabs audio to the neutral video so the mouth movement matches the dialogue.
3. Create B-Roll (Runway Gen-4): Use reference images from Midjourney. Apply precise camera moves with Director Mode or add localized, layered motion to static scenes with the Multi-Motion Brush.
4. Assemble & Grade (DaVinci Resolve): Edit clips and audio on the Edit page. On the Color page, use node-based tools to match shots from Kling and Runway, then apply a final creative look.
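The ElevenLabs half of step 2 can be scripted against its text-to-speech REST endpoint. The sketch below only builds the request, so it can be inspected without a network call. The voice ID, API key, and the eleven_multilingual_v2 model ID are placeholders or assumptions to swap for values from your own account. The Kling lip-sync step is still done in Kling.

```python
import json
import urllib.request


def elevenlabs_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumption: use any model your account offers
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one dialogue line."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 bytes back
        },
    )


def save_dialogue_line(req: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio to disk (needs network and a valid key)."""
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Generating one file per line of dialogue keeps each Kling lip-sync pass short and easy to redo.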
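In Resolve, shot matching is done by eye with nodes. The underlying idea is to pull one clip's color statistics toward a reference clip. It can be illustrated with a per-channel mean and standard-deviation transfer. This is a deliberately simplified stand-in, done directly in RGB on float pixels in the 0 to 1 range. It is not what Resolve does internally.

```python
from statistics import fmean, pstdev

Pixel = tuple[float, float, float]


def match_shot(source: list[Pixel], reference: list[Pixel]) -> list[Pixel]:
    """Shift and scale each RGB channel of `source` so its mean and standard
    deviation match `reference`, then clamp results to 0..1."""
    matched_channels = []
    for c in range(3):
        src = [p[c] for p in source]
        ref = [p[c] for p in reference]
        src_mean, src_std = fmean(src), pstdev(src)
        ref_mean, ref_std = fmean(ref), pstdev(ref)
        # A flat channel has no spread to rescale, so only its offset is moved.
        scale = ref_std / src_std if src_std > 0 else 1.0
        matched_channels.append(
            [min(1.0, max(0.0, (v - src_mean) * scale + ref_mean)) for v in src]
        )
    # Re-interleave the three channel lists back into per-pixel tuples.
    return [tuple(ch[i] for ch in matched_channels) for i in range(len(source))]
```

Applied to a grey Kling frame with a warmer Runway shot as the reference, this shifts the grey pixels toward the reference's warmer average. A colorist does the same thing with lift, gamma, and gain.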

III. Professional Studio Workflow: Full Control

Goal: Achieve absolute pixel-level control, actor likeness, and integration into standard VFX pipelines using an open-source, modular approach.

Toolchain
- Core Engine: ComfyUI with diffusion models such as Stable Diffusion 3 (SD3) or FLUX.
- VFX Compositing: DaVinci Resolve (Fusion page) for node-based, multi-layer EXR compositing.
Control Stack & Workflow
1. Train Character LoRA: In ComfyUI, train a custom LoRA on a dataset of 15-30 images of the actor to capture a true likeness.
2. Build ComfyUI Node Graph: Construct a generation pipeline in this order:
  - Loaders: Load base model, custom character LoRA, and text prompts (with LoRA trigger word).
  - ControlNet Stack: Chain multiple ControlNets to define structure (e.g., OpenPose for skeleton, Depth map for 3D layout).
  - IPAdapter-FaceID: Use the Plus v2 model as a final reinforcement layer to lock facial identity before animation.
  - AnimateDiff: Apply deterministic camera motion using Motion LoRAs (e.g., v2_lora_PanLeft.ckpt).
  - KSampler -> VAE Decode: Generate the image sequence.
3. Export Multi-Layer EXR: Use a node such as mrv2SaveEXRImage to save the output as an EXR sequence (.exr). Configure it for a professional pipeline: 32-bit float, linear color space, and lossless PIZ or ZIP compression. A multi-layer EXR can carry several render passes (e.g., diffuse, specular, mattes) in a single file.
4. Composite in Fusion: In DaVinci Resolve, import the EXR sequence. Use Fusion's node graph to access individual layers, allowing separate adjustments to elements like color, highlights, and masks before integrating the AI asset into a final shot with a background plate.
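Many LoRA trainers (kohya_ss-style scripts, for example) read a sidecar .txt caption with the same file stem as each image. The trigger word used later in the prompt loader goes at the start of every caption. The sketch below assumes your trainer follows that convention. The trigger token and description are hypothetical placeholders.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def write_captions(dataset_dir: str, trigger: str, description: str,
                   min_images: int = 15, max_images: int = 30) -> list[Path]:
    """Write one caption .txt next to each image, each starting with the trigger word."""
    images = sorted(p for p in Path(dataset_dir).iterdir()
                    if p.suffix.lower() in IMAGE_EXTS)
    # Enforce the 15-30 image range recommended for the actor dataset.
    if not min_images <= len(images) <= max_images:
        raise ValueError(f"expected {min_images}-{max_images} images, found {len(images)}")
    written = []
    for img in images:
        caption = img.with_suffix(".txt")  # e.g. actor_01.png -> actor_01.txt
        caption.write_text(f"{trigger}, {description}\n", encoding="utf-8")
        written.append(caption)
    return written
```

In practice each caption would also describe what varies in that image (lighting, angle, outfit). That keeps the trigger word bound to the face rather than to incidental details.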
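ComfyUI can also be driven over HTTP. A workflow in "API format" is a JSON object that maps node IDs to a class_type and its inputs. A link to another node is written as [node_id, output_index], and the graph is queued by POSTing {"prompt": workflow} to /prompt on the server (by default http://127.0.0.1:8188). The sketch below covers only the core-node part of step 2: loaders, character LoRA, a two-ControlNet stack, KSampler, and VAE Decode. It makes several assumptions. The checkpoint is an all-in-one file loadable by CheckpointLoaderSimple. The model, LoRA, and ControlNet file names are placeholders. The image names refer to files already in ComfyUI's input folder. IPAdapter-FaceID and AnimateDiff come from custom node packs whose class names vary by version, so they are only marked in a comment. SaveImage stands in for an EXR save node.

```python
import json
import urllib.request


def build_workflow(prompt: str, pose_image: str, depth_image: str) -> dict:
    """Core-node skeleton of the generation graph (file names are placeholders)."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",  # outputs MODEL 0, CLIP 1, VAE 2
              "inputs": {"ckpt_name": "base_model.safetensors"}},
        "2": {"class_type": "LoraLoader",  # character LoRA patched onto model and CLIP
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": "actor_lora.safetensors",
                         "strength_model": 1.0, "strength_clip": 1.0}},
        "3": {"class_type": "CLIPTextEncode",  # positive prompt, includes the trigger word
              "inputs": {"text": prompt, "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",  # negative prompt
              "inputs": {"text": "blurry, deformed", "clip": ["2", 1]}},
        # ControlNet stack: pose first, then depth refines the same conditioning pair.
        "5": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": "openpose.safetensors"}},
        "6": {"class_type": "LoadImage", "inputs": {"image": pose_image}},
        "7": {"class_type": "ControlNetApplyAdvanced",  # outputs positive 0, negative 1
              "inputs": {"positive": ["3", 0], "negative": ["4", 0],
                         "control_net": ["5", 0], "image": ["6", 0],
                         "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0}},
        "8": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": "depth.safetensors"}},
        "9": {"class_type": "LoadImage", "inputs": {"image": depth_image}},
        "10": {"class_type": "ControlNetApplyAdvanced",
               "inputs": {"positive": ["7", 0], "negative": ["7", 1],
                          "control_net": ["8", 0], "image": ["9", 0],
                          "strength": 0.6, "start_percent": 0.0, "end_percent": 1.0}},
        # IPAdapter-FaceID and AnimateDiff (custom packs) would patch the MODEL
        # between node 2 and the KSampler; they are omitted here.
        "11": {"class_type": "EmptyLatentImage",
               "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "12": {"class_type": "KSampler",
               "inputs": {"model": ["2", 0], "positive": ["10", 0], "negative": ["10", 1],
                          "latent_image": ["11", 0], "seed": 42, "steps": 30, "cfg": 6.0,
                          "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "13": {"class_type": "VAEDecode",
               "inputs": {"samples": ["12", 0], "vae": ["1", 2]}},
        "14": {"class_type": "SaveImage",  # stand-in for an EXR save node
               "inputs": {"images": ["13", 0], "filename_prefix": "shot"}},
    }


def queue_request(workflow: dict, server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) the POST that queues a workflow on a ComfyUI server."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(f"{server}/prompt", data=data, method="POST",
                                  headers={"Content-Type": "application/json"})
```

Because the graph is plain data, shot-to-shot variations (seed, ControlNet strength, pose image) become simple dictionary edits before each queue call.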
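"Linear color space" in step 3 means the stored values are proportional to light, not gamma-encoded for display. Decoded frames may be sRGB-encoded; check what your save node expects. If they are, they need the standard piecewise sRGB-to-linear transfer before being written as 32-bit float. A per-channel sketch:

```python
def srgb_to_linear(c: float) -> float:
    """Convert one sRGB-encoded channel value (0..1) to linear light."""
    if c <= 0.04045:
        return c / 12.92  # linear toe segment near black
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Inverse transfer, useful for checking round trips."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055
```

An sRGB value of 0.5 maps to roughly 0.214 in linear. This is why linear EXRs look dark when viewed without a display transform.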
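Placing the AI element over the background plate, which Fusion's Merge node does, comes down to the Porter-Duff "over" operation. On premultiplied, linear pixels it is result = fg + bg × (1 − alpha_fg), applied to color and alpha alike. A per-pixel sketch, assuming RGBA tuples:

```python
RGBA = tuple[float, float, float, float]


def premultiply(r: float, g: float, b: float, a: float) -> RGBA:
    """Convert straight (unpremultiplied) color to premultiplied form."""
    return (r * a, g * a, b * a, a)


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Porter-Duff 'over' for premultiplied RGBA in linear light."""
    k = 1.0 - fg[3]  # how much of the background shows through
    return tuple(f + b * k for f, b in zip(fg, bg))
```

For example, a half-transparent red element over an opaque blue plate gives (0.5, 0.0, 0.5, 1.0). Doing this math on linear values is what makes edges and soft mattes blend like real light.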
