Speed matters in production. Today we're exploring four practical techniques that solve real challenges: MagCache for 2-3x faster generation, NAG for negative prompting in distilled models, DLoRAL for one-step upscaling, and MIT's approach to art restoration that's 66x faster than manual methods.
⚡ MagCache: Intelligent acceleration without complexity
Video diffusion models are slow because they typically run 30-50 denoising steps. With large models like Wan2.1 14B, every step requires massive computation. Previous acceleration methods like TeaCache analyze 70 test prompts and use polynomial fitting to predict which steps to skip—but this fails when your prompt differs from the calibration set.
MagCache takes a different approach based on a universal magnitude law. The researchers discovered that the magnitude ratio between consecutive denoising steps follows a consistent pattern: it decreases slowly for about 80% of the process, then drops rapidly in the final steps. This pattern holds across different prompts and even different models.
How MagCache works:
Instead of complex calibration, MagCache uses pre-calculated magnitude ratios embedded in the code. It tracks accumulated error as it skips steps (see the sketch after this list):
- The accumulated ratio starts at 1 and multiplies by each step's magnitude ratio
- Error is calculated as how far this ratio drifts from 1
- Two parameters control skipping: threshold (error tolerance) and K (max consecutive skips)
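To make the mechanics concrete, here's a minimal Python sketch of that skip decision, written from the description above. It assumes a pre-computed `mag_ratios` table and is illustrative only, not the official implementation:

```python
# Hedged sketch of MagCache's skip logic, assuming a pre-computed per-step
# magnitude ratio table (illustrative, not the official code).

def magcache_schedule(mag_ratios, threshold=0.02, K=6, start_step=5):
    """Return a boolean list: True means reuse the cached residual (skip)."""
    skip = []
    accumulated_ratio = 1.0
    accumulated_error = 0.0
    consecutive_skips = 0
    for step, ratio in enumerate(mag_ratios):
        if step < start_step:
            skip.append(False)  # always compute the critical early steps
            continue
        accumulated_ratio *= ratio
        accumulated_error += abs(1.0 - accumulated_ratio)
        consecutive_skips += 1
        if accumulated_error <= threshold and consecutive_skips <= K:
            skip.append(True)
        else:
            skip.append(False)  # compute for real, then reset the tracker
            accumulated_ratio = 1.0
            accumulated_error = 0.0
            consecutive_skips = 0
    return skip

# Ratios near 1.0 (the slow-decrease phase) allow skips; the sharp drop in
# the final steps forces full computation again.
print(magcache_schedule([1.0] * 16 + [0.95, 0.8, 0.6, 0.3]))
```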
ComfyUI implementation:
Both the official MagCache node and Kijai's WanVideoWrapper support this technique. Key settings:
- threshold: 0.02 (good balance of speed/quality)
- K: 6 (allows up to 6 consecutive skipped steps)
- start_step: 5 (preserves critical early steps)
In our tests, MagCache achieved 2x speedup (920s → 424s) with minimal quality difference. The official implementation adds torch.compile for additional acceleration.
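For reference, torch.compile is a one-line wrap around the model. This is generic PyTorch usage with a stand-in network, not the MagCache repo's exact code:

```python
import torch
import torch.nn as nn

# Stand-in denoiser; in practice you would wrap the actual diffusion model.
denoiser = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))
denoiser = torch.compile(denoiser, mode="max-autotune")  # fuses kernels
out = denoiser(torch.randn(8, 64))  # first call triggers compilation
```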
Limitations:
- Requires high step counts (20+) to be effective
- Won't help with already-distilled models like CausVid
- Pushing parameters too hard causes quality loss
🎯 NAG: Bringing back negative prompts
When you use distilled models like Self-Forcing or CausVid that generate in just 4-8 steps, you typically have to set CFG (Classifier-Free Guidance) to 1, which disables negative prompting entirely. Why? Because CFG runs the model twice per step, once with your positive prompt and once with the negative (or empty) prompt, then blends the two outputs. In just 4 steps, those branches diverge too dramatically to blend properly.
NAG (Normalized Attention Guidance) solves this by operating in attention space rather than output space. While CFG blends the final outputs of each step, NAG intervenes during the attention computation itself.
The NAG process:
- Computes positive features (Z+) and negative features (Z-)
- Calculates the direction away from negatives toward positives
- Applies L1 normalization as a safety limit
- Blends adjusted features with original attention
All this happens in a single forward pass—no doubling of computation like CFG.
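Here's a minimal sketch of that blending step, written from the paper's description. The `z_pos`/`z_neg` names mirror the Z+/Z- features above, and the defaults mirror the node settings below; the actual implementation may differ in details:

```python
import torch

def nag_attention(z_pos, z_neg, nag_scale=9.0, nag_tau=2.5, nag_alpha=0.25):
    """Blend positive/negative attention features, shape (batch, tokens, dim)."""
    # 1. Extrapolate away from the negative features, toward the positives
    z_guided = z_pos + nag_scale * (z_pos - z_neg)

    # 2. L1 normalization: cap how far the guided features may drift
    norm_pos = z_pos.abs().sum(dim=-1, keepdim=True)
    norm_guided = z_guided.abs().sum(dim=-1, keepdim=True)
    ratio = norm_guided / (norm_pos + 1e-8)
    z_guided = z_guided * torch.where(ratio > nag_tau, nag_tau / ratio,
                                      torch.ones_like(ratio))

    # 3. Blend the adjusted features back with the original attention output
    return nag_alpha * z_guided + (1 - nag_alpha) * z_pos

# Only the attention output is adjusted; the model itself runs once per step.
out = nag_attention(torch.randn(1, 77, 64), torch.randn(1, 77, 64))
```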
ComfyUI workflow:
Using the Apply NAG node with WanVideoWrapper:
- nag_scale: 9-11 (strength of negative prompt effect)
- nag_tau: 2.5 (normalization temperature)
- nag_alpha: 0.25 (blending factor)
In our demonstration, we successfully changed a red bird to blue using negative prompting at CFG=1 with Self-Forcing.
🎬 DLoRAL: Rethinking video upscaling
Video upscaling faces a fundamental conflict: you want sharp frames but also smooth motion. Current methods struggle with this balance—add details and you get flickering, maintain consistency and you get blur.
DLoRAL from Hong Kong Polytechnic University separates these tasks into two specialized LoRA modules:
- C-LoRA: Handles temporal coherence
- D-LoRA: Manages spatial enhancement
Each optimizes its task without compromising the other.
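Conceptually, the dual-adapter setup might look like the following sketch. The class and method names here are hypothetical, chosen to illustrate the frozen backbone plus two LoRA branches and the alternating training phases:

```python
import torch.nn as nn

class LoRA(nn.Module):
    """Standard low-rank adapter: zero-initialized so it starts as a no-op."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return self.up(self.down(x))

class DualLoRALinear(nn.Module):
    """Frozen base layer plus C-LoRA (coherence) and D-LoRA (detail) branches."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained backbone stays frozen
        self.c_lora = LoRA(base.in_features, base.out_features)
        self.d_lora = LoRA(base.in_features, base.out_features)

    def set_phase(self, phase):
        # "consistency": train C-LoRA; "detail": freeze C-LoRA, train D-LoRA
        for p in self.c_lora.parameters():
            p.requires_grad_(phase == "consistency")
        for p in self.d_lora.parameters():
            p.requires_grad_(phase == "detail")

    def forward(self, x):
        return self.base(x) + self.c_lora(x) + self.d_lora(x)

layer = DualLoRALinear(nn.Linear(64, 64))
layer.set_phase("consistency")  # stage one; switch to "detail" for stage two
```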
Key innovations:
Cross Frame Retrieval (CFR): Instead of processing frames in isolation, it examines neighboring frames for context. If frame 2 is blurry but frames 1 and 3 are clear, CFR uses that information to guide enhancement.
Alternating training: First C-LoRA learns temporal stability, then it freezes while D-LoRA adds details respecting the established structure.
One-step inference: While Upscale-A-Video needs 30 steps and MGLD needs 50, DLoRAL maps directly from low to high quality in a single step—achieving 10x speedup.
Results:
- Top perceptual quality scores on VideoLQ benchmark
- Maintains temporal consistency matching existing methods
- Processes 50 frames at 512x512 in ~5 minutes on A100
The main limitation is the inherited 8x VAE downsampling from Stable Diffusion, which can affect very fine details like small text.
🎨 MIT's AI-powered art restoration
MIT graduate student Alex Kachkine developed a technique combining AI analysis with physical application. The problem: 70% of paintings in institutional collections are locked away, many too damaged to display and too expensive to restore traditionally.
The process:
- Scan and analyze: AI identifies all damaged regions (5,612 areas in the demo painting)
- Digital restoration: Creates a map using 57,314 different colors
- Physical application: Prints restoration on polymer films in two layers (white backing + color)
- Apply to painting: Adheres with conservation-grade varnish
The result: 3.5 hours instead of weeks, estimated 66x faster than manual inpainting.
Practical considerations:
- Only works on smooth, varnished paintings
- Completely reversible with standard solvents
- Designed to last ~100 years in proper conditions
- Best suited for lower-value works that would otherwise stay in storage
This demonstrates AI's potential beyond content creation—helping preserve what we already have at a scale previously impossible.
🚀 Implementation insights
After testing, here are key takeaways for production:
MagCache optimization:
- Start with conservative settings (threshold=0.02)
- Skip early steps (start_step≥5)
- Monitor for color shifts or detail loss
NAG fine-tuning:
- Works with both distilled and standard models
- Can combine with CFG for additional control
- Adds minimal computational overhead (~12% to attention layers)
Workflow integration:
- Use MagCache for standard high-step workflows
- Apply NAG when you need negative prompting with fast distilled models, or when you want extra control in standard workflows
All workflows and detailed parameters are available on our GitHub. These aren't universal solutions—they're tools to optimize specific scenarios. The key is knowing when each technique provides value for your particular pipeline.
Be kind, be creative, be curious, be human. See you next week for another episode of AInVFX news!
🔗 Sources & Links
🔧 ComfyUI Workflows:
📚 Research Papers:
MagCache:
NAG (Normalized Attention Guidance):
DLoRAL:
AI Art Restoration: