Video upscaling has always been a compromise between speed, quality, and hardware requirements. Traditional diffusion models need 15-50 denoising steps to transform low-quality footage into high-resolution video. ByteDance's SeedVR2 changes this equation entirely - achieving high-quality restoration in just one step.
🚀 The one-step breakthrough
Most video restoration solutions face a fundamental challenge: they're fast but flicker, they demand massive GPUs, or they sit locked behind closed-source models and paid licenses. SeedVR2, released under the Apache 2.0 license, solves this with Diffusion Adversarial Post-Training (APT).
The innovation combines the reliability of diffusion models with the efficiency of GANs. Starting from SeedVR (their pre-trained diffusion model), ByteDance applies adversarial training to create what they call the largest-ever video restoration GAN at 16 billion parameters.
How APT works
The process unfolds in two stages:
- Progressive distillation: A teacher model shows a student how to compress 64 steps to 32, then 16, 8, and finally 1 - like teaching an artist to capture a portrait in a single brushstroke (sketched in code below).
- Real data training: Unlike traditional distillation, which is limited by the teacher's quality, APT trains on real high-resolution videos. The model learns to restore degraded footage directly, allowing it to surpass its teacher.
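Here is a minimal sketch of that halving schedule. The names `distill_stage` and `progressive_distillation` are hypothetical, and the schedule granularity is taken from the description above, not from ByteDance's code:

```python
import copy

def progressive_distillation(base_model, distill_stage):
    # Halving schedule from the article: 64 -> 32 -> 16 -> 8 -> 1.
    schedule = [64, 32, 16, 8, 1]
    teacher = base_model
    for teacher_steps, student_steps in zip(schedule, schedule[1:]):
        student = copy.deepcopy(teacher)      # student starts as the teacher
        distill_stage(teacher, student, teacher_steps, student_steps)
        teacher = student                     # student teaches the next round
    return teacher                            # the final one-step model
```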
🏗️ Architecture that scales
Under the hood, SeedVR2 uses a Swin Transformer (Shifted Window Transformer) architecture. Traditional patch-based methods require up to 50% overlap between tiles to avoid visible seams. Swin's adaptive window attention processes entire frames while dynamically adjusting to your target resolution.
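To make the windowed attention concrete, here is a minimal partitioning sketch in PyTorch; the window size and tensor shapes are illustrative, not SeedVR2's actual configuration:

```python
import torch

def window_partition(frames, window=8):
    # Split (B, H, W, C) frames into non-overlapping window x window tiles.
    # Attention runs inside each tile, so no tile overlap is required.
    # The "shifted" variant additionally rolls the frame between layers
    # (torch.roll) so information flows across window boundaries.
    B, H, W, C = frames.shape
    x = frames.view(B, H // window, window, W // window, window, C)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, window * window, C)   # (B * num_windows, tokens, C)

tokens = window_partition(torch.randn(1, 64, 64, 96))
print(tokens.shape)   # torch.Size([64, 64, 96])
```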
The model includes several mathematical safeguards (a sketch follows the list):
- RpGAN loss prevents repetitive outputs
- R1/R2 regularization keeps the discriminator balanced
- Feature matching loss measures quality in latent space for efficiency
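A hedged sketch of the first two terms in their generic GAN-literature form, not SeedVR2's exact implementation:

```python
import torch
import torch.nn.functional as F

def rpgan_d_loss(d_real, d_fake):
    # Relativistic pairing: each real score is judged against a paired
    # fake score, which penalizes collapsing to repetitive outputs.
    return F.softplus(-(d_real - d_fake)).mean()

def r1_penalty(d_real_scores, real_inputs, gamma=1.0):
    # Gradient penalty on real data keeps the discriminator balanced;
    # real_inputs must have requires_grad=True.
    grads, = torch.autograd.grad(d_real_scores.sum(), real_inputs,
                                 create_graph=True)
    return 0.5 * gamma * grads.pow(2).flatten(1).sum(1).mean()
```

R2 is the same penalty applied to generated data instead of real data.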
💻 Running on consumer hardware
The reality check: even the 3B model demands more than 16GB VRAM. That's where our BlockSwap implementation comes in.
Understanding BlockSwap
Think of transformer blocks like floors in a skyscraper. The 3B model has 32 floors, the 7B has 36. Instead of keeping the entire building in GPU memory, BlockSwap keeps only what's actively needed, storing the rest in CPU RAM.
Key parameters (see the sketch after this list):
- blocks_to_swap: How many blocks to offload (0-32 for 3B, 0-36 for 7B)
- use_non_blocking: Enables asynchronous CPU-GPU transfers
- offload_io_components: Saves additional VRAM by offloading input/output embeddings
- cache_model: Keeps model in RAM between generations
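A generic sketch of the idea, covering the first two parameters (not the node's actual code); note that truly asynchronous CPU-to-GPU copies also require pinned memory:

```python
import torch

def forward_with_blockswap(blocks, x, blocks_to_swap, use_non_blocking=True):
    # The first `blocks_to_swap` transformer blocks live in CPU RAM and
    # visit the GPU only for their own forward pass.
    for i, block in enumerate(blocks):
        swapped = i < blocks_to_swap
        if swapped:
            block.to("cuda", non_blocking=use_non_blocking)
        x = block(x)
        if swapped:
            block.to("cpu", non_blocking=use_non_blocking)  # free VRAM again
    return x
```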
Optimization strategy
Start conservatively (the retry loop sketched below automates this pattern):
- Set blocks_to_swap to 16
- Run generation
- If out of memory, increase incrementally
- Enable offload_io_components only if needed
- Each swapped block adds overhead, so use the minimum necessary
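A sketch of that procedure, assuming a hypothetical run_generation(blocks_to_swap) wrapper around the node:

```python
import torch

def generate_with_fallback(run_generation, start=16, step=4, max_blocks=32):
    blocks_to_swap = start
    while blocks_to_swap <= max_blocks:    # 32 blocks for 3B, 36 for 7B
        try:
            return run_generation(blocks_to_swap)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()       # release the failed attempt
            blocks_to_swap += step         # swap a few more blocks, retry
    raise RuntimeError("Out of memory even with every block swapped")
```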
🎯 Practical workflows
Basic video upscaling
- Model: 7B FP16 for best quality
- Batch size: as high as VRAM allows, but it must be of the form 4n+1 (see the helper after this list)
- preserve_vram: True for consumer GPUs
- BlockSwap: start with 16 blocks and increase the count until the generation fits in VRAM
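The 4n+1 rule means valid batch sizes are 1, 5, 9, 13, and so on. A small helper to round a tested limit down to the nearest valid value (the limit itself is something you find empirically):

```python
def largest_valid_batch(limit):
    # Round down to the nearest batch size of the form 4n + 1.
    return limit - ((limit - 1) % 4)

print([largest_valid_batch(n) for n in (4, 8, 16, 33)])   # [1, 5, 13, 33]
```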
Alpha channel preservation
For VFX pipelines with image sequences and alpha (a code sketch follows the list):
- Load image sequence with alpha
- Process RGB and alpha separately through SeedVR2
- Merge using Join Image with Alpha
- Export as PNG16 or EXR sequences with CoCoTools_IO
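A minimal sketch of the split/merge step, with `upscale` standing in for a SeedVR2 pass; the alpha channel is tiled to three channels so it can run through an RGB model. Illustrative, not the node graph itself:

```python
import numpy as np

def upscale_rgba(frames_rgba, upscale):
    # frames_rgba: (N, H, W, 4) float array; `upscale` is a stand-in
    # for a SeedVR2 pass over an (N, H, W, 3) batch.
    rgb = frames_rgba[..., :3]
    alpha = np.repeat(frames_rgba[..., 3:4], 3, axis=-1)   # A -> fake RGB
    rgb_up = upscale(rgb)
    alpha_up = upscale(alpha)[..., :1]                     # back to one channel
    return np.concatenate([rgb_up, alpha_up], axis=-1)     # re-join RGBA
```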
Resolution control
SeedVR2 can oversharpen, especially on AI-generated content. Control this through stepped upscaling, sketched below:
- 2x with bilinear filtering for softer results
- 4x with Lanczos for maximum sharpness
- Combine SeedVR2 resolution with traditional upscaling for fine control
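Outside ComfyUI, the traditional half of that combination looks like this with Pillow; the filters are standard, and how you interleave them with a SeedVR2 pass is the part you'd tune:

```python
from PIL import Image

def stepped_resize(frame, scale, soft=True):
    # Bilinear gives a softer result, Lanczos a sharper one; mixing a
    # traditional resize with SeedVR2's own resolution gives fine control.
    resample = Image.BILINEAR if soft else Image.LANCZOS
    w, h = frame.size
    return frame.resize((w * scale, h * scale), resample)

# A soft 2x before (or after) a SeedVR2 pass tames oversharpening:
# frame = stepped_resize(Image.open("frame_0001.png"), 2, soft=True)
```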
⚡ Performance insights
The good:
- Single-step inference is 15-50x faster than traditional methods
- Temporal consistency without special handling
- Excellent on degraded or compressed footage
The challenges:
- VAE encoding/decoding accounts for 95% of processing time
- High VRAM requirements even with optimization
- Oversharpening on clean content
- CFG scale currently disabled (fix pending)
Multi-GPU scaling
For production pipelines, NumZ's command-line tool distributes frames across GPUs (a minimal sketch of the idea follows):
- 4 GPUs processing 1000 frames = 250 frames each in parallel
- Near-linear scaling for large batches
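Under the hood this is plain data parallelism. A generic sketch (not NumZ's actual CLI internals), where `run` is a hypothetical worker that loads the model on its own GPU:

```python
from multiprocessing import Process

def split_frames(frames, num_gpus):
    # 1000 frames across 4 GPUs -> 4 chunks of 250 each.
    chunk = (len(frames) + num_gpus - 1) // num_gpus
    return [frames[i * chunk:(i + 1) * chunk] for i in range(num_gpus)]

def run(gpu_id, frames):
    ...  # hypothetical: load SeedVR2 on f"cuda:{gpu_id}", process `frames`

if __name__ == "__main__":
    frames = [f"frame_{i:04d}.png" for i in range(1000)]   # stand-in paths
    workers = [Process(target=run, args=(i, part))
               for i, part in enumerate(split_frames(frames, 4))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```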
🚀 Looking forward
SeedVR2 represents a paradigm shift, not just in speed but in accessibility. With NumZ's ComfyUI integration and our BlockSwap optimization, production-quality upscaling is no longer limited to closed-source solutions and studios with render farms.
Remember: like any tool, SeedVR2 has its place in your pipeline. The key is knowing when one-step restoration serves your creative vision and when you need more control. Master that balance, and you'll unlock new possibilities for your projects.
🔗 Sources & Links
ComfyUI Tools:
- ComfyUI-SeedVR2_VideoUpscaler by NumZ
- ComfyUI-CoCoTools_IO by Conor-Collins
- ComfyUI-VideoHelperSuite by Kosinkadink