Four months ago, we released an update to the SeedVR2 integration for ComfyUI. You pushed it to its limits, broke it in ways we never imagined, and helped us rebuild it better. Today's v2.5 release isn't just an update: it's what happens when a community comes together to make professional video upscaling truly accessible.
The breaking changes we needed
Version 2.5 requires recreating your workflows. Here's why every change was necessary:
The monolithic architecture hit fundamental limits. Running these models on consumer GPUs was painful, memory leaks compounded over long videos, and alpha channels required workarounds.
The new 4-node system solves these issues at their root:
The modular architecture
SeedVR2 Load DiT Model: Controls the upscaling transformer
- GGUF model support with Q4_K_M and Q8_0 quantization
- Enhanced BlockSwap with adaptive memory clearing
- Separate I/O component offloading
- Model caching across instances
SeedVR2 Load VAE Model: Manages encoding/decoding
- Independent tiling for encode and decode operations
- Tensor offload support for accumulation buffers
- Optimized for different tile size requirements
SeedVR2 Torch Compile Settings: Optional speed optimization
- 20-40% DiT speedup with full graph compilation
- 15-25% VAE acceleration
- Configurable modes from default to max-autotune
SeedVR2 Video Upscaler: The main processing node
- Native RGBA support with edge-guided upscaling
- Smart resolution limiting with max_resolution
- Enhanced batch processing with uniform_batch_size
- Deterministic generation with phase-specific seeding
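The split can be pictured as four independent configuration surfaces. Below is a hypothetical Python sketch: the field names mirror the options listed above, but the actual ComfyUI node inputs may differ, and the default values are illustrative.

```python
from dataclasses import dataclass

# Hypothetical configs mirroring the four v2.5 nodes (illustrative only).
@dataclass
class DiTModelConfig:
    model: str = "seedvr2_7b_q4_k_m.gguf"  # assumed filename for GGUF weights
    blocks_to_swap: int = 16               # BlockSwap depth
    offload_io: bool = True                # separate I/O component offloading

@dataclass
class VAEConfig:
    encode_tile_size: int = 512            # tiling is independent per direction
    decode_tile_size: int = 256
    offload_buffers: bool = True           # accumulation buffers can leave VRAM

@dataclass
class CompileSettings:
    enabled: bool = False                  # optional speed optimization
    mode: str = "default"                  # "default" through "max-autotune"

@dataclass
class UpscalerConfig:
    batch_size: int = 5                    # must follow 4n+1
    max_resolution: int = 1920             # smart resolution limiting
    preserve_alpha: bool = True            # native RGBA support
```

Keeping the DiT, VAE, compile, and upscaler concerns in separate objects is exactly what lets each be cached, offloaded, or tuned independently.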
GGUF: The game-changer for accessibility
GGUF quantization transforms what's possible on consumer hardware. The 7B model that required 24GB VRAM now runs on 8GB GPUs.
The implementation required solving unique challenges:
- Fixed VRAM leaks in GGUF layers
- Made torch.compile compatible with quantized operations
- Resolved non-persistent buffer issues
- Optimized dequantization paths
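To see why quantization changes the hardware equation, a back-of-envelope weight-size estimate helps. The bits-per-weight figures below are approximations (Q8_0 stores roughly 8.5 bits per weight once block scales are counted; Q4_K_M averages roughly 4.85), and the totals cover weights only, not activations or framework overhead.

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GB (decimal).
    Excludes activations, feature buffers, and framework overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

fp16 = approx_weight_gb(7, 16)    # full-precision baseline
q8 = approx_weight_gb(7, 8.5)     # ~Q8_0 effective bits per weight
q4 = approx_weight_gb(7, 4.85)    # ~Q4_K_M effective bits per weight
print(f"FP16: {fp16:.1f} GB, Q8_0: {q8:.1f} GB, Q4_K_M: {q4:.1f} GB")
```

Weights dropping from roughly 14 GB to roughly 4 GB is what leaves enough headroom on an 8GB card for activations and buffers.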
torch.compile: When compilation beats interpretation
PyTorch 2.0's torch.compile transforms Python functions into optimized CUDA kernels. But knowing when to use it matters:
The compilation trade-off
First-run compilation takes 2-5 minutes; subsequent runs are 20-40% faster. The math is simple:
- Single image: Don't compile
- 10-second video: Consider it
- Batch processing: Always compile
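The rule of thumb above can be written as a simple break-even check. The default numbers are assumptions drawn from the figures in this section (a few minutes of compile cost, roughly 30% speedup), not measured constants; tune them for your hardware.

```python
def compile_pays_off(num_items: int, per_item_s: float,
                     compile_s: float = 180.0, speedup: float = 0.30) -> bool:
    """True when the one-off cost of torch.compile (e.g.
    torch.compile(model, fullgraph=True)) is recouped by faster runs."""
    return num_items * per_item_s * speedup > compile_s

print(compile_pays_off(1, 8.0))    # single image -> False, skip compilation
print(compile_pays_off(240, 8.0))  # 10-second clip at 24 fps -> True, compile
```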
Memory management reimagined
The old architecture loaded everything, processed everything, then hoped for the best. v2.5 implements a 4-phase pipeline that completes each phase for all batches before moving forward:
- Encode phase: VAE encoding with optional tiling
- Upscale phase: DiT processing with BlockSwap
- Decode phase: VAE decoding with separate tiling
- Postprocess phase: Color correction and format conversion
Each phase clears its resources completely. No accumulation. No leaks.
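The phase-major ordering can be sketched in a few lines. The function names here are illustrative, not the real API; the point is that each phase finishes for every batch, and its intermediates are released, before the next phase begins.

```python
def run_pipeline(batches, encode, upscale, decode, postprocess):
    """Phase-major sketch: run each phase over ALL batches, then drop
    that phase's intermediates so nothing accumulates across phases."""
    latents = [encode(b) for b in batches]      # Encode phase (VAE, tiled)
    upscaled = [upscale(z) for z in latents]    # Upscale phase (DiT + BlockSwap)
    latents.clear()                             # encode outputs no longer needed
    frames = [decode(z) for z in upscaled]      # Decode phase (separate tiling)
    upscaled.clear()                            # decode inputs no longer needed
    return [postprocess(f) for f in frames]     # Postprocess phase

# Toy usage with integers standing in for frame batches:
result = run_pipeline([1, 2, 3],
                      encode=lambda b: b * 10,
                      upscale=lambda z: z + 1,
                      decode=lambda z: z / 10,
                      postprocess=lambda f: round(f, 2))
print(result)  # [1.1, 2.1, 3.1]
```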
The batch_size formula
The critical requirement: batch_size must follow 4n+1 (1, 5, 9, 13, 17, 21...). This is how the model maintains temporal consistency between frames.
For a 120-frame shot:
- batch_size=5: Works on most GPUs, 24 separate batches
- batch_size=21: Better temporal consistency, needs more VRAM
- batch_size=121: Single batch processing, best quality if you have 24GB+
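A small helper makes the 4n+1 constraint and the resulting batch count explicit (a hypothetical utility for planning runs, not part of the node pack):

```python
import math

def is_valid_batch_size(n: int) -> bool:
    """batch_size must follow 4n+1: 1, 5, 9, 13, 17, 21, ..."""
    return n >= 1 and (n - 1) % 4 == 0

def num_batches(frames: int, batch_size: int) -> int:
    """How many batches a shot splits into at a given batch_size."""
    if not is_valid_batch_size(batch_size):
        raise ValueError(f"{batch_size} is not of the form 4n+1")
    return math.ceil(frames / batch_size)

print(num_batches(120, 5))     # 24 batches, as above
print(num_batches(120, 21))    # 6 batches
print(is_valid_batch_size(7))  # False: 7 breaks temporal consistency
```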
CLI for production pipelines
The enhanced CLI now handles batch processing intelligently:
```shell
python inference_cli.py media_folder/ \
    --output processed/ \
    --cuda_device 0 \
    --cache_dit \
    --cache_vae \
    --dit_offload_device cpu \
    --vae_offload_device cpu \
    --resolution 1080 \
    --max_resolution 1920 \
    --batch_size 21 \
    --blocks_to_swap 16
```
Key improvements:
- Models stay cached between files
- Automatic format detection (MP4/PNG)
Community contributions
This release exists because of you. Special thanks to benjaminherb, cmeka, JohnAlcatraz, lihaoyun6, Luchuanzhao, Luke2642, naxci1, q5sys, FurkanGozukara, and the 100+ contributors who shaped v2.5. To all of you, THANK YOU.
The bottom line
SeedVR2 v2.5 makes professional video upscaling accessible. Whether you're running an H100 or a laptop with 8GB of VRAM, you can now upscale to 4K.
The breaking changes were worth it. The architecture is cleaner, memory management is tighter, and quality has improved. Most importantly, it's yours to use, modify, and build upon.