Today we're diving deep into ComfyUI with four breakthrough papers that give you fine-grained control over AI video generation. From generating temporally consistent normals for relighting to drawing custom motion trajectories to animate anything, these tools represent a leap forward in art direction.
NormalCrafter: Temporally stable surface normals
Surface normals are essential for relighting and can serve as powerful control signals for video generation. But until now, extracting them from video meant dealing with temporal flickering. NormalCrafter solves this elegantly.
The secret is Semantic Feature Regularization (SFR). Instead of just looking at surface textures, the model aligns its understanding with semantic representations from a DINO encoder. This forces it to focus on actual object structure, not just pixel patterns.
Their two-stage training approach is clever: first, they train the entire U-Net in latent space using diffusion score matching and SFR loss to establish basic normal estimation. Then, they fine-tune only the spatial layers in pixel space with an angular loss that directly measures normal accuracy. The result? Spatial precision with temporal consistency.
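For intuition, the angular loss is simply the angle between the predicted and ground-truth normal vectors at each pixel. A minimal PyTorch sketch of that idea (our own illustration, not the authors' training code):

```python
import torch
import torch.nn.functional as F

def angular_loss(pred_normals: torch.Tensor, gt_normals: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel angle (radians) between predicted and ground-truth normals.

    Both inputs are (B, 3, H, W). Vectors are re-normalized to unit length so
    the loss measures direction only, which is what a surface normal encodes.
    """
    pred = F.normalize(pred_normals, dim=1)
    gt = F.normalize(gt_normals, dim=1)
    cos = (pred * gt).sum(dim=1).clamp(-1.0 + 1e-7, 1.0 - 1e-7)  # cosine of the angle
    return torch.acos(cos).mean()

# Quick sanity check with random tensors standing in for network output and labels.
print(angular_loss(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)))
```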
ComfyUI implementation tips:
The wrapper from AIWarper makes this accessible in ComfyUI. Key parameters to remember:
- window_size: Frames processed per batch; the main lever on GPU memory usage (default 14)
- time_step_size: Frame advancement per batch (keep it below window_size for overlap)
- Detail Transfer node: Brings back high-frequency details using the original footage
Pro tip: For best results, keep an overlap of at least 4 frames between windows (with the default window_size of 14, that means a time_step_size of 10 or less). This ensures smooth transitions between batches; the sketch below shows how the windows line up.
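Here's a rough sketch of how those two parameters carve a clip into overlapping windows (parameter names follow the node; the wrapper itself also blends the overlapping frames, which we skip here):

```python
def sliding_windows(num_frames: int, window_size: int = 14, time_step_size: int = 10):
    """Yield (start, end) frame ranges. Consecutive windows overlap by
    window_size - time_step_size frames (4 with the values above)."""
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += time_step_size

# A 40-frame clip: frames 0-13, 10-23, 20-33, 30-39. Each batch shares
# 4 frames with the previous one, which is what keeps transitions smooth.
for start, end in sliding_windows(40):
    print(f"frames {start}-{end - 1}")
```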
Any-to-Bokeh: One-step professional depth effects
Creating realistic bokeh traditionally requires either capturing with the right lens or complex post-processing. Any-to-Bokeh changes the game by adding professional depth-of-field to any video in a single step.
The innovation is Multi-Plane Images (MPI). Think of it as slicing your scene into layers at different depths. Each layer represents objects at a specific distance, giving the model explicit geometric guidance about blur amounts.
Built on Stable Video Diffusion, they add custom MPI spatial blocks that process different depth regions separately. Cross-attention preserves important visual details while creating smooth, consistent bokeh across frames.
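To build intuition for the MPI idea, here's a toy sketch that blurs each depth plane by a different amount and alpha-composites the stack back to front. The real model does this with learned spatial blocks in latent space; a fixed Gaussian blur (via torchvision) is just our stand-in.

```python
import torch
import torchvision.transforms.functional as TF

def toy_mpi_bokeh(planes_rgb, planes_alpha, focus_idx: int):
    """planes_rgb: list of (3, H, W) tensors ordered far -> near.
    planes_alpha: list of (1, H, W) tensors in [0, 1].
    Planes farther from the focal plane get a larger blur kernel, then the
    stack is composited with the classic back-to-front 'over' operation."""
    out = torch.zeros_like(planes_rgb[0])
    for i, (rgb, alpha) in enumerate(zip(planes_rgb, planes_alpha)):
        kernel = 1 + 2 * abs(i - focus_idx)  # odd kernel size, grows with depth offset
        if kernel > 1:
            rgb = TF.gaussian_blur(rgb, kernel_size=kernel)
            alpha = TF.gaussian_blur(alpha, kernel_size=kernel)
        out = rgb * alpha + out * (1 - alpha)
    return out

# Five random planes, focus on the middle one.
planes = [torch.rand(3, 64, 64) for _ in range(5)]
alphas = [torch.rand(1, 64, 64) for _ in range(5)]
print(toy_mpi_bokeh(planes, alphas, focus_idx=2).shape)
```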
You get two main controls:
- Focus point selection - Choose what stays sharp
- Blur strength - Simulate different apertures
Both can be animated for professional rack focus effects. The code isn't available yet, but they promise a release soon with pre-trained weights.
Uni3C: Unified framework for camera and human motion
Unlike previous methods that handled camera and human motion separately, Uni3C unifies the process, relying on point clouds for precise 3D control.
The PCDController is the star here - a lightweight, plug-and-play module that doesn't modify your base video model. It extracts 3D scene representations from monocular depth (like judging distance with one eye closed) and uses these point clouds to guide generation.
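If you're wondering what "point clouds from monocular depth" actually looks like, here's a minimal unprojection sketch with a pinhole camera model. The paper's pipeline uses its own depth estimator and camera handling; the intrinsics below are made up.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Unproject an (H, W) depth map into an (H*W, 3) camera-space point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# A flat 2-meter-away "scene" with illustrative intrinsics.
depth = np.full((480, 640), 2.0)
print(depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0).shape)
```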
Key workflow insights:
In ComfyUI, you can use any video as camera motion input, but we found the interactive 3D viewport particularly useful:
- Load a simple 3D object (like a cube)
- Record camera movements in real-time
- The motion transfers to your generated video
Critical settings:
- strength: Controls influence (1-3 range, higher values can cause artifacts)
- start/end_percent: Determines when in the diffusion process to apply control (see the sketch after this list)
- Early stages (0-0.3) affect large-scale motion
- Later stages affect fine details
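A simple way to picture start/end_percent is as a gate on the control signal over the sampling schedule. This tiny sketch is just our mental model with illustrative names, not the node's implementation:

```python
def control_weight(step: int, total_steps: int, strength: float = 1.0,
                   start_percent: float = 0.0, end_percent: float = 0.3) -> float:
    """Return the control strength applied at a given sampling step.

    Control is active only while normalized progress through the schedule sits
    between start_percent and end_percent; outside that window the base model
    runs unguided, which lets things like water keep flowing naturally."""
    progress = step / max(total_steps - 1, 1)
    return strength if start_percent <= progress <= end_percent else 0.0

# With 20 steps and end_percent = 0.3, only the first ~6 steps are guided,
# exactly the stage where large-scale motion is decided.
for step in range(20):
    print(step, control_weight(step, total_steps=20))
```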
In our tutorial, when water stopped flowing at higher strength values, reducing end_percent to 0.3 and adding prompts like "fast flowing river" brought the motion back while preserving camera control.
ATI: Draw any trajectory, create any motion
ATI (Any Trajectory Instruction) from ByteDance Research is perhaps the most intuitive of the four. Simply draw where things should go, and the model creates realistic motion following your paths.
The Gaussian-based motion injector encodes your drawn trajectories as spatial distributions in latent space. It's lightweight, plugging into any pre-trained image-to-video model without retraining the backbone.
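To picture what "spatial distributions in latent space" means, here's a toy sketch that rasterizes one trajectory point per frame into a 2D Gaussian heatmap. ATI's injector works on latent features with its own parameters, so treat the resolution and sigma here as illustrative.

```python
import torch

def gaussian_heatmaps(traj_xy: torch.Tensor, height: int, width: int,
                      sigma: float = 4.0) -> torch.Tensor:
    """traj_xy: (T, 2) trajectory of (x, y) positions, one per frame.
    Returns (T, H, W) maps with a unit-peak Gaussian centered on each point."""
    ys = torch.arange(height).float().view(1, height, 1)
    xs = torch.arange(width).float().view(1, 1, width)
    cx = traj_xy[:, 0].view(-1, 1, 1)
    cy = traj_xy[:, 1].view(-1, 1, 1)
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return torch.exp(-dist2 / (2 * sigma ** 2))

# A point drifting diagonally across a 64x64 frame over 16 frames.
traj = torch.stack([torch.linspace(8, 56, 16), torch.linspace(8, 56, 16)], dim=1)
print(gaussian_heatmaps(traj, 64, 64).shape)  # torch.Size([16, 64, 64])
```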
Practical parameter guide (a small weighting sketch follows this list):
Temperature (0-1000): Controls motion field focus
- Low values (20-150): Broad, diffuse motion - good for camera moves
- High values (600-950): Sharp, focused motion - better for individual objects
TopK: Number of motion influences to consider
- 1: Each point affected by nearest trajectory only
- 2+: Allows blending between multiple motion fields
Start/End Percent: Diffusion stage control
- Full range (0-1): Maximum trajectory adherence
- Reduced (0-0.2): More natural motion, less strict following
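Here's how we picture temperature and TopK interacting: each pixel blends motion from its K nearest trajectories, and a higher temperature makes that blend collapse onto the closest one. This mirrors the behavior described above but is only our illustration, not the node's actual math:

```python
import torch

def blend_motion(pixel_xy: torch.Tensor, traj_xy: torch.Tensor, traj_motion: torch.Tensor,
                 temperature: float = 850.0, topk: int = 1) -> torch.Tensor:
    """Blend the motion vectors of the K nearest trajectories for one pixel.

    pixel_xy:    (2,)   pixel position
    traj_xy:     (N, 2) current positions of the drawn trajectories
    traj_motion: (N, 2) motion vector carried by each trajectory

    Higher temperature sharpens the distance weighting (focused, per-object
    motion); lower temperature spreads influence out (broad, camera-like motion)."""
    dist = torch.linalg.norm(traj_xy - pixel_xy, dim=1)
    dist, idx = torch.topk(dist, k=min(topk, dist.numel()), largest=False)
    weights = torch.softmax(-dist * temperature / 1000.0, dim=0)
    return (weights.unsqueeze(1) * traj_motion[idx]).sum(dim=0)

pixel = torch.tensor([32.0, 32.0])
trajs = torch.tensor([[30.0, 30.0], [60.0, 10.0]])
motions = torch.tensor([[2.0, 0.0], [0.0, -2.0]])
print(blend_motion(pixel, trajs, motions, temperature=850.0, topk=2))  # ~[2, 0]
```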
Advanced techniques:
For delayed motion (objects starting to move mid-sequence), we show how to prepend static coordinates (sketched in code after this list):
- Copy the starting position from the Spline Editor
- Repeat it for desired static frames
- Concatenate with the motion trajectory
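In code terms, the trick is nothing more than repeating the first coordinate (coordinates below are illustrative; the Spline Editor exports its own format):

```python
# Hold the object still for 8 frames, then follow the drawn trajectory.
static_frames = 8
trajectory = [(120, 340), (150, 330), (190, 310), (240, 300)]  # from the Spline Editor

delayed = [trajectory[0]] * static_frames + trajectory
print(delayed)
```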
The trickiest part? Collision physics. For the pétanque demonstration in our tutorial, success required:
- Increasing steps from 5 to 15
- Careful trajectory endpoint placement (avoid interpenetration)
- Higher temperature (850) for precise ball control
- Strategic pinning/unpinning of static objects
Production insights
After extensive testing, here are the key takeaways:
Performance optimization:
- Use CausVid LoRA for rapid iteration (5 steps)
- Increase to 15-30+ steps for final quality
- Enable torch.compile and BlockSwap where compatible
Motion complexity:
- Faster motion requires more sampling steps
- Always test with multiple seeds early
- Prompts significantly help guide physics
Workflow philosophy:
- Start broad (camera motion), then refine (object motion)
- Layer controls gradually
- Document your settings - small changes have big impacts
- Learn when to constrain the model tightly and when to let it breathe. Master that balance, and you'll create magic.
Looking forward
These four papers demonstrate how quickly research transforms into practical tools. Just weeks ago, these were academic concepts. Today, they're ComfyUI nodes you can use in production.
Remember: these tools are just an addition to your toolbox. There are great use cases for them, but they are not a one-size-fits-all solution. We're still early in the research & development process. The art is in knowing when to complement your traditional VFX workflow with some Machine Learning spices.
All workflows and assets are available on our GitHub. Detailed instructions are in the video tutorial on YouTube. Experiment, share your results, and push these tools beyond their intended limits. That's how we collectively advance this field.
Be kind, be creative, be curious, be human. See you next week for another episode of AInVFX news!
Sources & Links
Research Papers:
- NormalCrafter Project
- NormalCrafter Paper
- NormalCrafter GitHub
- Any-to-Bokeh Project
- Any-to-Bokeh Paper
- Uni3C Project
- Uni3C Paper
- Uni3C GitHub
- ATI Project
- ATI Paper
- ATI GitHub