Today we're diving deep into ComfyUI with four breakthrough papers that give you fine-grained control over AI video generation. From generating temporally consistent normals for relighting to drawing custom motion trajectories to animate anything, these tools represent a leap forward in art direction.
NormalCrafter: Temporally stable surface normals
Surface normals are essential for relighting and can serve as powerful control signals for video generation. But until now, extracting them from video meant dealing with temporal flickering. NormalCrafter solves this elegantly.
The secret is Semantic Feature Regularization (SFR). Instead of just looking at surface textures, the model aligns its understanding with semantic representations from a DINO encoder. This forces it to focus on actual object structure, not just pixel patterns.
Their two-stage training approach is clever: first, they train the entire U-Net in latent space using diffusion score matching and SFR loss to establish basic normal estimation. Then, they fine-tune only the spatial layers in pixel space with an angular loss that directly measures normal accuracy. The result? Spatial precision with temporal consistency.
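For intuition, the angular loss is simply the angle between the predicted and ground-truth normal vectors at each pixel. A minimal PyTorch sketch of that idea (our own illustration, not the authors' training code):

```python
import torch
import torch.nn.functional as F

def angular_loss(pred_normals: torch.Tensor, gt_normals: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel angle (radians) between predicted and ground-truth normals.

    Both inputs are (B, 3, H, W). Vectors are re-normalized to unit length so
    the loss measures direction only, which is what a surface normal encodes.
    """
    pred = F.normalize(pred_normals, dim=1)
    gt = F.normalize(gt_normals, dim=1)
    cos = (pred * gt).sum(dim=1).clamp(-1.0 + 1e-7, 1.0 - 1e-7)  # cosine of the angle
    return torch.acos(cos).mean()

# Quick sanity check with random tensors standing in for network output and labels.
print(angular_loss(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)))
```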
ComfyUI implementation tips:
The wrapper from AIWarper makes this accessible in ComfyUI. Key parameters to remember:
- window_size: Frames processed per batch; the main lever on GPU memory usage (default 14)
- time_step_size: Frame advancement per batch (keep it below window_size for overlap)
- Detail Transfer node: Brings back high-frequency details using the original footage
Pro tip: For best results, keep an overlap of at least 4 frames between windows (with the default window_size of 14, that means a time_step_size of 10 or less). This ensures smooth transitions between batches; the sketch below shows how the windows line up.
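Here's a rough sketch of how those two parameters carve a clip into overlapping windows (parameter names follow the node; the wrapper itself also blends the overlapping frames, which we skip here):

```python
def sliding_windows(num_frames: int, window_size: int = 14, time_step_size: int = 10):
    """Yield (start, end) frame ranges. Consecutive windows overlap by
    window_size - time_step_size frames (4 with the values above)."""
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += time_step_size

# A 40-frame clip: frames 0-13, 10-23, 20-33, 30-39. Each batch shares
# 4 frames with the previous one, which is what keeps transitions smooth.
for start, end in sliding_windows(40):
    print(f"frames {start}-{end - 1}")
```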
Any-to-Bokeh: One-step professional depth effects
Creating realistic bokeh traditionally requires either capturing with the right lens or complex post-processing. Any-to-Bokeh changes the game by adding professional depth-of-field to any video in a single step.
The innovation is Multi-Plane Images (MPI). Think of it as slicing your scene into layers at different depths. Each layer represents objects at a specific distance, giving the model explicit geometric guidance about blur amounts.
Built on Stable Video Diffusion, they add custom MPI spatial blocks that process different depth regions separately. Cross-attention preserves important visual details while creating smooth, consistent bokeh across frames.
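To build intuition for the MPI idea, here's a toy sketch that blurs each depth plane by a different amount and alpha-composites the stack back to front. The real model does this with learned spatial blocks in latent space; a fixed Gaussian blur (via torchvision) is just our stand-in.

```python
import torch
import torchvision.transforms.functional as TF

def toy_mpi_bokeh(planes_rgb, planes_alpha, focus_idx: int):
    """planes_rgb: list of (3, H, W) tensors ordered far -> near.
    planes_alpha: list of (1, H, W) tensors in [0, 1].
    Planes farther from the focal plane get a larger blur kernel, then the
    stack is composited with the classic back-to-front 'over' operation."""
    out = torch.zeros_like(planes_rgb[0])
    for i, (rgb, alpha) in enumerate(zip(planes_rgb, planes_alpha)):
        kernel = 1 + 2 * abs(i - focus_idx)  # odd kernel size, grows with depth offset
        if kernel > 1:
            rgb = TF.gaussian_blur(rgb, kernel_size=kernel)
            alpha = TF.gaussian_blur(alpha, kernel_size=kernel)
        out = rgb * alpha + out * (1 - alpha)
    return out

# Five random planes, focus on the middle one.
planes = [torch.rand(3, 64, 64) for _ in range(5)]
alphas = [torch.rand(1, 64, 64) for _ in range(5)]
print(toy_mpi_bokeh(planes, alphas, focus_idx=2).shape)
```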
You get two main controls:
- Focus point selection - Choose what stays sharp
- Blur strength - Simulate different apertures
Both can be animated for professional rack focus effects. The code isn't available yet, but they promise a release soon with pre-trained weights.
Uni3C: Unified framework for camera and human motion
Unlike previous methods that handled camera and human motion separately, Uni3C unifies the process, relying on point clouds for precise 3D control.
The PCDController is the star here - a lightweight, plug-and-play module that doesn't modify your base video model. It extracts 3D scene representations from monocular depth (like judging distance with one eye closed) and uses these point clouds to guide generation.
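If you're wondering what "point clouds from monocular depth" actually looks like, here's a minimal unprojection sketch with a pinhole camera model. The paper's pipeline uses its own depth estimator and camera handling; the intrinsics below are made up.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Unproject an (H, W) depth map into an (H*W, 3) camera-space point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# A flat 2-meter-away "scene" with illustrative intrinsics.
depth = np.full((480, 640), 2.0)
print(depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0).shape)
```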
Key workflow insights:
In ComfyUI, you can use any video as camera motion input, but we found the interactive 3D viewport particularly useful:
- Load a simple 3D object (like a cube)
- Record camera movements in real-time
- The motion transfers to your generated video
Critical settings:
- strength: Controls influence (1-3 range, higher values can cause artifacts)
- start/end_percent: Determines when in the diffusion process to apply control (see the sketch after this list)
- Early stages (0-0.3) affect large-scale motion
- Later stages affect fine details
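A simple way to picture start/end_percent is as a gate on the control signal over the sampling schedule. This tiny sketch is just our mental model with illustrative names, not the node's implementation:

```python
def control_weight(step: int, total_steps: int, strength: float = 1.0,
                   start_percent: float = 0.0, end_percent: float = 0.3) -> float:
    """Return the control strength applied at a given sampling step.

    Control is active only while normalized progress through the schedule sits
    between start_percent and end_percent; outside that window the base model
    runs unguided, which lets things like water keep flowing naturally."""
    progress = step / max(total_steps - 1, 1)
    return strength if start_percent <= progress <= end_percent else 0.0

# With 20 steps and end_percent = 0.3, only the first ~6 steps are guided,
# exactly the stage where large-scale motion is decided.
for step in range(20):
    print(step, control_weight(step, total_steps=20))
```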
In our tutorial, when water stopped flowing at higher strength values, reducing end_percent to 0.3 and adding prompts like "fast flowing river" brought the motion back while preserving camera control.
ATI: Draw any trajectory, create any motion
ATI (Any Trajectory Instruction) from ByteDance Research is perhaps the most intuitive of the four. Simply draw where things should go, and the model creates realistic motion following your paths.
The Gaussian-based motion injector encodes your drawn trajectories as spatial distributions in latent space. It's lightweight, plugging into any pre-trained image-to-video model without retraining the backbone.
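To picture what "spatial distributions in latent space" means, here's a toy sketch that rasterizes one trajectory point per frame into a 2D Gaussian heatmap. ATI's injector works on latent features with its own parameters, so treat the resolution and sigma here as illustrative.

```python
import torch

def gaussian_heatmaps(traj_xy: torch.Tensor, height: int, width: int,
                      sigma: float = 4.0) -> torch.Tensor:
    """traj_xy: (T, 2) trajectory of (x, y) positions, one per frame.
    Returns (T, H, W) maps with a unit-peak Gaussian centered on each point."""
    ys = torch.arange(height).float().view(1, height, 1)
    xs = torch.arange(width).float().view(1, 1, width)
    cx = traj_xy[:, 0].view(-1, 1, 1)
    cy = traj_xy[:, 1].view(-1, 1, 1)
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return torch.exp(-dist2 / (2 * sigma ** 2))

# A point drifting diagonally across a 64x64 frame over 16 frames.
traj = torch.stack([torch.linspace(8, 56, 16), torch.linspace(8, 56, 16)], dim=1)
print(gaussian_heatmaps(traj, 64, 64).shape)  # torch.Size([16, 64, 64])
```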
Practical parameter guide (a small weighting sketch follows this list):
Temperature (0-1000): Controls motion field focus
- Low values (20-150): Broad, diffuse motion - good for camera moves
- High values (600-950): Sharp, focused motion - better for individual objects
TopK: Number of motion influences to consider
- 1: Each point affected by nearest trajectory only
- 2+: Allows blending between multiple motion fields
Start/End Percent: Diffusion stage control
- Full range (0-1): Maximum trajectory adherence
- Reduced (0-0.2): More natural motion, less strict following
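Here's how we picture temperature and TopK interacting: each pixel blends motion from its K nearest trajectories, and a higher temperature makes that blend collapse onto the closest one. This mirrors the behavior described above but is only our illustration, not the node's actual math:

```python
import torch

def blend_motion(pixel_xy: torch.Tensor, traj_xy: torch.Tensor, traj_motion: torch.Tensor,
                 temperature: float = 850.0, topk: int = 1) -> torch.Tensor:
    """Blend the motion vectors of the K nearest trajectories for one pixel.

    pixel_xy:    (2,)   pixel position
    traj_xy:     (N, 2) current positions of the drawn trajectories
    traj_motion: (N, 2) motion vector carried by each trajectory

    Higher temperature sharpens the distance weighting (focused, per-object
    motion); lower temperature spreads influence out (broad, camera-like motion)."""
    dist = torch.linalg.norm(traj_xy - pixel_xy, dim=1)
    dist, idx = torch.topk(dist, k=min(topk, dist.numel()), largest=False)
    weights = torch.softmax(-dist * temperature / 1000.0, dim=0)
    return (weights.unsqueeze(1) * traj_motion[idx]).sum(dim=0)

pixel = torch.tensor([32.0, 32.0])
trajs = torch.tensor([[30.0, 30.0], [60.0, 10.0]])
motions = torch.tensor([[2.0, 0.0], [0.0, -2.0]])
print(blend_motion(pixel, trajs, motions, temperature=850.0, topk=2))  # ~[2, 0]
```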
Advanced techniques:
For delayed motion (objects starting to move mid-sequence), we show how to prepend static coordinates (sketched in code after this list):
- Copy the starting position from the Spline Editor
- Repeat it for desired static frames
- Concatenate with the motion trajectory
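In code terms, the trick is nothing more than repeating the first coordinate (coordinates below are illustrative; the Spline Editor exports its own format):

```python
# Hold the object still for 8 frames, then follow the drawn trajectory.
static_frames = 8
trajectory = [(120, 340), (150, 330), (190, 310), (240, 300)]  # from the Spline Editor

delayed = [trajectory[0]] * static_frames + trajectory
print(delayed)
```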
The trickiest part? Collision physics. For the pétanque demonstration in our tutorial, success required:
- Increasing steps from 5 to 15
- Careful trajectory endpoint placement (avoid interpenetration)
- Higher temperature (850) for precise ball control
- Strategic pinning/unpinning of static objects
Production insights
After extensive testing, here are the key takeaways:
Performance optimization:
- Use CausVid LoRA for rapid iteration (5 steps)
- Increase to 15-30+ steps for final quality
- Enable torch.compile and BlockSwap where compatible
Motion complexity:
- Faster motion requires more sampling steps
- Always test with multiple seeds early
- Prompts significantly help guide physics
Workflow philosophy:
- Start broad (camera motion), then refine (object motion)
- Layer controls gradually
- Document your settings - small changes have big impacts
- Learn when to constrain the model tightly and when to let it breathe. Master that balance, and you'll create magic.
Looking forward
These four papers demonstrate how quickly research transforms into practical tools. Just weeks ago, these were academic concepts. Today, they're ComfyUI nodes you can use in production.
Remember: these tools are just an addition to your toolbox. There are great use cases for them, but they are not a one-size-fits-all solution. We're still early in the research & development process. The art is in knowing when to complement your traditional VFX workflow with some Machine Learning spices.
All workflows and assets are available on our GitHub. Detailed instructions are in the video tutorial on YouTube. Experiment, share your results, and push these tools beyond their intended limits. That's how we collectively advance this field.
Be kind, be creative, be curious, be human. See you next week for another episode of AInVFX news!
Sources & Links
Research Papers:
- NormalCrafter Project
- NormalCrafter Paper
- NormalCrafter GitHub
- Any-to-Bokeh Project
- Any-to-Bokeh Paper
- Uni3C Project
- Uni3C Paper
- Uni3C GitHub
- ATI Project
- ATI Paper
- ATI GitHub