Viggle V4 Guide: Everything you need to know about motion control

PRODUCT | 2026-02-12 | Viggle Team | 4 MIN READ

Viggle V4 is Viggle’s latest model, powered by our proprietary JST, a true 3D world model. V4 lets you seamlessly transfer motion from a reference video to a character image you upload. We’ve significantly upgraded overall performance in V4, including stronger support for complex motion, improved character consistency across shots, richer textures and details, and more expressive, natural facial expressions. V4 also introduces new paired features and parameters (Character Refine, Smooth Motion, and Foot Lock), giving you more control and more stable results. Let’s take a look at everything and see how to get the best results in V4.

Motion control capabilities

V4 pushes motion capture to new heights. Complex movements are captured with precision and generated with full control, including high-speed actions and technically demanding motion. What once required expensive motion-capture systems can now be achieved with a single reference video in Viggle.

Compared with other AI video generation models, motion control remains Viggle’s strongest and most differentiated capability. Compared to our previous models, V4 significantly improves the understanding of complex and high-speed motion, delivering more stable results across flips, inversions, head-down actions, and gymnastics-style movements. Shakiness is greatly reduced, even when characters appear near frame edges or are partially occluded by other objects.

Unlike diffusion-based video generation models that operate primarily at the frame level, Viggle’s underlying 3D world model provides a structural understanding of motion, resulting in more physically coherent, stable, and controllable movement.

Character consistency

Character consistency remains one of the most difficult challenges in AI video generation. Many models struggle to maintain a stable identity as motion becomes faster, camera perspectives change, or scenes grow more complex.

By combining advanced motion understanding with a persistent 3D character representation, V4 preserves character proportions, outfit details, and visual identity throughout the entire video. In earlier versions, non-humanoid characters often tended to drift toward generic humanoid forms during generation. With V4, this behavior is significantly improved. Animals, aliens, robotic units, armored figures, and plush characters now retain their original structure, surface materials, and overall silhouette, staying faithful to the source image.

The impact is most visible in long clips, fast movement, and multi-angle shots. Characters remain stable and recognizable during rotations, jumps, and partial occlusion, effectively eliminating the “character drift” commonly seen in other models. Plus, there is no need to re-roll generations, whatever the circumstances.

Multi-person scenes

V4 delivers reliable performance in multi-person scenes—an area where most AI video generation models struggle. In shots with multiple characters, Viggle preserves identity, timing, and spatial relationships without blending or confusion.

With Viggle’s multi-track feature, creators can track scenes up to one minute long, accurately select specific characters, and replace them with custom characters. Throughout the process, the original background, camera movement, and character motion remain unchanged, while each character stays correctly grounded within the scene’s spatial layout. This enables precise character replacement in group choreography, dialogue scenes, and complex interactions—use cases that are typically unreliable or impossible with other video generation models.

Facial expressions & lip sync

Mouth movements closely follow the reference video, preserving both emotional cues and spoken content.

What the character says in the source video is reflected in the generated result, with matching mouth shapes, timing, and expressions. This alignment remains stable even during complex or high-energy body movement.

Speed

Viggle V4 is significantly faster and more powerful than other video generation models. While many AI video tools take several minutes, or even hours, to render a single clip, Viggle typically delivers results in anywhere from a few seconds to under one minute. Beyond speed, Viggle also supports much longer outputs: users can generate videos of one minute or longer (under 100MB), whereas most other AI video generation models are limited to a maximum length of around 15 seconds. Faster rendering and longer duration make Viggle a more practical solution for real creative workflows.
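As a rough illustration of what the 100MB cap means for a one-minute clip, the sketch below estimates the average bitrate that fits within a size budget. The function names are hypothetical, and reading 100MB as 100 × 10^6 bytes applied to the whole video file is an assumption for illustration, not documented Viggle behavior.

```python
def max_average_bitrate_mbps(duration_s: float, size_limit_mb: float = 100.0) -> float:
    """Highest average bitrate (megabits per second) that keeps a clip
    within the size limit.

    Assumption: 1 MB = 10**6 bytes, and the limit covers the whole file
    (video plus audio plus container overhead).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    size_limit_bits = size_limit_mb * 1_000_000 * 8  # bytes -> bits
    return size_limit_bits / duration_s / 1_000_000  # bits/s -> megabits/s


def fits_size_limit(duration_s: float, bitrate_mbps: float,
                    size_limit_mb: float = 100.0) -> bool:
    """Estimate whether a clip at the given average bitrate stays under the limit."""
    estimated_mb = bitrate_mbps * duration_s / 8  # megabits -> megabytes
    return estimated_mb < size_limit_mb


if __name__ == "__main__":
    print(round(max_average_bitrate_mbps(60), 2))
    print(fits_size_limit(60, 8.0))
    print(fits_size_limit(60, 16.0))
```

Under these assumptions, a 60-second clip can average roughly 13.3 Mbps, so a clip encoded at 8 Mbps would fit while one at 16 Mbps would not.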

What is Character Refine?

Character Refine is a feature designed to maximize character accuracy and identity preservation. When you upload a single character image, Viggle automatically generates 5 reference images from different angles. These images are merged into a unified character reference used throughout the generation process.

This multi-angle understanding allows V4 to recreate the character accurately across different motions, poses, and camera angles. Character Refine is especially effective for non-human, stylized, or heavily designed characters, and for characters with rich textures, where maintaining consistent shapes and features is critical.

How to use V4 and Character Refine

V4

Under Mix, select our V4 model. Upload your reference video and character image.

Character Refine

On the left bar, choose Character Refine and upload a character image to start. Generate images of your character from different angles, and we will refine the character for you. Once refinement is done, you can use this character for video generation in the next step, and the result will stay truer to the original.