Lip Sync | VRC Avatars for Dummies
Lip sync makes an avatar's mouth respond while the wearer speaks. In VRChat, it is configured on the Avatar Descriptor and can use viseme blend shapes, a jaw bone, a jaw blend shape, or a parameter-only setup for advanced creators.
Start with the mode your model actually supports. A model with proper mouth blend shapes should usually use visemes. A simple model with only a jaw control may need a jaw flap mode.
Creator: DedZedOffishal
Use the video for the Unity walkthrough, then use the checks below to pick the right mode and troubleshoot the live result.
Choose the lip sync mode from the model's actual mouth setup, not from guesswork.
- Inspect the model for viseme blend shapes, a jaw blend shape, or a jaw bone.
- Set the matching LipSync mode on the Avatar Descriptor.
- Upload privately and test with real microphone input in VRChat.
VRChat's first-avatar guide lists five descriptor lip sync modes: Default, Jaw Flap Bone, Viseme Blend Shape, Jaw Flap Blend Shape, and Viseme Parameter Only. Viseme Blend Shape is the common choice when the model includes proper visemes.
Lip Sync Modes
| Mode | Use it when | Check first |
|---|---|---|
| Default | you want the SDK to try auto detection | Still inspect the result after it changes mode. |
| Viseme Blend Shape | the face mesh has mouth shapes for speech | Assign the correct viseme blend shapes in order. |
| Jaw Flap Bone | the jaw opens through a rigged bone | The jaw bone should be configured correctly in the humanoid rig. |
| Jaw Flap Blend Shape | the model has one simple open-mouth shape | The shape should only affect the intended mouth movement. |
| Viseme Parameter Only | an advanced animator setup will handle speech behavior | Use VRChat's built-in Viseme parameter intentionally. |
If you are new to avatar setup, start with Viseme Blend Shape when the model supports it. It gives clearer speech motion than a single jaw flap.
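The table above can be read as a small decision rule. The sketch below is only an illustration: the function name `pick_lip_sync_mode` and its three flags are hypothetical, and the order of the two jaw options is an assumption, since this guide only states that visemes are preferred when available.

```python
from typing import Optional


def pick_lip_sync_mode(
    has_visemes: bool, has_jaw_shape: bool, has_jaw_bone: bool
) -> Optional[str]:
    """Suggest a descriptor lip sync mode from what the model contains.

    Returns None when the model has no mouth setup at all, which means
    the face needs extra work before any mode will help.
    """
    if has_visemes:
        # Visemes give clearer speech motion than a single jaw flap.
        return "Viseme Blend Shape"
    if has_jaw_shape:
        # Assumption: prefer a jaw blend shape over a jaw bone when both exist.
        return "Jaw Flap Blend Shape"
    if has_jaw_bone:
        return "Jaw Flap Bone"
    return None


print(pick_lip_sync_mode(True, False, True))   # Viseme Blend Shape
print(pick_lip_sync_mode(False, False, True))  # Jaw Flap Bone
```

This is a planning aid, not an SDK call. The mode itself is still set by hand on the Avatar Descriptor in Unity.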
Before You Start
- A working avatar upload path.
- A face mesh with blend shapes or a jaw setup.
- A visible Avatar Descriptor on the avatar root.
- A microphone available for testing in VRChat.
- No red Console errors.
For a Mixamo or generic humanoid model, lip sync may require extra face shapes before VRChat can produce detailed mouth movement. A rigged body does not automatically mean the face has visemes.
Viseme Checklist
VRChat uses Oculus-style viseme output. The built-in Viseme animator parameter reports a viseme index from 0 to 14, and Viseme Blend Shape setups map each of those mouth sounds to a blend shape on the model.
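As a reference for the 0 to 14 range, here is a minimal lookup of the viseme names in the usual Oculus order. The spellings follow the labels commonly shown in the descriptor's viseme list, and some tools write the last three as I, O, and U. The helper `viseme_name` is a hypothetical name used only for this illustration, so verify the order against the current VRChat documentation before relying on it.

```python
# Oculus-style viseme names, indexed 0 to 14 (index 0 is the silent/rest shape).
OCULUS_VISEMES = (
    "sil", "PP", "FF", "TH", "DD",
    "kk", "CH", "SS", "nn", "RR",
    "aa", "E", "ih", "oh", "ou",
)


def viseme_name(index: int) -> str:
    """Return the viseme name for a Viseme parameter value in viseme mode."""
    if not 0 <= index < len(OCULUS_VISEMES):
        raise ValueError(f"viseme index must be 0 to 14, got {index}")
    return OCULUS_VISEMES[index]


print(viseme_name(0))   # sil
print(viseme_name(10))  # aa
```

A lookup like this helps when reading animator transitions in a Viseme Parameter Only setup, where conditions compare the Viseme parameter against raw integers.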
Check:
- the face mesh actually contains mouth blend shapes
- the Avatar Descriptor points at the correct face mesh
- the viseme fields are assigned to matching shapes
- the silent/rest mouth shape is present
- the mouth movement does not also trigger unrelated expressions
- the face mesh still renders correctly after upload
VRChat notes that empty blend shapes can be removed by Unity on import. If a required mouth shape is missing, check the model import and source file before rebuilding the descriptor repeatedly.
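One way to spot a missing shape before touching the descriptor is to compare the names you expect against the names the imported mesh actually exposes. The sketch below assumes you have copied both lists out by hand. The `vrc.v_` prefix is only a common naming convention, not something this guide or VRChat requires, and `missing_shapes` is a hypothetical helper.

```python
from typing import Iterable, List


def missing_shapes(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required blend shape names that the mesh does not contain.

    Matching is exact and case-sensitive, and the order of `required` is kept.
    """
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Hypothetical names copied from the face mesh after import.
mesh_shapes = ["vrc.v_sil", "vrc.v_aa", "Blink", "Smile"]
# Hypothetical subset of shapes the viseme fields are expected to use.
required = ["vrc.v_sil", "vrc.v_aa", "vrc.v_oh"]

print(missing_shapes(required, mesh_shapes))  # ['vrc.v_oh']
```

If a name shows up as missing here but exists in the source file, the import step is the likely place where it was dropped.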
Testing In VRChat
Test with the avatar uploaded privately:
- Wear the avatar.
- Confirm your microphone works in VRChat.
- Speak normally and watch the mouth in a mirror.
- Try quiet and loud speech.
- Check whether facial gestures or expression menus override the mouth.
- Ask another user to confirm the motion is visible remotely if needed.
Unity preview can catch assignment mistakes, but real microphone behavior is easiest to judge in VRChat.
Common Problems
Help! The mouth does not move.
Check the Avatar Descriptor lip sync mode, assigned mesh, assigned visemes or jaw control, and whether your microphone input is working in VRChat.
Help! The wrong mouth shapes trigger.
Review the viseme assignments. A swapped shape can make speech look strange even when the descriptor is technically working.
Help! Facial expressions break lip sync.
Check expression animations and FX layers that animate the same mouth blend shapes. If a gesture holds the mouth closed, lip sync may be hidden or barely visible.
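The overlap check described above can be done on paper or with a few lines of code. This sketch assumes you have written down which blend shapes each expression animation touches. All names in it are hypothetical, and it does not read Unity animation files.

```python
from typing import Dict, Iterable, Set


def find_mouth_conflicts(
    viseme_shapes: Iterable[str], expressions: Dict[str, Iterable[str]]
) -> Dict[str, Set[str]]:
    """Map each expression name to the viseme blend shapes it also animates."""
    viseme_set = set(viseme_shapes)
    conflicts: Dict[str, Set[str]] = {}
    for expression, shapes in expressions.items():
        overlap = viseme_set & set(shapes)
        if overlap:
            conflicts[expression] = overlap
    return conflicts


# Hypothetical data: shapes used by the viseme fields and by two gestures.
visemes = ["vrc.v_aa", "vrc.v_oh", "vrc.v_sil"]
gestures = {
    "Smile": ["MouthSmile", "vrc.v_aa"],
    "Blink": ["EyesClosed"],
}

for name, shapes in find_mouth_conflicts(visemes, gestures).items():
    print(name, sorted(shapes))  # Smile ['vrc.v_aa']
```

Any expression that appears in the result is a candidate for a dedicated mouth shape that the viseme fields do not use.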
Help! The avatar is too expensive after face setup.
Review blend shape usage and mesh organization. VRChat notes that the cost of blend shapes depends on the mesh they affect, so a separate head or face mesh can be easier to optimize than broad body-wide blend shapes.
Official References
- VRChat Creating Your First Avatar
- VRChat Animator Parameters
- VRChat Avatars Documentation
- VRChat Avatar Performance Ranks
The lip-sync mode names come from VRChat's avatar descriptor workflow. Model-specific blend shape names and jaw setup still need to be checked on the actual avatar asset before assigning descriptor fields.
Related Resources
- Using a Mixamo Character
- Avatar Upload Workflow
- Avatar Optimization Checklist
Final Advice
Lip sync is one of the avatar features people notice immediately. Match the descriptor mode to the model, test with a real microphone, and keep mouth blend shapes from fighting your expression animations.
Make Speech Readable Before Polishing Expressions
Good lip sync depends on the descriptor setup, viseme or jaw controls, live microphone testing, and the overall avatar performance budget.
Suggested Order
- Inspect the model first. Check whether the avatar has viseme blend shapes, a jaw blend shape, or a jaw bone before choosing a LipSync mode.
- Test with real voice input. Upload privately and speak in VRChat so you can see live microphone behavior instead of only previewing blend shapes.
- Keep the avatar budget visible. Lip sync is one part of the avatar. Materials, meshes, PhysBones, contacts, and mobile limits still decide whether others can see it comfortably.
Common Questions
Which VRChat lip sync mode should beginners use?
Use Viseme Blend Shape when the model includes proper viseme blend shapes. Use a jaw option only when the model is built around jaw movement instead.
Why does lip sync look fine in Unity but bad in VRChat?
Unity preview does not fully replace live testing. Check the Avatar Descriptor, the assigned shapes, your microphone input, whether the version you are wearing is the latest upload, and the actual in-client behavior.