By John Luxford, CTO & Co-founder - Flipside
This is part 3 of our blog post series about acting in VR and working with actors in virtual environments. Here are the two previous posts:
Now that we’ve explored some general lessons learned as well as lessons by actors for actors wanting to act in VR, here are some of the more technical discoveries we've made that can have a big impact on the quality of your final output.
Actors need to respond quickly to verbal and physical cues from the other actors present, as well as to changes in the environment. This is not a problem in the real world, where actors sharing the same physical space experience no perceptible delay, but actors connected over multiplayer are always seeing each other's actions slightly in the past. That delay is the latency between them.
In a virtual space, latency is impossible to avoid. Even the time it takes for an actor's own action to be shown in their VR headset can be upwards of 20 milliseconds. Remote actors will see each other's actions with latencies of 100 milliseconds or greater, even over short-distance peer-to-peer connections.
Depending on the distance and connection quality, that can be as much as half a second or more, in which case reaction times are simply too slow. Past the 100 millisecond mark, actor-to-actor response times can degrade quickly, making the reaction to a joke fall flat, or creating awkward pauses similar to those you see on a slow Skype connection. For this reason, a virtual studio needs to be designed to keep latency to an absolute minimum.
Fortunately, a virtual studio doesn't share many of the requirements that make peer-to-peer connections disadvantageous for video games. For example, the number of peers is going to be relatively low, and you don't need to protect against cheating or wait for the slowest player to catch up before reaching consensus on the next action in the game. So for VR over shorter distances, peer-to-peer is a better option than routing through a server in the middle (although a server can often decrease latency over greater distances because of the faster connections between data centers).
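To make the peer-to-peer versus server trade-off concrete, here is a minimal Python sketch that compares the one-way delay of a direct connection with the delay of a path relayed through a server. The function name `best_route`, the `relay_overhead_ms` parameter, and every number in the example are hypothetical and chosen only for illustration. They are not measurements from Flipside.

```python
# Illustrative only: all names and latency figures here are assumptions.

def best_route(direct_ms, actor_to_server_ms, server_to_actor_ms,
               relay_overhead_ms=0.0):
    """Return (route_name, one_way_latency_ms) for the faster of two paths.

    direct_ms:          one-way delay of the peer-to-peer connection
    actor_to_server_ms: one-way delay from the sending actor to the server
    server_to_actor_ms: one-way delay from the server to the receiving actor
    relay_overhead_ms:  assumed processing time spent inside the server
    """
    relayed_ms = actor_to_server_ms + relay_overhead_ms + server_to_actor_ms
    if direct_ms <= relayed_ms:
        return ("peer-to-peer", direct_ms)
    return ("relay", relayed_ms)


# Two actors in the same city: the direct path wins.
print(best_route(30, 20, 20, relay_overhead_ms=2))

# Two actors on different continents with a slow direct path: in this
# made-up case the relayed path through well-connected data centers wins.
print(best_route(180, 60, 70, relay_overhead_ms=2))
```

In practice the inputs would come from measuring both paths, since the faster route depends entirely on the actual networks involved.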
Buffering needs to be kept to a minimum too, since every buffered packet adds delay before it is played back. Minimal buffers also mean the system can't smooth over network hiccups as easily, so a stable and fast network connection is needed at both ends.
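The cost of buffering can be sketched with some simple arithmetic. The Python below assumes audio is sent in fixed-length packets and that each buffered packet delays playback by its full duration. The 20 millisecond packet length and the 40 millisecond network delay are assumptions for the example, and the 100 millisecond budget is the threshold mentioned earlier in this post. The function names are hypothetical.

```python
# Illustrative only: packet length and network delay are assumed values.

def total_latency_ms(network_ms, buffered_packets, packet_ms=20.0):
    """Network delay plus the delay added by holding packets in a buffer."""
    return network_ms + buffered_packets * packet_ms


def max_packets_within_budget(network_ms, budget_ms=100.0, packet_ms=20.0):
    """Largest whole number of buffered packets that stays within budget."""
    remaining_ms = budget_ms - network_ms
    if remaining_ms <= 0:
        return 0
    return int(remaining_ms // packet_ms)


# With a 40 ms network delay, one buffered packet keeps us under 100 ms...
print(total_latency_ms(40, buffered_packets=1))   # 60.0

# ...but five buffered packets push us well past it.
print(total_latency_ms(40, buffered_packets=5))   # 140.0

# How many packets can we afford to buffer at all?
print(max_packets_within_budget(40))              # 3
```

The smaller that last number gets, the less room the system has to hide a network hiccup, which is why a stable connection matters so much.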
A great way to keep latency to a minimum is to make sure the actors are physically located close together, preferably connected via Ethernet to the same network.
If you're recording a show with multiple actors in the same physical space, soundproofing between them becomes critical because the microphone in each VR headset can pick up the other actors' voices. This causes lip syncing issues where a character's lips move when their actor isn't speaking, or even lets you hear one actor faintly coming out of another character's mouth.
Even footsteps, or the clicks from the older Oculus Touch engineering samples, can be picked up and become audible, or cause the character's lips to twitch. Wearing socks and using the consumer edition of the Oculus Touch controllers can make a big difference.
In-ear earphones are also key for ensuring voices don't bleed from the earphones into the wrong actor's microphone.
On the simplest level, getting good audio levels means adjusting the VR headset microphone levels in the system settings so that the voices at their loudest aren't clipping (i.e., causing audio distortion). It also means getting the mix right between the actors, the music, and other sound effects.
Clipping in a digital audio signal.
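One simple way to check a recording for clipping is to count how many samples sit at the maximum value the format can store. The Python sketch below assumes 16-bit signed PCM audio held as a list of integers. The helper names and the synthetic sine wave test signal are hypothetical, and this is not how Flipside itself detects clipping.

```python
import math

FULL_SCALE_16BIT = 32767  # largest positive value in 16-bit signed PCM


def clipped_fraction(samples, full_scale=FULL_SCALE_16BIT):
    """Fraction of samples whose magnitude reaches full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return clipped / len(samples)


def sine_pcm16(gain, n=1000, cycles=10):
    """Synthetic test tone; values beyond full scale are clamped, the way
    an overdriven microphone input would clip them."""
    out = []
    for i in range(n):
        value = gain * math.sin(2 * math.pi * cycles * i / n)
        value = max(-1.0, min(1.0, value))  # hard clip at full scale
        out.append(int(round(value * FULL_SCALE_16BIT)))
    return out


quiet = sine_pcm16(gain=0.5)   # comfortably below full scale
hot = sine_pcm16(gain=2.0)     # driven too hard, so the peaks flatten

print(clipped_fraction(quiet))  # no clipped samples
print(clipped_fraction(hot))    # a large share of samples are clipped
```

If the loudest moments of a test recording show more than a tiny fraction of clipped samples, the microphone level is set too high.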
For traditional 2D output, a spatialized audio mix is not ideal either, since the mix will be relative to the position and direction of the local actor's head in the scene. For this reason, a stereo mix is important if you're recording for 2D viewers. With Flipside, we built a way to output stereo while recording and still replay the spatialized version in VR.
Another challenge is that VoIP-quality voice recording is substandard for recorded shows, by about half. Because higher-frequency sounds oscillate faster than lower ones, a 16kHz sample rate is too slow to capture the higher frequencies of an actor's voice, losing detail and leaving them sounding muffled.
At that sample rate, the ceiling where voice stops being captured properly is around 7.2kHz, just under half the sample rate. To capture the full frequency range of a voice you want to capture everything up to 12kHz, or even higher, which requires a correspondingly higher sample rate. This is a trade-off between quality and the size of the audio data being sent between actors: if the data is too large, it can slow things down, adding to the latency problem.
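The relationship between sample rate and the highest frequency that can be captured follows the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The short Python sketch below works through the numbers in this section. It assumes uncompressed 16-bit mono audio when estimating data size, which is a simplification since real voice chat is usually compressed, and the function names are hypothetical.

```python
def nyquist_khz(sample_rate_khz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_khz / 2


def min_sample_rate_khz(highest_freq_khz):
    """Lowest sample rate able to represent the given frequency."""
    return 2 * highest_freq_khz


def raw_bitrate_kbps(sample_rate_khz, bits_per_sample=16, channels=1):
    """Uncompressed data rate (assumed 16-bit mono unless overridden)."""
    return sample_rate_khz * bits_per_sample * channels


# A 16 kHz VoIP stream tops out at 8 kHz in theory. Filtering in a real
# system can push the usable ceiling a little lower than that.
print(nyquist_khz(16))          # 8.0

# Capturing everything up to 12 kHz needs at least a 24 kHz sample rate.
print(min_sample_rate_khz(12))  # 24

# The quality gain costs data: compare the raw rates at 16, 24 and 48 kHz.
for rate in (16, 24, 48):
    print(rate, "kHz ->", raw_bitrate_kbps(rate), "kbit/s")
```

Going from 16kHz to 48kHz triples the raw audio data, which is the quality versus latency trade-off described above.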
There are pros and cons to both the Oculus Rift and the HTC Vive, and while both are amazing platforms in their own right, which one you prefer usually comes down to personal taste.
That said, for the purposes of acting, the Oculus Rift with Touch controllers and the HTC Vive each have their own advantages.
On the Oculus Rift, the microphone generally sounds better, and the Oculus Touch controllers offer more expressiveness in the hands, as well as additional buttons which allow for control of things like a teleprompter or slideshow in the scene. We've found the Oculus Touch's joystick easier for precision control of facial expressions than the thumb pad on the HTC Vive controllers, and the middle finger button easier to grab with than the Vive's grip buttons.
On the other hand, the HTC Vive's larger tracking volume is much better suited to actors looking to move around, although a 3-sensor setup can easily achieve a sufficient tracking volume for Oculus Rift users. The Vive also wins on cord length from the PC to the headset, and the Vive trackers are awesome for doing full body motion capture!
Working with professional actors in Flipside Studio for the past few months has really opened our eyes to the subtle balance needed to provide an environment they feel not just comfortable acting in, but inspired to be in too.
We're glad we could share what we've learned with you, and as we continue seeking out new actors for our Flipside Studio early access program, we hope these lessons will inform creators and help you create better content, faster.