Mixed Reality Studios: A Primer for Producers
Sorry folks, this one is a whopper! I thought describing the ins and outs of Mixed Reality would be a little easier, but you will find that this article opens up new questions and is by no means conclusive or comprehensive.
Thanks to my co-authors Laura Frank and Thomas Kother, this is really a three-in-one digest!
If you feel like you have a section covered, please jump ahead or re-visit again later.
Laura! Thank you again for starting frame:work, which inspired me to start this overview.
Further thanks is due to the disguise team in London for letting us be the first to test their new XR Stage, and for their great support throughout.
ps: if you would like to quote or reference images from here, please go for it! We do ask for a citation in return. If in doubt, get in touch.
In this article we will examine features, benefits and challenges in producing mixed reality (AR, VR, MR) for live and recorded events. We will look at descriptions, naming conventions and production related terms, and explore different applications. Furthermore, we will bias this discussion towards solutions that integrate real and virtual worlds, most commonly combining human performance with virtual set extensions, rendered for at least one camera in real time. This article is a discussion for producers and designers, providing an overview of typical technical setups of mixed reality stages. While the techniques described also feature in cinematic and other high fidelity uses and offline productions, we are looking at mixed reality, further abbreviated as XR, from the perspective of the live entertainment industry.
In 2020 Mixed Reality has become a much discussed topic in media production. Its increased application has undoubtedly been catalysed by the Corona pandemic. Uncertainties in travel, restricted venue capacities, social distancing, and quarantine rules have globally halted live events.
Nonetheless brands, artists, and live shows want to retain a market presence, and therefore solutions for compelling visual effects and narratives in controlled environments are desirable. Mixed Reality endeavours to deliver this presence with immediacy and emotional value, while distributing the composited product via media streaming or broadcasting.
In our opinion some important factors have aligned in 2020:
- Realistic real time rendering for small to medium size canvases (roughly up to UHD level) is maturing to the point of being technically and artistically viable, thanks to hardware and software improvements
- Media servers are focusing on providing unified platforms to manage rendering, tracking, and spatial, temporal and colour calibrations
- Producers and engineers are adapting workflows and retraining in response to new remote working conditions
- Clients are willing to consider new risks and investments into cutting edge production technologies, when the alternative is a diminished media presence
- Remote work is increasingly accepted
- Due to lockdowns and social distancing, audiences welcome streaming formats
- Media on demand and home offices require updates to data infrastructure globally (fibre and 5G)
In Mixed Reality live performers and props are filmed inside an illuminated setup of LED walls, which are mapped with a corresponding virtual environment. This environment is rendered in real time to match the perspective of a tracked camera. The actual stage and performers are lit corresponding to the virtual scenery. In this way real and virtual elements blend into a single comprehensive mise-en-scene.
Different levels of immersion in virtual productions have different names. An XR Space is a somewhat encompassing wall, XR Stages add a walk-on floor, and XR Volumes almost entirely cocoon the production.
In XR spaces and stages, props and performers need to be conventionally lit to achieve immersion. A volume, on the other hand, can be considered to create its own light space, a condition in which a lighting rig is more auxiliary.
Typically XR stages are laid out as three-sided cubes (diamond configuration) or curved screens atop a walk-on LED floor. There are variations and hybrids of these concave spaces. All of them typically open up to align the floor's diagonal with the main camera or viewing axis (see the diamond illustration above), analogous to the upstage-downstage direction. The arrangement encompasses screens, lights, cameras, performers and props.
In contrast to green screening and post production, XR workflows can achieve a true interaction of analogue and virtual elements in real time. Instead of compositing shots after the fact, all effects are presented to one camera and video stream.
Whatever type of XR set, it is a part of the real world production space. The virtual reality is a conceptual overlay to the real space. Their amalgamation is then made visible through the camera on set.
To plan the XR design, lighting and in camera result, it is useful to simulate the LED set, real props and performers in 3D. The real and virtual components of the set and scenery are becoming one creative domain.
A second domain is the compositing stack that combines into the intended overall shot. To further define this compositing domain we will explore XR layering and the related technical aspects:
Note in the first image that the frontplate object can be perceived anywhere within the depth of the composited shot, in this case the middle ground of the overall image.
The bottom layer of the compositing stack is a “backplate”. The backplate is everything that is presented to the LED walls of the XR stage. In order to render the image into LED, the camera needs to be calibrated in spatial relation to the real set. At any time the camera’s position, direction, focus and zoom are tracked and applied to a virtual camera. This virtual twin is rendering a virtual scene, comparable to the eyepoint of a player in a first person game. The resulting image of the virtual world is passed to the LED walls.
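As a minimal sketch of this virtual twin idea, the snippet below mirrors per-frame tracking data onto a render engine's camera. The `TrackedCamera` fields and helper names are our own illustrative choices, not any vendor's API; real systems also carry lens distortion and nodal offsets.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedCamera:
    # Per-frame data as a tracking system might deliver it
    position: tuple        # metres, in calibrated stage space
    rotation: tuple        # pan/tilt/roll in degrees
    focal_length_mm: float # from the lens encoder
    sensor_width_mm: float # known camera property

def horizontal_fov_deg(cam: TrackedCamera) -> float:
    """Derive the horizontal field of view the virtual camera must adopt."""
    return math.degrees(2 * math.atan(cam.sensor_width_mm / (2 * cam.focal_length_mm)))

def apply_to_virtual_camera(cam: TrackedCamera) -> dict:
    """Mirror the tracked pose and lens state onto the render engine's camera,
    so the rendered backplate matches the real camera's perspective."""
    return {
        "position": cam.position,
        "rotation": cam.rotation,
        "fov_deg": round(horizontal_fov_deg(cam), 2),
    }
```

The key point is that position, direction and lens state all travel together every frame; a zoom pull without a matching field-of-view update would immediately break the perspective illusion on the LED wall.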
The XR stage can display as much of the virtual camera view as the configuration of the LED walls allows: in a wide shot, pixels fall off the set; in a tighter shot they fall within it.
Since it is likely that a wide camera shot will pick up more of the virtual scene than is covered by the screens, the system can render a “set extension” layer. This is the part of the backplate that physically cannot be displayed on the LEDs. Typically the set extension layer is composited above the backplate and ideally contains a feather and choke matte to blend well with the lower layer.
The real world performers and props are picked up by the camera along with the backplate. The real world scenery therefore reads correctly as long as it stays within the LED walls' bounds. Once an element falls outside these bounds while the camera holds a wider view, it is overlaid by the set extension layer and thus occluded.
The backplate, camera pickup and set extension layers form the base of the XR composite.
The top XR layer in the stack is the frontplate. The frontplate uses the same camera information as the backplate. The cameras in virtual space are effectively alike, but differ in their depth rendering, or pick up virtual scenery by render tags. These tags can encode “behind” or “in-front” information per virtual scene item, and can be dynamically changed or keyframed. Although it is a simplification, it helps to consider that the frontplate camera renders from the camera's focus plane towards the lens.
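The render-tag idea can be sketched in a few lines. This is a toy illustration of the concept, not how any particular media server stores its scene graph; the item and tag names are invented for the example.

```python
def split_layers(scene_items):
    """Partition virtual scene items into backplate and frontplate render
    passes based on a per-item render tag. Tags default to "behind" and
    could be keyframed over time to move items between layers."""
    backplate = [i["name"] for i in scene_items if i.get("tag", "behind") == "behind"]
    frontplate = [i["name"] for i in scene_items if i.get("tag") == "in-front"]
    return backplate, frontplate

# Hypothetical scene: a skyline stays behind the performer on the LED wall,
# while confetti is rendered in the frontplate, over the camera pickup.
items = [
    {"name": "skyline", "tag": "behind"},
    {"name": "confetti", "tag": "in-front"},
    {"name": "floor"},  # untagged items fall back to the backplate
]
```

Retagging an item from "behind" to "in-front" mid-show is what lets a virtual object appear to pass in front of a real performer without any post production.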
If a system is overlaying a tracked virtual render on top of a full frame camera capture only, it is commonly called augmented reality (AR). A frontplate layer does, however, not necessarily need to be in front of a presenter or prop in the perceived depth of a shot.
Additional layers can be brought into the stack as screens, textures, frames and similar objects within the virtual environment. These can be presentations, graphics and videos as well as live streams. Such layers can appear in the front- and backplate, on a real prop within the XR stage, or as a graphic overlay such as lower thirds, tickers and logos.
Up to this point we have looked at the combination of real + virtual space and the composition stack for a single point of view. Now we need to explore key components and technical integration.
Most XR stages have multiple cameras. Live events require them, as they rely on calling cameras to direct attention and narrative in real time. XR for film and TV, by contrast, can be edited in post, and therefore puts more focus on fewer cameras in favour of higher quality, resolution and colour reproduction.
The first essential technical element for XR is tracking. Camera tracking systems work in different ways, to name a few: encoder-based (Mo-Sys), wireless with active infrared beacons (Vive), markerless (N-Cam), and IR camera sensors reading passive tracking points (Stype, N-Cam).
In XR the latter are most commonly used, although bigger productions will inevitably combine systems. The individual benefits depend on the application: camera considerations, lighting conditions and set design determine the best match.
The tracking system provides hardware and software to calibrate the stage. The scanned real space provides the transformations to subsequently align and scale the virtual space, so that both form a unit. The camera information is pooled and transmitted to the media server and render engines. The engine's virtual camera receives the tracking information, and because of the alignment, the perspective within the render engine matches the shot defined by the real world camera.
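The alignment step can be illustrated with a deliberately simplified rigid mapping. Real calibrations solve for a full transform including rotation; here, as a sketch under that simplifying assumption, we only show an origin offset and uniform scale recovered during stage calibration.

```python
def stage_to_virtual(point, origin_offset, scale):
    """Map a tracked stage-space position (metres) into virtual-scene
    coordinates, using the offset and scale found during calibration.
    A production calibration would also include a rotation component."""
    return tuple((p - o) * scale for p, o in zip(point, origin_offset))

# Example: the stage origin sits at (1, 0, 1) in tracker space, and the
# virtual world is built at twice real-world scale (hypothetical values).
camera_in_virtual = stage_to_virtual((2.0, 0.0, 3.0), (1.0, 0.0, 1.0), 2.0)
```

Every tracked pose passes through this calibrated mapping before reaching the virtual camera, which is why a sloppy calibration shows up as the virtual world "swimming" against the real set.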
Real time rendering is the second prerequisite for XR. Rendering engines host the virtual scenery and are designed to provide real time interactivity, lighting and dynamics. While Unity and Unreal are coming from a games background, Notch have been in this field for art and events.
The design process is similar in all of the engines, and it requires creative teams to consider computer performance, event logic and hardware integration. The essential interactivity is the freedom of the viewport, i.e. the camera. Real time lighting is a second essential (see below).
Games and effect engines can integrate tracking data directly and therefore provide standalone solutions. This is beneficial to minimise latency whilst employing the very latest hardware. A downside to this approach is sequencing media. Real time rendering is at an advantage when it does not need to load and buffer video.
Modern media servers are comprehensive 3D tools. They typically manage video, images, audio and more, and send them to display surfaces via processes of chunking, transformation and pixel mapping. The distribution of the mapped results is the major argument for managing real time engines through media servers. They furthermore handle control protocols and connect to show related periphery. As a consequence, solid integration between media server and render engine opens up the production toolset and allows mixing and matching the respective strengths.
With the above information we can consider two types of setup:
- In a parallel setup each tracked camera has its own dedicated render computer and transmission hardware. While all renderers hold the same virtual scene, each computer provides one of the multiple views. These are switched downstream. Media servers provide data upstream of the renderers.
- In a switched layout an XR media server functions as a host. It pools the tracking data and either detects or determines the single active camera, which is rendered exclusively. The active view switch is done in the XR server. To provide more than one composited view the host can manage multiple render nodes. As a benefit the XR server can provide typical media sequencing and further control integration as well as bringing in signals from further sources.
In either situation the LED wall can only display one camera backplate at a time.
A parallel setup has one or more tracked cameras, each forwarded to a dedicated rendering server. Each server receives its respective tracking information to provide the 3D scene from the corresponding perspective. While each composite and camera bundle can be forwarded to the downstream AV system, the one active XR scene view for the LED screen is switched downstream of the renderers.
Upstream of the rendering servers a media server can provide traditional video, graphics streams and presentations. These display as textures inside the virtual scene.
The switched setup has one or more tracked cameras. Video signals are sent to the media server and switched to an active view. The active view is the camera currently used. The media server manages the virtual scene renderer and passes the active camera information to it. The returned render is mapped into a representation of the LED screens and output accordingly. Separately the media server can composite all layers or forward them discretely into other AV infrastructure.
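The switched setup's core logic, pooling tracking data and exclusively rendering one active camera, can be sketched as follows. The class and method names are illustrative, not taken from any product's API.

```python
class XRServer:
    """Toy model of the host role in a switched XR setup: it pools the
    latest tracking packet per camera and nominates a single active view."""

    def __init__(self, cameras):
        self.cameras = cameras  # camera id -> latest tracking packet
        self.active = None

    def switch(self, camera_id):
        """Cut to a camera: from now on only its tracked view drives
        the backplate rendered into the LED walls."""
        if camera_id not in self.cameras:
            raise KeyError(f"unknown camera: {camera_id}")
        self.active = camera_id

    def backplate_pose(self):
        """Pose forwarded to the render engine for the single active view."""
        return self.cameras[self.active]
```

The point of the model: tracking for every camera is always flowing in, but the expensive render only ever follows the one camera the director has cut to.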
Both of the above examples are schematics to illustrate the signal and control flow. They are not comprehensive wiring diagrams. We have omitted synchronisation and networking architecture, which are further critical constituents.
While a single active view XR setup may benefit from an engine-only approach, the advantages of higher level integrations and mapping tools suggest that for XR in live streaming and broadcast, media server platforms are advised. This suggestion gains weight as we consider the particular quality challenges of integrating cameras, renderers, lighting and recording.
Media servers such as disguise, VYV Photon, Pixera, ioversal Vertex, LightAct, Green Hippo, Smode, and TouchDesigner have implemented Notch for a number of years, while support for Unreal and Unity has been, and remains, product dependent, though generally on the rise.
With a focus on XR setup and operation the forerunners appear to be vyv and disguise, since they provide a platform to resolve particular challenges above and beyond mapping, switching, and hosting of render engines. Solutions like Pixotope, Ventuz, ZeroDensity and VizRT provide similar capabilities for green screen environments. In our discussion we focus on events, where we feel that the performers and scenery need to relate, act and present in a unified space; avoiding additional keying steps and further colour post processing is therefore important. A further merit of the XR format is the inclusion of the production team and live audiences in a common visual framework.
To achieve this high quality inclusion, two features in particular lift the media server to a platform:
Firstly, addressing the data roundtrip cost. Since the active camera is moving, being tracked, passed to the renderer, and then displayed on the LED, any XR rendered frame on the stage is late. This delay needs to be managed, for instance by buffering and passing results through the compositing stack we discussed earlier. As a result the media server also becomes a timing master.
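One common way to think about managing that roundtrip, shown here as a toy illustration rather than how any specific server implements it, is a fixed delay line: the faster signals in the stack are held back by the render latency, measured in frames, so every layer of the composite refers to the same moment in time.

```python
from collections import deque

class DelayLine:
    """Delay a stream of frames by a fixed count, so that the camera feed
    and the late-arriving XR render line up in the compositing stack."""

    def __init__(self, frames_of_delay):
        # Pre-fill with None: the first few outputs are empty until the
        # pipeline has filled to the chosen latency.
        self.buffer = deque([None] * frames_of_delay)

    def push(self, frame):
        """Push the newest frame in, return the frame due for output now."""
        self.buffer.append(frame)
        return self.buffer.popleft()
```

In practice the delay value comes out of the calibration process; here it is simply a constructor argument.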
Secondly, there is a multifactor colour problem. On one hand LED tiles are tough to get right. They only truly have uniform colour when they come from the same production batch, and even then need recalibration to a common level after some time. What is more, in the physical setup XR stages and volumes have to resort to a different tile product for the walk-on floor to handle props and performers. This leads to a more robust tile choice, which commonly has lower resolution and a different colour output profile, perhaps even brightness. A final complication with LED tiles is the arrangement of the red, green and blue diodes that combine into a single pixel. Without going into too much detail, this arrangement changes the colour perception of the LED screen based on the viewing angle of the camera or observer.
On the other hand, the image of the virtual scene as received by the camera has been nuanced by the above physical limits of reproduction in the LED, and has furthermore been superimposed with the lighting around the XR stage that makes the scenery and performers fit the virtual environment. This colour roundtrip can be observed when comparing the camera picture and the set extension.
To manage the colour reproduction, the media server platform needs to calibrate cameras, tiles and residual ambient lighting. This is an extensive process, but the time invested is critical to the overall quality. As this calibration time does not previously exist in most production schedules, please review the requirements of the XR team for camera calibration.
So far our team has tested disguise's delay and colour calibration process. It allows different quality levels that come at different speeds: the more colour accuracy, the longer the lookup takes.
The visual results are worth the wait. With regard to the inter-tile calibration and the camera viewing angle of the set, the process achieves a very homogeneous picture. Matching the set extension is reasonable. Since the virtual scene in the extension is not exposed to all the real world factors, matching it remains complex due to the above mentioned roundtrip, although it can be edited separately in the server.
Matching the colour balance of the set extension to the backplate is a sophisticated task. The set lights for props and performers need to be fed back into the virtual space to match shadows, specularity, colour and brightness. The media server platform can assist this process, so let's look at lighting in more detail.
Lighting for XR stages needs to convey the virtual scenery's effect onto the real props and performers, and to control or be controlled by the lights in virtual space. This is because the XR stage only covers a fraction of the surrounding virtual world. XR volumes fare better in this sense, but lighting still plays a major role, as it does on any real world location set.
In XR, lighting is becoming a hybrid subject, where a lighting desk ought to control real and virtual fixtures alike for a unified effect. Consider that a single show lighting rig can therefore have parts in real and virtual space, overlapping fully or in part across both.
Shadows, reflections and shading are the components that composite the camera pickup onto the backplate, convincingly conveying that performers and props live in the suggested virtual scene.
Shadows are challenging to create on a luminous surface, but necessary to situate the performer or prop on the surface they are standing on in the virtual scene. Virtual shadows therefore need to be part of the illumination itself, and these shadows and their sources need to be considered in the physical stage as well.
The virtual environment is capable of presenting highly complex lighting fixtures. Massive chandeliers can be pixel mapped, infinite arrays of fixtures can be created. The consideration is: how does a virtual light impact the performers and props in camera?
One way of joining the two worlds is to mirror the physical lighting fixtures in the virtual environment in position and functionality (intensity, colour, gobos), controlled by a lighting console or tracking beacon. This gives the lighting designer the ability to match lighting intensities and colour to solve the look of the scene.
Automated lights can be synchronised between the two, delivering DMX data to both spaces and adjusting the position of the virtual lights to the physical ones. This intertwines the behaviour of a beam of light in the physical stage with the falloff in the virtual environment.
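A minimal sketch of that mirroring, assuming a plain list of 8-bit DMX channel values and an invented fixture-to-channel map; real consoles patch multi-channel fixtures and the render engine would expose its own parameter API:

```python
def mirror_dmx(dmx_frame, fixture_map):
    """Fan one console's DMX channel values out to the virtual twins of
    the physical fixtures, so intensity matches in both spaces."""
    virtual_state = {}
    for fixture, channel in fixture_map.items():
        # DMX channels are 8-bit (0-255); normalise to 0..1 for the
        # render engine's fixture intensity parameter.
        virtual_state[fixture] = dmx_frame[channel] / 255.0
    return virtual_state

# Hypothetical patch: two fixtures on channels 10 and 11 of one universe.
frame = [0] * 512
frame[10] = 255   # spot at full
frame[11] = 127   # wash at roughly half
state = mirror_dmx(frame, {"spot_1": 10, "wash_2": 11})
```

The same single data source driving both spaces is what keeps a real beam and its virtual falloff moving as one.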
In a virtual show environment the physical lighting rig could be extended to the virtual world.
When you are considering casting shadows in a scene, all elements and performers need to have their counterparts in the virtual environment.
One solution is to use LED video panels in smaller configurations to create video-light panels (naming suggestion: proxy volumes), capable of extending the ambience and tone of the virtual environment onto performers and props.
These video lights can more efficiently convey reflections and true virtual space ambient light states by displaying the corresponding content from the virtual environment. Positioning of video-light panels can help to glue the possible front plate(AR) elements to the scene.
By controlling the physical size and nit output of the panel we can simulate global illumination on the scene. A nit is a measure of the panel's luminance. 1,500 nits, the usual brightness of LED products, translates to roughly 4,700 lux, comparable to approximately four Arri SkyPanels.
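The roughly 4,700 lux figure follows from treating the panel as an ideal diffuse (Lambertian) emitter, where the illuminance emitted at the surface equals π times the luminance. This is our reading of where the conversion comes from, not a vendor specification, and real panels deviate from the ideal:

```python
import math

def lambertian_lux(nits):
    """Illuminance in lux emitted at the surface of an ideal diffuse
    panel of the given luminance in nits (cd/m^2): E = pi * L."""
    return math.pi * nits

# 1,500 nits -> about 4,712 lux, matching the ~4,700 lux figure above.
print(round(lambertian_lux(1500)))
```

Note this is the value at the panel surface; illuminance on a performer falls off with distance and angle, which is why panel size and placement matter as much as raw brightness.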
The balance between the intensity of the screen and performer lighting is a major consideration in convincing the audience of the illusion (see above: the colour roundtrip issue for frontplates). If something lives in the distance in the virtual environment, consider how performers would be lit by it and use lighting to extend it onto the scene. Studio lighting would, in a way, need to reproduce the result of a light probe onto the scene.
Balance is a multifaceted product. The camera's shutter speed needs to be calibrated or set according to the LED product. The power of the LED should be set to light the scene's ambience efficiently. The camera ISO also needs to be set so as not to produce noise in the final output. The power of the studio lights, when correctly set, will result in convincing directions and shadows.
In addition, simulating the pixel pitch and camera resolution helps to eliminate moiré effects.
LED screens are by nature luminescent. This is challenging for shadows, as previously stated, but also for ambient occlusion between physical objects and the virtual environment. Without a convincing connection, elements in the scene will seem disconnected from one another. The luminescent characteristic is, on the other hand, the benefit of an XR stage workflow, as it contributes the ambient lighting, reflections and specular highlights needed to convince the audience of the illusion.
Another consideration for the floor is its reflectance. If you are using an LED tile with a glossy finish the reflections may look unnatural; a pure matte finish on the LED tile is preferable. For example, when the content calls for the realistic characteristics of a concrete floor, the tile should be matte and diffuse the light.
Depending on the virtual environment, some elements might need to be reflected back onto the floor in content (recasting the light onto the set extension).
Darkly lit environments present a challenge in that they expose the LED walls to the studio light: spill onto the walls becomes much more noticeable. The directionality of light hitting the different surfaces also creates uneven levels of illumination; the floor receives more light than the walls, potentially breaking the illusion of the environment. Enough light will create shadows and the needed highlights on the performers, whereas too much light can reveal the XR stage, or, as mentioned earlier, create further problems with colour calibration of the set extension and XR stage.
With regard to performance on the physical stage, the light rig needs to accommodate full utilisation of the stage space with even light distribution. In the case of XR stages this means creating light wedges with sufficient output to keep the performer constantly lit, even considerably further downstage than the LED floor's lip.
Having performers walk up to the front of the stage requires a lighting situation capable of continuing the look dictated by the virtual world.
If the virtual world is, for instance, created with strong sunlight, the front light should come from large sources, for the diffuse shadows, placed far enough in front to light the entire subject.
In the context of a more theatrical XR production, the lighting rig should mimic the needs of theatre lighting, with the addition of the demands of a cinematic result. This should be scaled so it can accommodate multiple scenes in one fluid take.
Positional tracking of cameras could be shared with lighting to maintain the visual impact. Skeletal tracking for shadow puppeting in the virtual space would further improve the suggestion of a connection between virtual and real. Increased tracking of performers for lighting and virtual shadows needs to be considered, tested and improved, especially with non-tracker based systems for performers.
Lighting is no longer an abstract concept: the merging of virtual and real lights is now necessary across theatre, show and cinematic lighting disciplines. The need to accurately predict how the production shot will look is amplified in a workflow without post production, where all aspects of the visual product need to be ready for the camera in real time. Content production takes on the role of set building, in the creative as well as the technical sense.
Pre-production of the virtual scene in union with the lighting design and actual set build demands its own schedule (See last chapter also).
A final consideration is how to translate the real world lighting values into render space.
There is no established math to convert physical light units into a virtual renderer's units, and matching them with in-camera physics is estimation rather than science. Render engines use different algorithms to solve raytracing, and each comes with its own bias, strengths and weaknesses.
Renderers like Redshift RT, Blender's Eevee and Cinema 4D's U-Render could be utilised to create simulation scenes for previsualising the physical lights and their effects on an XR stage, above and beyond the aforementioned real time engines.
This complex estimation should be considered an art form. On site it typically consumes a sizeable part of the setup time.
When XR is under consideration for a production, one of the major challenges for its successful use is familiarity with all of the components that make up an XR stage. Media servers have been used in live events for 20 years, while LED screens started to dominate scenic design in the last 10 years. Camera tracking for AR overlays has been in use since the mid-1990s, debuting in the sports market.
However, mixing these tools creates an entirely new production paradigm, one that is still evolving for both cost and workflow. While developing the process for how we efficiently use these tools together, consider every XR project part of a grander experiment to change the future of live events and broadcast production. Don’t let familiarity with the individual components of XR lull a production team into a false sense of understanding of how a production should be run and what it will cost.
Most importantly, consider your XR team a strategic production collaborator. Any meeting you would typically invite your set or lighting designer to should now include your XR or Screens Producer. If you have not worked with a media operations team (your media server team) in the past that includes a Screens Producer, it's possible the team you are considering for your XR needs is not as experienced as they suggest. A good XR team should include a producer, programmer, content coordinator/workflow specialist and likely two or three engineers. In most cases, this represents expertise from different companies that specialise in different aspects of the XR production pipeline, though teams are evolving to deliver complete coverage. At minimum, your point of contact for an XR project should sub-contract or point you to the desired partners.
Along with a larger XR Operations team, the time consideration is significant. Best practice currently suggests two days of calibration time per tracked camera. This time might be referred to as ‘dark time,’ but it is a bit more involved than the ‘dark time’ needed for projector alignment. Camera calibration requires a clear stage, free of any work or rehearsal activity, and it requires the camera operators and all utilities and engineers for those cameras to participate as well as the XR and LED Screen teams.
Other time considerations should be spread out across the production schedule. The software that drives XR is constantly adapting and improving, resulting in the occasional bug or stopdown to resolve issues. Creating space in the schedule to ride out these events will protect the project and allow the teams involved to find solutions or work-arounds to keep the production moving forward.
In the past, you may not even have known when there was an issue with media server playback. These teams are excellent at quickly adapting and finding solutions to the various problems that come up with these tools. The significant change with XR is that when there is an issue, there is no way to hide the problem anymore. Any fault will cause the entire production to pause while a solution is found. The camera team may be needed and the stage may need to be cleared to fix a calibration issue, so camera rehearsal cannot continue while these problems get sorted.
The XR process will continue to improve and these potential faults will decrease, but for the time being it's best to prepare for what happens if XR is not working.
If part of your live event is XR and the rest uses the LED Volume with traditional content playback, consider running XR on an isolated system from the rest of the show's playback needs. Also consider how you might rethink XR creative for traditional playback and have those files prepared. The XR team should work closely with the content production team advising on strategies and creative choices that feature what works best in the current software version, while hiding the flaws or issues.
As a final point of consideration and something unfamiliar to the common interaction with a screens operations team prior to XR, the XR team should work closely with the camera director, choreographers and stage managers on blocking. Good communication and shot planning between the XR team and the Director will simplify and inform many choices that otherwise might critically complicate an XR production. This is why an XR producer should be involved in production meetings as early as possible.
For example, the shooting venue is really no longer the physical space of the LED stage, but part of the 3D world supplied by the content creators. Pulling wide on a shot is now a budget determination of how much of that 3D world will get built, rather than part of the scenic, lighting, or audience that would fill in that negative space. And that’s for one tracked camera.
When two or more cameras are tracked, the production team must carefully plan camera shots. The LED wall can only display the correct camera background for the active camera, so either cuts must be planned or the camera fields of view, and the related virtual camera frustums, must not overlap. For some, the barrier to successful XR is not cost or time, but resistance to a new standard of XR team participation in camera blocking. This collaboration is essential and will help optimise costs and greatly improve the on-site process.
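The no-overlap constraint can be sketched with a deliberately simplified check. Real frustums are 3D volumes; here, as an assumption for illustration only, each camera's footprint on the LED wall is reduced to a (left, right) interval in wall coordinates.

```python
def footprints_overlap(a, b):
    """True if two camera frustum footprints on the LED wall intersect.
    Each footprint is a (left, right) interval in wall coordinates.
    A 1D stand-in for what is really a 3D frustum intersection test."""
    (a_left, a_right), (b_left, b_right) = a, b
    return a_left < b_right and b_left < a_right

# Hypothetical blocking check: cameras A and B share wall space (a problem
# for simultaneous use), while A and C are safely separated.
print(footprints_overlap((0.0, 3.0), (2.0, 5.0)))  # overlapping
print(footprints_overlap((0.0, 2.0), (3.0, 5.0)))  # clear
```

A shot plan could run such a check per planned cut to flag combinations that would show the wrong backplate to a non-active camera.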
Without question, this is an exciting time to be working in XR. There are fantastic examples of successful use of these tools. However, careful discussion and review of schedule and budget make this process much smoother. The impacts on budget, schedule and team communication cannot be overstated. We are re-inventing the production process.
XR production tools and adoption are shaping up, yet the technology comes with a cost. The production community is currently focusing attention on making effects in real time and conforming them to user friendly formats. This leads to a strong auteurism in the medium. The inherent dependence on carefully planned points of view, and the inevitable limitations of the virtual space's design and rendering, manifest this level of control. While the hardware and software tools strive to offer greater degrees of freedom in the creative process, the results are hard to distinguish from film and TV products.
What is the essential added value of real time XR events? In production it's obvious: less post production, infinite virtual worlds, low set building cost, less travel, creative freedom in the process… Yet the information conveyed will, without exception, end up in a recording for on-demand viewing.
Audiences can participate in XR shows in real time, but do they gain a visual benefit assuming the output is a typical 2D canvas? The viewer is oblivious to the complex process. The live events industry should therefore discuss the just-in-time paradigm and how the new technologies will feature in productions once crowds come back to venues.
Right now spectators use personal devices or watch from within a private setting, sharing an experience with family or very few friends. This provides design considerations for a form of broadcasting into intimate, small and cosy spaces.
Once XR productions narrowcast into game and VR style formats, the audience bubble inevitably shrinks to the single viewer. Interactivity and freedom of movement increase, while the virtual space overwrites the real.
At this juncture volumetric capture supports bringing individuals into shared virtual and XR spaces. The rules of presentation can be broken, but the common denominator is the designed stage.
Is there a space that could be defined as hybrid XR stages in live event streaming?
A stage that draws more on the traditions of theatre or live shows, where we build the environment not only virtually, but as a stage that gives room for physical lighting, props and atmospherics, extended by XR. Not a fully immersive stage, but windows into the virtual. A stage that still uses camera tracking to align the virtual with the physical, extending the set and the streamed volumetric performance. We are used to creating looks that only allude, letting the audience believe that something is. So why should the format of streaming change that?
- Co-Author & Illustrations: XR Space
- Co-Author, Lighting Renders