8K Cloud Streams: Codec Testing Addendum

Abstract

This article examines whether streaming codecs can deliver 8K 10bit content at visually lossless quality, at bitrates compatible with high-bandwidth contribution links, fast enough for real-time production. Testing HEVC, VVC and AV1 against ProRes 4444 main10 source material — across rendered graphics and camera footage — our findings indicate that at CBR 150 Mb/s, all three codec families achieve perceptually indistinguishable results within the visually lossless range as measured by VMAF, PSNR and SSIM. The codec is not the limiting factor. What remains to be validated is display-side performance on LED at production brightness levels, and end-to-end latency over wide-area networks.

The infrastructure argument

In our previous research into 10bit codec viability — findings verified extensively in the field and still current — we established that high bit depth content could survive the round-trips of local production pipelines: offline rendering, mezzanine codec delivery, on-premise media server playback. That conclusion stands. What has changed is the infrastructure context.

Media servers have always been the operational heart of large-scale production — managing visual output with frame accuracy, responding to cues and sensor inputs in a local, highly available and secure environment. That role is not diminishing. But the range of what can happen around and upstream of that core is expanding significantly.

Across the server ecosystem, distributed and real-time rendering architectures have matured. Platforms including Pixera, VYV Photon, LightAct and Green Hippo each support multi-server configurations — synchronised groups sharing media and workload across local networks — with integration for real-time engines including Unreal, Notch and Unity. Disguise has gone further with their dedicated RX render node range and RenderStream protocol, purpose-built to distribute rendering across synchronised clusters connected over 100GbE backbones, with the media server acting as compositor and output engine while render nodes handle the real-time content. The signal fabric in these environments is video over IP, the architecture is designed to scale, and the infrastructure cost reflects that ambition.

The result is that on well-provisioned sites the local pipeline has never been more capable. But capability on-site does not resolve the distribution problem. Most productions do not operate within a single well-equipped facility. Creative work is distributed across teams and time zones. Cloud rendering is viable for 8K production work. And increasingly, the creation of pixels and the display of pixels happen in different places — which means something has to carry them between.

The case for streaming

The default answer has been dedicated fibre and high-bandwidth local infrastructure. For controlled environments — venues like major sports arenas with purpose-built 100GbE networks — this works well. But fibre is point-to-point and does not scale to distributed teams, cloud render farms, or venues without existing infrastructure. The gap between what large-scale purpose-built networks can carry and what most teams can provision is where streaming codecs become relevant.

The argument is not that streaming replaces the local server model — it extends it. If a streaming codec can deliver a visually lossless signal at contribution-link bitrates, more of the composition and rendering work can happen upstream, in cloud infrastructure, before delivery to the venue. What arrives at the server is a more finished stream rather than raw elements to be assembled on-site. The server's cue-driven, frame-accurate local functions remain intact; what changes is how much of the creative pipeline feeds into it from outside. Over time, the server increasingly becomes a decode, display and show-control interface into cloud-side functions — rather than the sole site of composition.

This also sidesteps one of the harder problems in multi-source live production: aligning streams of different origin in real time introduces decode-re-encode cycles, each carrying quality cost and latency. Moving composition upstream avoids that by producing a single finished stream before delivery, reducing both the technical burden and the failure surface at the venue.

What we tested

We encoded two source types at 8K 60fps and measured the results against ProRes 4444 main10 masters: motion graphics at 8192×8192, the square format common to dome and LED canvas work; and camera footage at 8192×4320 from RED's sample library, originally recorded as 12-bit RAW R3D and conformed to our test specification. We are grateful to RED for making that material available.
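To put these formats in scale, the uncompressed data rate of each can be worked out from resolution, frame rate and bit depth. The sketch below assumes 4:4:4 sampling, meaning three full-resolution 10-bit samples per pixel; that is an assumption made for the arithmetic, not a stated test parameter. The function names are illustrative. The 150 Mb/s target is the CBR figure quoted in the abstract.

```python
def raw_bitrate_bps(width, height, fps, bit_depth=10, samples_per_pixel=3.0):
    """Uncompressed video data rate in bits per second.

    samples_per_pixel=3.0 models 4:4:4 sampling (an assumption);
    use 1.5 for 4:2:0.
    """
    return width * height * fps * bit_depth * samples_per_pixel


def compression_ratio(width, height, fps, target_mbps, **kwargs):
    """How many times smaller the target bitrate is than the raw rate."""
    return raw_bitrate_bps(width, height, fps, **kwargs) / (target_mbps * 1_000_000)


# The two canvas sizes described above, both at 60 fps.
FORMATS = {
    "graphics 8192x8192": (8192, 8192),
    "footage 8192x4320": (8192, 4320),
}

for name, (w, h) in FORMATS.items():
    raw = raw_bitrate_bps(w, h, 60)
    ratio = compression_ratio(w, h, 60, 150)
    print(f"{name}: {raw / 1e9:.1f} Gb/s raw, about {ratio:.0f}:1 at 150 Mb/s")
```

Under these assumptions the square graphics canvas is roughly 120.8 Gb/s uncompressed and the camera format roughly 63.7 Gb/s, so a 150 Mb/s stream implies compression ratios in the hundreds. Passing samples_per_pixel=1.5 halves both figures for 4:2:0 material.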

We tested HEVC, VVC and AV1 across CBR and CQ encode modes, measuring PSNR, VMAF and SSIM. CBR was the primary focus: production links require predictable bandwidth, and CQ modes, whose bitrate varies with content complexity, are operationally unsuitable where capacity must be shared or guaranteed. Working range: 100–200 Mb/s CBR.
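Of the three metrics, PSNR is the simplest to state exactly, and for 10-bit material the peak value changes from the familiar 255 to 1023. The following is a minimal sketch of the standard formula over flat lists of sample values, assuming full-range 10-bit samples. It is not the measurement tooling used for these tests, which is not described here, and the sample values in the usage lines are invented for illustration.

```python
import math


def psnr(reference, distorted, bit_depth=10):
    """PSNR in dB between two equal-length sequences of integer samples."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    peak = (1 << bit_depth) - 1  # 1023 for 10-bit, 255 for 8-bit
    # Mean squared error across all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Illustrative values only: four 10-bit samples with small errors.
ref = [512, 600, 700, 800]
out = [513, 599, 702, 800]
print(f"{psnr(ref, out):.2f} dB")
```

In practice the metric would be computed per plane over whole frames and averaged across the sequence; FFmpeg's psnr, ssim and libvmaf filters are one commonly used route.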

Reading the chart: Select a quality metric (VMAF, PSNR, SSIM), a codec mode (All, CBR, CQ/QP), and a sort axis. Bitrate sorts high to low — most expensive codec left, most efficient right, with normalised Mb/s figures shown below each group. Quality and Quality/bit sort low to high. Zoomed narrows the y-axis to the range where all results cluster. Avg overlays the mean of GFX and footage. Hover any bar for full values.
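The three sort axes can be sketched as follows. This assumes Quality/bit means the selected metric divided by the normalised bitrate in Mb/s, which is our reading of the chart rather than a documented definition. The labels and numbers are placeholders chosen to make the ordering visible; they are not test results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    label: str   # hypothetical encode name
    vmaf: float  # score for the selected quality metric
    mbps: float  # normalised bitrate in Mb/s

    @property
    def quality_per_bit(self) -> float:
        # Assumed definition: metric score per Mb/s.
        return self.vmaf / self.mbps


# axis name -> (sort key, descending?)
SORT_KEYS = {
    "bitrate": (lambda r: r.mbps, True),           # high to low
    "quality": (lambda r: r.vmaf, False),          # low to high
    "quality_per_bit": (lambda r: r.quality_per_bit, False),
}


def sort_results(results, axis):
    key, descending = SORT_KEYS[axis]
    return sorted(results, key=key, reverse=descending)


# Placeholder values, not measurements.
PLACEHOLDERS = [
    Result("encode_a", 90.0, 200.0),
    Result("encode_b", 80.0, 100.0),
    Result("encode_c", 85.0, 150.0),
]

for axis in SORT_KEYS:
    print(axis, [r.label for r in sort_results(PLACEHOLDERS, axis)])
```

With these placeholders the encode that scores highest on raw quality sits last on the quality axis but first on the quality-per-bit axis, which is the kind of reordering the sort control is meant to expose.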

HEVC and VVC testing was conducted in partnership with Spin Digital, whose software encoder produced the highest quality results across our tests and which we endorse without reservation for production use at this specification. VVC's standard extends to 16K and maps clearly to where large-format production is heading, though real-time encode without dedicated hardware remains demanding. NVENC provides accessible hardware encode for HEVC on standard NVIDIA GPU hardware, making it the practical baseline for most distributed workflows. AV1 is an open, royalty-free format; HEVC carries near-universal decoder support. Encoding is computationally far harder than decoding: an encoder must analyse content, make rate-control decisions and produce a compliant bitstream, all in real time at 8K 60fps. The encode side of a distributed cloud pipeline therefore warrants as much attention as the display side.

Conclusion

At CBR 150 Mb/s, codec choice between HEVC, VVC and AV1 is not a quality decision: differences are below the threshold of perceptual significance across all three metrics. The streaming codec path is viable. The broader model, in which cloud-side composition feeds a contribution link to a venue server whose core functions of frame-accurate, cue-driven local playback remain unchanged, is supported by these findings, subject to two areas of further validation.

First, display-side verification: metric-based assessment cannot fully predict how a stream-delivered 10bit signal presents on an LED surface across production brightness levels. Gradient compression, chroma handling and dark-region detail require physical testing under show conditions. Second, end-to-end latency over production-grade WAN infrastructure remains to be characterised for frame-accurate live applications. This work is being conducted in conjunction with Kwokman Productions; findings will be incorporated into a subsequent revision.
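For frame-accurate work the latency question is most usefully expressed in frames rather than milliseconds: at 60 fps one frame lasts about 16.7 ms. The helper below is a small sketch of that conversion. The function names are illustrative and the latency figures in the usage lines are hypothetical inputs, not measurements.

```python
import math


def frame_period_ms(fps=60.0):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms, fps=60.0):
    """Whole frames of delay implied by an end-to-end latency, rounded up."""
    # Multiply before dividing to avoid rounding error from 1000 / fps.
    return math.ceil(latency_ms * fps / 1000.0)


print(f"frame period at 60 fps: {frame_period_ms():.2f} ms")
# Hypothetical end-to-end latencies, for illustration only.
for ms in (16, 50, 51):
    print(f"{ms} ms -> {latency_in_frames(ms)} frame(s)")
```

Rounding up reflects that a stream arriving partway through a frame interval cannot be shown until the next frame boundary.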

The quantitative case is established. Physical validation on display hardware and latency characterisation over WAN infrastructure represent the next stage of this research.

Beyond the current assessment

The codecs evaluated here represent what is operationally accessible today: mature, widely supported, and encodable on standard hardware. That will not remain static.

VVC is already demonstrating efficiency gains over HEVC at equivalent perceptual quality: in our own testing it consistently ranked highest when VMAF results were sorted by quality per bit. Its specification extends to 16K, and as hardware acceleration matures, the trajectory points toward either larger canvases at current bitrates or equivalent canvases at meaningfully lower bitrates.

LCEVC (MPEG-5 Part 2) takes a different approach: rather than replacing existing codecs, it operates as a standardised enhancement layer on top of a base codec such as HEVC, AV1 or VVC, with reported compression gains of 25–40% at equivalent perceptual quality. V-Nova hold the commercial licence; open-source framework support is arriving via FFmpeg and GStreamer. AI-native enhancement layers, including learned residuals and neural in-loop filters, are developing in parallel through the Joint Video Experts Team (JVET), the joint ITU-T and ISO/IEC MPEG group, and indicate a further step beyond what conventional codec design achieves. Both are on our test schedule.

In-loop neural filters run ML inference on every decoded frame to remove compression artefacts — blocking, ringing, chroma loss in gradients — before the signal reaches the display. Media server GPUs already carry the tensor core capacity to do this; they are simply not being used for it. The benefit is immediate and codec-agnostic: any compressed signal passing through the server, whether from local storage or a WAN stream, would see perceptual quality improvement without changes to the source or the delivery chain. The same capability also positions servers to handle the more complex neural decode pipelines of future codecs, where in-loop filtering is part of the standard rather than an addition. Tensor cores sitting idle in a render node are an untapped quality layer. The industry should treat them as one.
