10bit Workflow Viability + Codec Comparison
For some time the big technical demand in events has been resolution. While a significant number of live experiences now equal or surpass any home entertainment system, creatives have for the most part disregarded high frame rate content. Codec and storage limitations have mostly kept delivery formats from going beyond 8bit.
High refresh rates and HDR are common in games, and in-cinema color quality and management have usually been better than in events. Nevertheless, both specifications promise significant quality increases to all audiences.
Every so often higher frame rates have been used for content, but only more recently have render engines and real time content tools made 30fps and beyond realistic.
In color management, content creators have often used high bit depth intermediate files to mitigate compression loss and artifacts in final outputs, but have mostly not had good enough codecs.
Admittedly HDR players have existed for some time and they do deliver. Yet what makes high bit depth, high framerate content viable are lightweight video codecs and hardware that can handle multilayer compositing, color grading and remapping of these assets.
In order to improve our own content workflow and gain experience with the hardware in delivering 10bit content, we set about testing both. We created a 10bit motion graphics reel and tested codecs for image fidelity, size and encode speed. Finally we had to achieve playback in a media server, through high res switchers, to display devices and confirm the high bit depth integrity at the output level.
Part 1: Creative Intentions, New Rendering, Real Time, History of 16bit Master Files
As content creators we are constantly on the lookout for ways to give our graphics as much visual impact as possible. We are required to deliver content at increasing fidelity as the power to drive experiences continues to grow along with audience expectations.
Our tools develop at an exponential rate, render engines get faster and faster, and the methods for producing effects expand creative boundaries to new levels. In a constant quest for visual fidelity, contemporary methodology and efficiency, our studio continues to adopt new workflows into the mix. On recent projects we have been implementing procedural workflows by bringing the power of Houdini into the forefront of our process. On the rendering side we have been employing the real-time capabilities of Unreal Engine and Notch, which we use alongside more traditional renderers such as Redshift and Arnold.
However, working in events, there is always a sacrifice that comes at the point of final delivery where all our lovingly crafted graphics, with their fine gradients, deep contrasts and subtle details are reduced into a format optimized for smooth playback over image quality. 16 or 32 bit graphics are reduced to 8bit, making gradients band and removing all the details in highlights and dark areas. There is always a sharp cutoff at the end of our process that means no matter how much love and care goes into creating our content, it will rarely be received as intended.
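The banding effect can be illustrated with a small calculation. The following is a minimal sketch, not part of our test pipeline: the ramp (a dark gradient from 2% to 7% brightness across 4096 pixels) is a hypothetical example chosen for illustration, and the function name is our own.

```python
import numpy as np

def distinct_levels(gradient, bits):
    """Quantize a 0..1 float gradient to `bits` and count the code values used."""
    max_code = (1 << bits) - 1
    codes = np.round(gradient * max_code).astype(np.int64)
    return int(np.unique(codes).size)

# Hypothetical test ramp: a subtle dark gradient from 2% to 7% brightness
# spread across the width of a 4K DCI frame (4096 pixels).
ramp = np.linspace(0.02, 0.07, 4096)

levels_8 = distinct_levels(ramp, 8)    # code values available at 8bit
levels_10 = distinct_levels(ramp, 10)  # code values available at 10bit

print("8bit levels:", levels_8, "-> band width ~", 4096 // levels_8, "px")
print("10bit levels:", levels_10, "-> band width ~", 4096 // levels_10, "px")
```

On this ramp the 8bit version has only 14 code values to work with, against 53 at 10bit, so each visible step is roughly four times as wide.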
Until now, HDR media playback was mostly confined to hardware players or to media servers handling only limited resolutions or transformation. In-server editing of multi-layer 4k+ HDR and mapping simulations has been unrealistic.
However, with the advent of servers such as disguise, wings or delta that are able to handle more data bandwidth, the need to compress our content down to 8bit is decreasing. But this raises the question: which codec will bring us the best visual result at the fastest render time, allow for server side compositing, and not instantly fill up very valuable media server storage space? This is why we set out to make our own reel of CG elements with a specification that we feel is closer to a new standard (4k DCI resolution, 60fps, 10bit color depth) to produce a direct comparison of the current delivery formats.
To put the codecs through their paces we created a range of graphics that focused on areas that are traditionally tough to handle at compression time. Organic flowing particle motion, volumetric lighting, and iridescent (chroma) gradients running over cloth simulations and anything with subtle changes in color all make for great test cases. We wanted to use a range of software for the animations and the rendering so we would have a good range of results using all the tools we employ on a daily basis. This included: Houdini, Cinema 4D, Marvelous Designer, Substance Designer, Notch and Unreal Engine.
In addition to our own motion graphics, we also encoded a publicly available sample from ARRI, and we obtained an image set from 10Bit FX who entrusted us to test their beta releases of the NotchLC codec.
Encoding Method:
All video and image sources were first saved to a 16bit uncompressed TIFF format and subsequently exported via Adobe Media Encoder into a range of codecs (10bit DPX, HAP, HAPQ, NotchLC (best+optimal), DXV3 (high+normal), ProRes422+4444 (Vanilla), Lagarith [deprecated], DNxHR, Daniel2, MagicYUV). This allowed for a direct comparison between the most common delivery codecs used for event graphic production.
Part 2: Codec Comparison: Viable 10bit Codec and Data Sizes Without Image Sequences
Overview
Recent product developments in playback and signal distribution, as well as the ubiquitous availability of fiber, are set to make high bit depth video a new standard in the event industry. In our signal tests we demonstrated playback feasibility for 10bit 4k DCI at 60fps, to prove the hardware engineering side was ready for an end to end solution. While HDR playback has existed in hardware players for some time, our focus is on the viability of >8bit playback in 3d server and mapping applications that will handle complex and fast timeline editing, 3d previz and projector simulation.
For our tests we ran 10bit DPX sequences against 12, 10 and 8 bit codecs.
In order to derive more representative metrics we used three algorithms, and resampled the original dataset from 4K to 1920x1080px.
RMSE - Root Mean Square Error (https://en.wikipedia.org/wiki/Mean_squared_error), a standard method to compare any type of data, but explicitly not optimized for images. Think of this as a litmus test for color accuracy and luminance smoothness, as all values are equally weighted. The resulting number is the square root of the mean squared difference between master and target. A lower score is better.
PSNR - Peak Signal to Noise Ratio (https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio), a basic image comparison function, which is a mathematical model like RMSE. The signal to noise value is on a logarithmic scale, expressed in dB. A higher score is better. While 8bit encoded images score around 40dB, 16bit codecs average around 70dB (10bit codecs should be in between). Since brightness changes make for the most visually perceptible deviations, we consider PSNR a good indicator of luma consistency as well.
SSIM - Structural Similarity (https://en.wikipedia.org/wiki/Structural_similarity), a method designed to weigh findings not only by general error or distance, but by considering human perception. SSIM is a widely applied and well regarded method in visual performance testing. Results are closer to human experience than other methods. The standard SSIM index is higher-is-better, with 1 meaning identical images; our charts report it as a dimensionless error, so there a lower score is better and indicates image feature integrity.
Ideally all three methods should find the same trends in the data.
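For readers who want to reproduce this kind of comparison, the three metrics can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the tooling used for the figures: the function names are our own, the SSIM here is a simplified single-window (whole image) variant rather than the usual sliding-window form, and turning it into an error as 1 - SSIM is only one possible convention. The test image pair is synthetic.

```python
import numpy as np

PEAK_16BIT = 65535.0  # largest code value of a 16bit image

def rmse(master, target):
    """Root mean square error between two images of equal shape."""
    diff = master.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(master, target, peak=PEAK_16BIT):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    err = rmse(master, target)
    if err == 0.0:
        return float("inf")  # identical images have no noise
    return float(20.0 * np.log10(peak / err))

def global_ssim(master, target, peak=PEAK_16BIT):
    """Simplified SSIM computed over the whole image as one window."""
    x = master.astype(np.float64)
    y = target.astype(np.float64)
    c1 = (0.01 * peak) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

# Synthetic example pair: a 16bit horizontal ramp as master, and the same
# ramp reduced to 8bit precision and expanded back to 16bit as target.
master = np.tile(np.linspace(0, 65535, 1920).astype(np.uint16), (1080, 1))
target = ((master.astype(np.uint32) >> 8) * 257).astype(np.uint16)

print("RMSE:", rmse(master, target))
print("PSNR (dB):", psnr(master, target))
print("1 - SSIM:", 1.0 - global_ssim(master, target))
```

A sliding-window SSIM, as found in common image processing libraries, will react more strongly to local structure loss than this whole-image version.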
The above methods were performed on preselected image pairs consisting of a master (uncompressed source) and a target (transcode). All images had the same size and bit depth, irrespective of the target codecs. The image format was 16bit uncompressed PNG managed in Rec.709. Everything was handled and computed on one PC system. There are methods for temporal comparison with PSNR and RMSE, but the tools are limited and don't have wide ranging codec support.
File Preparation
All master files were saved directly from the original 16bit compositing file, while for each comparison, a target-codec video was rendered from the master, re-imported to a new 16bit composition and then saved out as 16bit PNG. This 16bit image method was employed for 16, 10 and 8bit targets, regardless.
When comparing still images, we applied a hue shift that would cycle through a full range for each image over 10 seconds (for B&W images we applied a greyscale cycle), making sure the first frame of each image's cycle matched the original source image. This forced each codec to encode real frame-to-frame change rather than return frames from a video without any delta.
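The timing of such a hue cycle can be sketched as follows. This is an illustration only: the hue shift itself was applied in compositing, the 60fps frame rate is assumed from the reel specification in Part 1, a linear cycle is assumed, and the function names are hypothetical.

```python
import colorsys

FPS = 60            # assumed: frame rate of the reel specification
CYCLE_SECONDS = 10  # one full hue rotation, as described above

def hue_offset(frame_index, fps=FPS, cycle_seconds=CYCLE_SECONDS):
    """Fraction of a full hue rotation (0..1) applied at a given frame."""
    frames_per_cycle = fps * cycle_seconds
    return (frame_index % frames_per_cycle) / frames_per_cycle

def shift_pixel(rgb, frame_index):
    """Rotate the hue of one RGB pixel (floats in 0..1) for the given frame."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + hue_offset(frame_index)) % 1.0, s, v)

red = (1.0, 0.0, 0.0)
print(shift_pixel(red, 0))    # frame 0: no shift, matches the source pixel
print(shift_pixel(red, 300))  # halfway through the cycle: opposite hue
```

Because the offset at frame 0 is zero, the first frame of the encoded video can be compared directly against the untouched master image.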
Codecs
We decided to use typical codecs employed in our studio (ProRes:10-12bit, HAP:8bit), older but still ubiquitous event delivery standards (DXV:8bit), top range quality (DPX:10bit), and a new intermediate contender (NotchLC:10bit).
Update: We added comparisons for DNxHR:12bit (HQX, 444), MagicYUV 8bit (422, RGB), Daniel2:10bit (RGBA Vanilla,YUV422 Vanilla).
Hardware
For all renders we used a Windows 10 PC, 3.7GHz (Xeon W2145), 128GB RAM, Nvidia P5000, Samsung Evo M.2 NVMe 1TB. At the time of testing the drive was 80% empty and the operating system was booted from a separate SSD.
Reels & Selected Datapoints
Analogous to the comparison algorithms, we made our motion reel reflect our needs (see part 1 above) by making visuals that in our experience would be tough to display. We made sure to include chroma and subtle dark gradients, fine sharp lines juxtaposed with blurs, hard edits and slow movement.
From our overall set we selected a few representative frames that are available for download here. Additionally, we used publicly available examples from ARRI, and had access to images 10Bit FX are employing internally.
Results: PSNR fig. 2.1
Truncated logarithmic scale in dB - higher is better. Frames grouped by Codec.
Luma biased average (right bar of each group) by weighting images according to luminance qualities.
The lowest performers are the 8bit codecs, except MagicYUV which gets excellent results for 8bit.
Interpretation:
DPX is performing at 16bit level quality; ProRes 4444, the DNxHR-444 variants and Daniel2-RGBA are runners up. The luma-weighted averages of NotchLC and DNxHR-HQX are solidly in the dB range expected for 10bit performance, with ProRes 4444 at the low end of the 16bit expectations, while MagicYUV performs very well. DNxHR-444 and MagicYUV-RGB are performing almost level with ProRes 4444. Hap and Hap-Q perform similarly to DXV3-normal. DXV3-high shows good results among 8bit codecs with an average just over ProRes422.
MagicYUV and Daniel2 have an extremely low noise variation across different input frames.
Results: RMSE fig 2.2.1
Linear Scale, dimensionless error - lower is better. Codecs grouped by frame.
RMSE fig 2.2.2
Linear Scale, dimensionless error - lower is better - Weighted Averages Luma + Chroma
Luma + chroma biased average (Fig.: 2.2.2) by weighting images according to luminance and chroma qualities. DPX values for the d+b image set in this series were virtually identical to the original and resulted in lim-0 values.
Interpretation
DPX image sequences present as the source. Runners up are DNxHR-444, Daniel2 and ProRes4444 in luma and chroma smoothness. NotchLC Best + Optimal, DNxHR-HQX, and MagicYUV-422 shape a performance cluster, splitting the middle between ProRes 4444 and 422. This is in line with the PSNR findings.
Even though Daniel2 does not win for every frame, it outperforms all other codecs in problematic frames, and therefore has a very good average result for both chroma and luma.
Results: SSIM fig 2.3.1
Linear Scale, dimensionless error - lower is better - all codecs. Codecs grouped by frame.
Results: SSIM fig 2.3.2
Linear Scale, dimensionless error - lower is better - structure weighted average
Interpretation
In the 10bit and higher category, both NotchLC types and ProRes 4444 cluster at second rank, following the winning group of DNxHR-444 and Daniel2, which share first place. DPX sequences are almost identical to the source.
MagicYUV and DXV-high quality have the best 8bit results.
Results: Encode Time against Size (per Frame) fig 2.4
Linear Scale, smaller values are better in both axes.
In the above chart DPX falls outside the range for size and encode speed. Both DXV and MagicYUV flavours are about as slow as DPX to encode, while HAP and Lagarith are too slow to compete. We note that DPX is an image sequence, which should take longer to write to the system as individual files are created.
The remaining codecs form a speed cluster while exposing size differences.
Conclusion
In image quality we awarded grades from 0-5 points to each codec for each averaged property: Luminance Range, Chroma Range, Structural Integrity.
DPX images are virtually identical to the source sequence.
Among the video codecs Daniel2 is our winner by numbers. Its speed and quality are without equal.
DNxHR-444 is second, while both NotchLC flavours, DNxHR-HQX and ProRes4444 take third place, still at exceptional quality. All these winning codecs are 10bit or higher, so their score is unsurprising.
Among the 8bit codecs we are excited about how well DXV3 has performed. The low score of ProRes422 was unexpected. The quality winner in the 8bit category is MagicYUV by far, since its error is much closer to 10bit results.
In our test the top encode speed performers are very close to each other. A clear winner is not easy to make out but the fastest contestants in their order are:
Daniel2, ProRes, NotchLC followed by DNxHR.
Our interest here is in the great speed performance of the high quality winners, exclusive of ProRes422.
In terms of disk space all video codecs outperform DPX by factors of 4:1 to 10:1.
The best quality video codecs are about twice as big as ProRes422 or DXV, or roughly equal to current HAP-Q deliverables.
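To put the 4x to 10x range into perspective, the underlying arithmetic can be sketched as below. The figures are estimates, not measurements from our tests: we assume 10bit RGB DPX packs three 10bit samples into one 32bit word (4 bytes per pixel), and file headers are ignored.

```python
WIDTH, HEIGHT, FPS = 4096, 2160, 60  # 4k DCI at 60fps, as in our reel spec

# Assumption: three 10bit samples packed into one 32bit word per pixel.
BYTES_PER_PIXEL = 4

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = frame_bytes * FPS

def megabytes(n):
    """Convert bytes to decimal megabytes."""
    return n / 1_000_000

print(f"DPX frame:  {megabytes(frame_bytes):.1f} MB")
print(f"DPX stream: {megabytes(bytes_per_second):.0f} MB/s")

# Data rates implied by the reduction factors quoted above.
for ratio in (4, 10):
    rate = megabytes(bytes_per_second / ratio)
    print(f"{ratio}x smaller than DPX: about {rate:.0f} MB/s")
```

Under these assumptions a DPX sequence needs roughly 2.1 GB/s of sustained read speed, while the video codecs land between roughly 0.2 and 0.5 GB/s.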
Combining all the above quality and speed factors, we can strongly suggest DNxHR444, ProRes4444 and NotchLC (best) as equivalent intermediate codecs. All three have 10bit or higher depth and are only moderately space demanding. The highest quality of this group is with Daniel2 followed by DNxHR 444.
Considerations
ProRes4444 has great quality and speed, but until recently was not supported natively on Windows systems, meaning a reliance on third party plugins to transcode into this format. As of 2019 programs including the Adobe Creative Suite and Fusion will export directly to ProRes on Windows and Apple systems.
ProRes is also designed as an intermediate codec, but it lacks support in real time playback outside the Mac Pro platform, which is seeing special playback cards being introduced, indicating that high bandwidth playback performance will be macOS exclusive for some time.
Our studio is delivering most of its work via and to PC systems, which used to make working in ProRes less advantageous. It is worth noting that since the introduction of ProRes to Windows it has been the easiest codec to deliver in, as it comes as standard with all the major editing and compositing programs.
DNxHR 444 is an extremely robust codec, with errors lower than ProRes 4444 at roughly the same size and encoding time. It supersedes ProRes as intermediate codec in our opinion, while the HQX flavour is less desirable, but still a great delivery codec.
MagicYUV-RGB quality is hard to argue with! We feel it sets the standard for 8bit. With speeds of NotchLC, DNxHR, and ProRes about 4x faster, the quality improvement is only a marginal factor. MagicYUV 2.2.0 (released 2019.10.30) includes an Adobe MediaCore plugin, which allows 10bit+ import/export through Adobe Premiere Pro, Media Encoder and After Effects (through Media Encoder).
NotchLC is the youngest addition to the codecs tested, and is delivering top performance from the start with image quality that is sufficient for intermediate codec but outstanding for delivery.
Daniel2 is in a category of its own! We would love to suggest it as our winner, as speed and quality are unrivalled. However, the downside we have found with this codec is its exclusivity to CUDA acceleration and its low adoption rate, which is possibly rooted in the licensing of the codec. We would love to see software makers pick up Daniel2 more broadly, and wonder if it could perform on other GPUs too.
From our point of view NotchLC has the opportunity to gain the upper hand for playback, with support in disguise just announced, while ProRes, MagicYUV, DNxHR and Daniel2 lack support in this field. The media server makers should certainly consider these winners, as they are viable alternatives to heavy, CPU intensive image sequence playback.
Hopefully we will see a comprehensive implementation of NotchLC in media servers as well as in compositing applications. If hardware performance allows multilayer compositing and playback, NotchLC could become the intermediate format + playback standard, although we would love to see a slightly improved quality option within the next year or two.
Having intermediate codec and delivery format in one file implies a double time saving for our content team and for content management onsite, as re-encodes are avoided.
We are looking forward to taking NotchLC to the field and getting further experience with it, and how it will help to marry real time workflows with classic content.
Additionally we want to encourage media servers to check implementations for DNxHR.
Please read on in the hardware testing section about our first hands-on with a top to bottom 10bit playback from disguise via Lightware to Christie projectors.