Visual effects start with the reflected light from an analogue real world scene and end with a digital subset of that scene displayed on a monitor or projected in a theater. During its travels through the VFX pipeline the image will traverse five “spaces”. It is essential to understand all five of these spaces to avoid introducing preventable color artifacts or degradation into your shots. In this article we will look at each stage of the VFX color pipeline to understand the space it is in, why it is there, how to minimize artifacts, and why it is mathematically impossible to avoid artifacts completely. Our job is essentially to manage those artifacts to keep them below the threshold of visibility.
This pipeline is characterized by a loss of information at each stage, but this is not a real problem since the whole objective is for the final displayed image to look like the original scene to the human eye, which is (fortunately) limited in scope and range. A properly designed color pipeline will limit this unavoidable data destruction at each stage such that the final displayed data will still look good to the human eye. It’s rather like designing a crash-worthy car: the car is totally mutilated, but the passengers survive intact.
A word about the color science here. Color science is a formidable topic with complex components of optics, color theory, signal processing, and the wildly complex human visual system. So the color science in this article is, of necessity, grossly over-simplified in order to tell a good story. If it weren’t, this article would be a book and you wouldn’t be reading it. So I suggest that actual color scientists avert their eyes and leave the rest of us in peace to try to understand this massively complex topic in an admittedly simplified way that will actually be useful to artists creating visual effects.
1: World Space
The visual effects color pipeline starts with the real world scene, of course. The real world consists of Spectral Power Distributions, where the color of each object is defined by which wavelengths of light (spectra) are the brightest (power) over what range (distribution). World space is not a color space.
To give an example, consider the lowly banana illustrated here. The continuous spectrum of light reflected from it, which we perceive as yellow, is graphed in the lower right. Starting at 400nm on the left, very little light at that wavelength is reflected. Moving to the right we see that 550nm, which we see as green, and 700nm, which we perceive as red, are reflected much more prominently. This strong green and red reflected light we perceive as yellow.
Notice that this example of a real world Spectral Power Distribution is continuous from 400nm to 700nm, the range of human vision. We perceive a color based on which wavelengths are most strongly reflected – that is, the power at each wavelength. You might think that this does not apply to CGI because it produces digital images, not spectral power distributions. But you would be wrong. Rendering engines model the real world, so inside the math of the renderer advanced scene lighting models do use spectral power distributions internally, while outputting just the resulting digital RGB image.
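To make the idea concrete, here is a toy Python/NumPy sketch of how a spectral power distribution becomes a tristimulus value: the spectrum is integrated against three matching functions and the result is pushed through the standard XYZ-to-linear-sRGB matrix. The Gaussian curves are rough stand-ins for the real CIE 1931 tables and the “banana” spectrum is invented, so treat this as structure, not data:

```python
import numpy as np

# Wavelength samples across the visible range (nm).
wl = np.arange(400, 701, 5, dtype=float)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Crude Gaussian stand-ins for the CIE 1931 color matching functions.
# These are NOT the real tabulated curves; they only mimic the general
# shape so the structure of the computation is clear.
x_bar = 1.06 * gauss(wl, 600, 40) + 0.36 * gauss(wl, 445, 25)
y_bar = gauss(wl, 555, 45)
z_bar = 1.78 * gauss(wl, 450, 25)

# An invented "banana" reflectance spectrum: weak blue, strong green and red.
spd = 0.1 * gauss(wl, 430, 30) + 1.0 * gauss(wl, 550, 60) + 0.5 * gauss(wl, 680, 50)

# Integrate the spectrum against each matching function to get tristimulus
# values, normalized against the area under y_bar.
norm = np.trapz(y_bar, wl)
X = np.trapz(spd * x_bar, wl) / norm
Y = np.trapz(spd * y_bar, wl) / norm
Z = np.trapz(spd * z_bar, wl) / norm

# Standard XYZ -> linear sRGB (D65) matrix.
M = np.array([[ 3.2406, -1.5372, -0.4986],
              [-0.9689,  1.8758,  0.0415],
              [ 0.0557, -0.2040,  1.0570]])
rgb_linear = M @ np.array([X, Y, Z])
print(rgb_linear)   # red and green dominate over blue, as a banana should
```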
2: Camera Space
The imaging sensor of a camera digitizes the visible spectrum of the real world into an “informational subset” which is output as simple RGB values; this is camera space. Like the CGI renders above, the camera starts with the visible spectrum but outputs only an RGB image. Color scientists disagree amongst themselves whether or not a camera space represents a true color space.
This figure illustrates how the camera converts the continuous spectrum of reflected colors into the discrete RGB values that we are familiar with, and in the process discards a great deal of the original scene information. The color pie chart illustrates the proportions of RGB used to make the yellow banana. But somehow the RGB image appears the same to the human visual system as the spectral version in the real world.
The reason this digitized subset of the real world looks to us like the original real world scene is because of a phenomenon of human vision called “metamerism”, where the eye can perceive different color values as the same color. This does not mean that red is perceived as yellow, but that there is a range of spectral values that, while different, appear the same to the eye. You might, for example, lower the power (brightness) of the yellow wavelengths at around 600nm while increasing the green (550nm) and red (700nm) to compensate. Both would appear the same to the eye due to metamerism. This is why the RGB digital image can appear the same to the eye as the real scene spectral version. In both cases we perceive a yellow banana. The key point here is to realize that while the RGB image still looks yummy, much of the spectral information about the banana has been discarded. Our first major data loss.
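Metamerism also has a tidy mathematical form: any spectral change that lies in the null space of the sensor’s three sensitivity curves is invisible to that sensor. Here is a small NumPy sketch that builds such a “metameric black” and adds it to a spectrum; the sensitivities and spectra are invented Gaussians, not any real camera’s data:

```python
import numpy as np

wl = np.arange(400, 701, 5, dtype=float)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Toy spectral sensitivities for the R, G and B photosites (made-up curves).
S = np.vstack([gauss(wl, 600, 35),   # "red" photosite
               gauss(wl, 545, 35),   # "green" photosite
               gauss(wl, 450, 30)])  # "blue" photosite

# Original spectrum: strong in green and red, weak in blue (banana-ish).
spd1 = 1.0 * gauss(wl, 550, 60) + 0.5 * gauss(wl, 680, 50) + 0.1 * gauss(wl, 430, 30)

# Build a "metameric black": a spectral perturbation in the null space of the
# sensitivity matrix, which the three photosites therefore cannot see.
_, _, Vt = np.linalg.svd(S)
null_basis = Vt[3:]                # directions invisible to the 3 photosites
spd2 = spd1 + 0.2 * null_basis[0]  # a genuinely different spectrum
# (spd2 may dip slightly negative; a physically real metamer would be built
#  more carefully, but the sensor math is the point here.)

rgb1, rgb2 = S @ spd1, S @ spd2
print(np.allclose(rgb1, rgb2))       # True: identical RGB, a metamer
print(np.abs(spd1 - spd2).max())     # yet the spectra clearly differ
```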
♦ If you would like to learn more about digital color check out my webinar All About Color for Digital Artists!
3: Storage Space
Once the digitized image is read out of the imaging sensor it must be stored as a file somehow. One way to do that is camera raw, where the raw output of the imaging sensor is stored directly without processing (other than the de-bayering process to convert the raw sensor data to RGB). Writing out the camera raw is done to retain the maximum amount of image information because it essentially captures the entire camera space, but the file sizes are very large and inefficiently stored. Storage space is not a color space. It is just the space where you store stuff.
For this reason the digitized images will usually be stored in some compressed format. Compressing an image naturally discards some of the image information; how much is discarded depends on the compression scheme. While there are true lossless compression schemes for graphics and CGI, photographic image compression schemes entail some loss. To be sure, there are some very low-loss compression schemes out there. ILM’s EXR file format comes to mind, which can compress an image to about half its original size with no loss of visible information. Other compression schemes such as jpeg or log will lose a great deal more than that.
This is illustrated here. Taking a 16 bit per channel 4k DCI image, the camera raw file size is 47MB per frame. Converting it to 10 bit log (DPX or Cineon) reduces that to 35MB (a 25% reduction), but a jpeg compression can make it as little as 5MB per frame (a 90% reduction). These are therefore lossy compression schemes.
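For the curious, here is a rough sketch of how a 10 bit log file packs a wide range of linear light into so few code values. This is the classic Cineon-style encoding; the constants (reference white at code 685, reference black at code 95, 0.002 density per code value, 0.6 gamma) are the commonly published defaults, and a real DPX header may specify others:

```python
import numpy as np

REF_WHITE, REF_BLACK, DENSITY_PER_CODE, GAMMA = 685, 95, 0.002, 0.6

def lin_to_cineon(lin):
    """Map scene-linear values (1.0 = reference white) to 10-bit log codes."""
    black = 10 ** ((REF_BLACK - REF_WHITE) * DENSITY_PER_CODE / GAMMA)
    lin = np.maximum(lin, 0.0) * (1.0 - black) + black   # lift so 0.0 lands on code 95
    code = REF_WHITE + np.log10(lin) * GAMMA / DENSITY_PER_CODE
    return np.clip(np.round(code), 0, 1023).astype(int)

def cineon_to_lin(code):
    """Invert the encoding back to scene-linear."""
    black = 10 ** ((REF_BLACK - REF_WHITE) * DENSITY_PER_CODE / GAMMA)
    lin = 10 ** ((np.asarray(code) - REF_WHITE) * DENSITY_PER_CODE / GAMMA)
    return (lin - black) / (1.0 - black)

# Deep shadow, mid-gray, reference white, and a bright specular highlight.
# Highlights up to roughly 13x reference white still fit below code 1023.
print(lin_to_cineon(np.array([0.0, 0.18, 1.0, 8.0])))
```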
The key point about these compression schemes is that they are all designed to squeeze information out of the image data that the eye does not notice. However, while the picture may look dandy on your monitor, the discarded information can become a problem when you move to the next stage, the work space. This is because in the work space we will be pushing and pulling on the image data during our image processing operations and deficiencies in the data become amplified. The classic example of this is trying to key lousy DVCAM footage with heavy chroma subsampling. The information is simply not there for a quality key, but the nice client will point to his monitor and say the greenscreen looks great to him. Your response is “it looks great on the monitor, but not to the computer”.
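Here is a toy scanline sketch of that DVCAM problem, using BT.601-style luma/chroma and 4:1:1-style horizontal subsampling (the pixel values and edge position are invented for illustration). The luma edge stays sharp, but the chroma edge the keyer relies on gets smeared onto a four-pixel block:

```python
import numpy as np

# One scanline straddling a hard matte line: green screen on the left,
# a skin-toned foreground on the right (invented RGB values, 0..1).
row = np.zeros((32, 3))
row[:14] = [0.10, 0.60, 0.15]   # green screen
row[14:] = [0.80, 0.55, 0.45]   # foreground subject

# BT.601-style luma/chroma separation (Cb is treated the same way as Cr).
Y  = 0.299 * row[:, 0] + 0.587 * row[:, 1] + 0.114 * row[:, 2]
Cr = 0.713 * (row[:, 0] - Y)

# 4:1:1-style subsampling: keep one chroma sample per four pixels,
# then stretch it back out, which is effectively what playback does.
def subsample(c, factor=4):
    return np.repeat(c[::factor], factor)[:len(c)]

Cr_sub = subsample(Cr)

# Luma keeps its sharp edge at pixel 14, but the reconstructed chroma edge
# snaps to the 4-pixel block boundary at pixel 16 -- the very signal a
# chroma keyer needs has been smeared.
print(np.round(Cr[10:20], 3))
print(np.round(Cr_sub[10:20], 3))
```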
4: Work Space
The work space is where the actual image processing is done for the VFX shot, and it is definitely a color space. Maybe it is a greenscreen shot to be keyed and composited, or perhaps it’s in for object removal or to composite a CGI character. For this step in the VFX pipeline you really want as much data about the image as possible because all of the image processing operations done in VFX will introduce further data losses as the RGB values are repeatedly crunched and manipulated. The key at this point is to move the image from its storage space to a work space without introducing further losses. The work space must be larger than both the storage space and the display space so that the images are not clipped, and it must have sufficient floating point precision that round-off errors are microscopic and undetectable.
To achieve this, serious VFX software like Nuke works in linear space. Nuke’s famous 32 bit floating point linear light space comes very close to achieving the ideal workspace. Why only “very close” you might ask, gasping at the notion that Nuke is not 100% perfect? Because when an image is captured the capturing camera “bakes in” its color responses and limited dynamic range. Capture any scene with two different cameras and the images will not be identical to each other or to the original scene. If these images are simply linearized the camera attributes are still baked into the linear version. Removing this last bit of image capture bias is one of the key features of ACES, the mathematically ideal work space for VFX. ACES attempts to represent the actual light from the original scene without a camera bias. It was developed with visual effects squarely in mind and is being adopted by more and more VFX studios. Sounds like we need a webinar here.
Lesser software like After Effects will typically work in a display space such as sRGB or rec709 with integer data rather than float. Working in display space suffers from three problems: first, you are working with a limited dynamic range image because the data is restricted to the display device’s dynamic range – there is no “headroom” to retain specular highlights. Second, display spaces are not linear, and this introduces color errors in all image processing operations. And third, working in integer introduces serious round-off errors that can quickly accumulate and degrade image quality. The least you can do is promote any 8 bit images to 16 bits or, better yet, to float to minimize the damage. Ideally you would also convert them to linear, but converting and working in linear can be a bit tricky if your software is not specifically designed for it.
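To see the second problem in actual numbers, here is a minimal NumPy sketch using the standard sRGB transfer function. Averaging two pixels in display space (what a blur or a dissolve effectively does) gives a visibly different result from averaging real light in linear and re-encoding:

```python
import numpy as np

def srgb_to_linear(v):
    """Standard sRGB decode: display code values (0..1) -> linear light."""
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v):
    """Inverse: linear light -> sRGB display code values."""
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)

# Mix 50% black and 50% white, the way a blur or a dissolve would.
black, white = 0.0, 1.0

# Wrong: average the display-space code values directly.
mix_display = (black + white) / 2              # 0.5 in sRGB code values
print(srgb_to_linear(mix_display))             # ~0.214: only ~21% of the light

# Right: decode to linear, average the real light, re-encode for display.
mix_linear = (srgb_to_linear(black) + srgb_to_linear(white)) / 2   # 0.5 linear
print(linear_to_srgb(mix_linear))              # ~0.735: noticeably brighter
```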
♦ If you would like to learn more about working with linear images check out my webinar Working in Linear!
5: Display Space
Display space is the color space of the intended display device – workstation monitor (sRGB), TV screen (rec709) or digital cinema projector (P3) – and is a true color space. It is obviously necessary that the displayed image be in the correct color space to match the color characteristics of the display device or it will suffer inappropriate color and brightness distortions. So in the VFX pipeline there is a specific step to convert the floating point linear work space to the integer display space for viewing. If we are working on a movie there will be a color version for theatrical (P3) and a different color version for TV (rec709). “Future-proofing” the movie entails retaining the linear (or ACES) version at the highest resolution possible, without display space conversion, so it can be converted for whatever display devices are developed in the future.
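As a minimal sketch of that final conversion step, here is linear-to-sRGB encoding with 8 bit quantization. A real pipeline would apply a proper view or display transform (tone mapping) rather than the hard clip used here, but it shows exactly where the float headroom of the work space finally disappears:

```python
import numpy as np

def linear_to_srgb_8bit(v):
    """Encode scene-linear values for an sRGB display, then quantize to 8 bits.
    Real pipelines use a view/display transform (tone mapping) here; this
    sketch hard-clips purely to show the loss of headroom."""
    v = np.clip(np.asarray(v, dtype=np.float64), 0.0, 1.0)   # no headroom on a display
    srgb = np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)
    return np.round(srgb * 255).astype(int)

# Work-space pixels: deep shadow, mid-gray, diffuse white, and a specular
# highlight three stops above white that only the float linear space could hold.
scene_linear = np.array([0.02, 0.18, 1.0, 8.0])
print(linear_to_srgb_8bit(scene_linear))   # the 8.0 highlight clips to 255,
                                           # indistinguishable from diffuse white
```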
The VFX color pipeline starts with the spectral power distribution of the original real world scene and ends in a limited dynamic range display space of 8 or 10 bit integer. Our job is to manage the image data through this pipeline such that the image degradation is minimized at each stage and the color is properly preserved. We can’t control the world space or the display space as both of those are a given. But we can control all of the intermediate spaces to ensure the highest quality visual effects.
The camera space is dictated by what camera is chosen for the shot. Of course we want the greatest dynamic range available to minimize clipping the original scene, but a key issue is what file formats it will export. Lesser cameras will only export highly compressed images. Not good.
The storage space must support a range of RGB code values equal to or greater than the camera’s, and not deplete it by converting it to lower precision. In other words, don’t take a 10 bit camera output to 8 bits for storage. The best options are 10 or 12 bit log, or the Cadillac of storage spaces, EXR 16 bit float.
The workspace must have a much greater dynamic range than both the storage and display spaces to avoid clipping the data, must be floating point to reduce or eliminate round off errors, and ideally would have the camera bias backed out of the captured image (ACES).
The way to think of the VFX color pipeline is as a sequence of operations where the data is depleted at each step, but by starting with high quality data and properly managing the VFX pipeline you can ensure that your VFX not only look great on the intended display device, but will also be future-proofed to look good on display devices of the future.
Until next time, Comp On!