The stress levels are rising, the deadline is looming and the shot you’re working on is taking far longer to matchmove than you first thought. Don’t worry – we’ve all been there!
Matchmoving is a technique used to track how the camera moves through the shot so that an identical virtual camera can be reproduced inside a software package, a process crucial in visual effects for integrating and exactly matching the perspective of CGI (computer-generated images) with live-action plates.
In this article, we look at some of the camera acquisition types commonly used for film, television and VR, as well as outline some of the key factors and limitations that can make a seemingly easy matchmove take a whole lot longer than expected.
Cinema cameras or cine-style cameras usually have a high-resolution, high dynamic range, large-format sensor with RAW data recording and the ability to capture high frame rates for slow motion. Commonly used for feature films, television dramas and commercials, this type of camera offers the very peak in acquisition technology. The use of an industry-standard, positive lock (PL) lens mount enables the use of the same cinema primes and zooms on different manufacturers’ cameras. Nearly all cine-style cameras record to common UHD broadcast and DCI-spec film standards, along with non-standard RAW frame sizes beyond 4K.
Super 35 and Full Frame sensors have become the standard for high-end acquisition and will be the formats you’re most likely to come across when matchmoving. One thing I’ve noticed is the loose definition manufacturers use to describe the size of the sensor. For example, you will see Super 35 within their marketing, referring to the motion picture film format size of 24.89mm x 18.66mm. However, if we delve deeper into the actual specifications, we will see that this description is only an approximation of the actual physical size of the sensor plane. While small differences in field of view are not hugely important for camera operators, they are very important for VFX professionals such as matchmovers, compositors and 3D artists.
Slow motion can cause problems for matchmovers in certain circumstances. In order to achieve high frame rates, some camera systems have to window the sensor, effectively cropping it to increase the sensor’s readout performance, which results in a reduced field of view.
This means the measurements given in the manufacturers’ specifications describe the full sensor size rather than the imaging area actually used to capture a given format. The same thing can happen when selecting a different recording standard. For example, DCI 2K resolution (2048×1080) might use more of the imaging area of the sensor than HD (1920×1080), meaning HD effectively has a narrower field of view.
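To put a number on how much a windowed or cropped imaging area narrows the view, the horizontal field of view can be estimated from the active width and the focal length using the standard rectilinear (pinhole) relationship. The sketch below is illustrative only: the 24.89mm width is the nominal Super 35 figure quoted above, while the 18.0mm windowed width and the 35mm focal length are assumed values, not figures from any specific camera.

```python
import math


def horizontal_fov_degrees(active_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal field of view of an ideal rectilinear lens focused at infinity."""
    return math.degrees(2.0 * math.atan(active_width_mm / (2.0 * focal_length_mm)))


SUPER35_WIDTH_MM = 24.89   # nominal Super 35 width quoted in the article
WINDOWED_WIDTH_MM = 18.0   # hypothetical imaging width in a high-frame-rate mode
FOCAL_LENGTH_MM = 35.0     # hypothetical lens

full_fov = horizontal_fov_degrees(SUPER35_WIDTH_MM, FOCAL_LENGTH_MM)
windowed_fov = horizontal_fov_degrees(WINDOWED_WIDTH_MM, FOCAL_LENGTH_MM)

print(f"Full imaging width:     {full_fov:.1f} degrees")
print(f"Windowed imaging width: {windowed_fov:.1f} degrees")
```

If the matchmove is set up with the full sensor width while the plate was actually recorded from the windowed area, the solved focal length or field of view will be off by a corresponding amount, so it is worth checking which imaging area the recording mode really used.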
Factors to consider when handling footage
Image resolution defines the amount of detail in footage or a still image. Modern high-end cine camera systems, such as those from Red and Arri, have resolutions of 6K and beyond. However, optics and sensor characteristics play a part in the fidelity of the final recorded footage. Not all 4K/HD cameras are born equal. Some use pixel binning and interpolation to arrive at a given resolution. While it’s not important to know how this works, it is important to know that this can dramatically affect the overall quality.
In the example below, I shot a scene in 4K (4096×2160) and in HD 720p (1280×720) simultaneously. Notice how fine details in the stone work are very visible in the 4K version, whereas they have disappeared in the 720p HD footage.
How does this affect matchmoving?
With good-quality footage, high-resolution plates can be a joy to work with. Fine details in the scene, which would have been completely lost in lower resolution formats, suddenly become a rich array of trackable features. High resolution is not without its downsides, though. Apart from the obvious increase in processing time, you’ve actually got to increase your feature sizes accordingly to avoid ending up with very small feature windows with limited useful data inside them. We can see this in the example below. The left-hand image is the feature window from the HD (1920×1080) clip and the right is from UHD (3840×2160).
Ultimately, increasing resolution does not always lead to an increase in tracking accuracy. Soft or poorly calibrated optics can have a similar effect on your footage.
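One simple way to keep a feature window covering the same patch of the scene when moving between resolutions is to scale it by the ratio of the plate widths. This is a minimal sketch, assuming a hypothetical 21-pixel window that worked well on the HD plate. The function name and the rounding to an odd size are my own illustrative choices, not settings taken from any particular matchmoving package.

```python
def scale_feature_window(window_px: int, source_width_px: int, target_width_px: int) -> int:
    """Scale a square feature window so it covers the same scene area at a new resolution."""
    scaled = round(window_px * target_width_px / source_width_px)
    # Keep the size odd so the window has a single centre pixel.
    return scaled if scaled % 2 == 1 else scaled + 1


hd_window_px = 21  # hypothetical window size tuned on the HD (1920x1080) clip
uhd_window_px = scale_feature_window(hd_window_px, 1920, 3840)

print(f"HD window:  {hd_window_px} px")
print(f"UHD window: {uhd_window_px} px")
```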
One area where there is a lot of variance is dynamic range. In simple terms, dynamic range is the range of light/brightness that a camera can see. Have you ever taken a photo with your mobile phone on a bright sunny day and wondered why the sky looks so bright and the clouds have disappeared? This is caused by a limitation in the sensor’s ability to reproduce the brightest and darkest parts of the scene at the same time.
Some sensors are better at reproducing a range of brightness than others. I shot the image below using the same exposure settings, once with an HD cine camera and again with a mobile phone in HD video mode. Ignoring the lack of sharpness and depth of field differences for the moment, we can see the phone image has a complete lack of detail in the sky and the roof compared to the cine camera’s image. Additionally, all the detail in the foreground blinds is absent where they intersect with the sky in the phone image.
The reason for the differences in the detail within the two images is that the cine camera sensor is able to capture two-thirds of the total brightness range in the scene, whereas the phone camera sensor is only able to capture a quarter of the total brightness range at best. The detail that is not captured by the sensor will rapidly clip to white in the highlights and crush to black in the shadows. It’s important to note that incorrect handling of recorded footage can result in a loss of dynamic range too.
Let’s have a look at another example below. Notice the lack of trackable detail in the shadow portion of the image on the right.
How does this affect matchmoving?
Good contrast is important to matchmoving but not at the expense of detail. Put simply, it’s the difference between a few trackable features and many trackable features. While tracking a low dynamic range scene is far from impossible and potentially could still yield great results, having a feature-rich, high dynamic range scene can make your life a whole lot easier and get you closer to the results you desire quicker.
There are two types of sensor readout: global shutter, which reads the image data from the whole sensor at the same time, and rolling shutter, which reads each line of image data sequentially from top to bottom. The image skew in fast motion commonly seen on low-end cameras is caused by a slow rolling shutter readout time.
You can see the effects of rolling shutter for yourself using a mobile phone set to video mode. Point the camera towards a vertical surface like a door frame. Record with the phone held steady for a few seconds, then gradually pan left and right with the phone, slowly increasing the rate at which you pan. When you play back the footage, you will notice that the door frame tilts as you increase the pan speed, rather than being perfectly vertical as it should be. Below are some stills taken from the footage I recorded of a brick wall with my phone camera, demonstrating the issue.
Most cameras, especially consumer and semi-professional cameras, will suffer from rolling shutter, sometimes to quite a severe level.
In simple terms, rolling shutter is caused by the image being read off the sensor row by row and by the time it’s got to the bottom, the camera orientation has changed slightly. In effect, the top of the image is a slightly different point in time from the bottom. High-end cameras from Red and Arri do suffer from the effects of rolling shutter but reduce it dramatically by increasing the speed at which the image is read off the sensor.
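As a rough guide to how much skew a given readout time produces, the horizontal offset between the top and bottom rows during a pan can be approximated as the rotation that happens during readout, converted to pixels. The sketch below assumes a constant pan speed and an average pixels-per-degree figure that ignores lens distortion. The readout times, pan rate and field of view are hypothetical values chosen for illustration, not measurements from any camera.

```python
def rolling_shutter_skew_px(
    pan_rate_deg_per_s: float,
    readout_time_s: float,
    image_width_px: int,
    horizontal_fov_deg: float,
) -> float:
    """Approximate horizontal offset between the first and last rows of a frame."""
    # Average image scale; a simplification that ignores lens distortion.
    pixels_per_degree = image_width_px / horizontal_fov_deg
    # How far the camera rotates while the sensor is being read top to bottom.
    rotation_during_readout_deg = pan_rate_deg_per_s * readout_time_s
    return rotation_during_readout_deg * pixels_per_degree


# Hypothetical comparison: a slow readout versus a fast readout on the same pan.
for label, readout_s in [("slow readout (30 ms)", 0.030), ("fast readout (8 ms)", 0.008)]:
    skew = rolling_shutter_skew_px(60.0, readout_s, 1920, 60.0)
    print(f"{label}: about {skew:.1f} px of skew")
```

A readout time of zero corresponds to a global shutter and gives no skew, which is why faster readout reduces the problem without removing it entirely.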
How does this affect matchmoving?
Rolling shutter is movement where there should be no movement and this in turn leads to false results when we matchmove the footage. Rolling shutter is a complex problem to fix. Foreground elements skew to a greater degree than the background. However, advanced matchmoving software like The Pixel Farm’s PFTrack offers a solution to correct or minimise this.
When taking a photo in a dimly-lit environment using your camera phone, the pictures can look a bit noisy and lacking in fidelity. This is because the camera is applying gain to the signal by increasing the ISO in order to reach an adequate exposure level. Lower ISO values will generally mean lower noise levels while higher ISOs increase the noise levels. High-end cinema and stills cameras will perform a lot better in this regard than consumer-grade camera systems. They are not immune to excessive noise when using a high ISO but they are normally able to reach a higher ISO before noise becomes a limiting factor. Underexposure of footage can have the same effect as high ISO, revealing more of the noise floor when the image is corrected back to its proper exposure level.
In the example below we can see a crop from the shot exposed firstly at 800 ISO then at 3200 ISO. Notice how quickly fine details are obscured and are lacking in micro contrast as we increase through the range.
How does this affect matchmoving?
Noise can be a big problem during the matchmoving process, especially if tracking footage from cameras with smaller sensors in less than adequate lighting conditions. Fine details are lost due to interpolation errors in the debayering process, which we can see clearly in the 3200 ISO example above. Excessive noise can affect how tracking points are located (e.g. when auto-tracking) and how accurately they are tracked. However, a lot of noise has to be present in order for it to be a real problem when it comes to matchmoving.
Have you ever been streaming your favourite series when, all of a sudden, the internet connection dropped and you were left with a mess of blocks and squares making it difficult to even make out people’s faces? This is the result of compression.
A similar type of effect can happen during a shoot in situations where there are large amounts of camera movement and when using a highly compressed codec to record the footage. Most cameras will offer an option to record to a compressed codec to save space on memory cards when longer recording durations are required. Point of View (POV) cameras frequently use highly compressed codecs for recording.
Modern high-end broadcast codecs will deliver images almost indistinguishable from the uncompressed version. They do this by compressing the footage just enough so that it throws away information that we are not likely to need and maintains the bits that we do need. While the footage may look great when the camera is still, this might not be the case when it is moving.
In the example below of a handheld panning shot, I recorded to a highly compressed AVCHD codec at 28 Mbps (3.5 MB/s). I simultaneously recorded uncompressed with the same camera as a comparison. In the image on the right, notice how some of the fine details have completely disappeared with the compressed recording. Additionally, the edges have become unrefined and, when viewed in motion, appear to dance around and jitter.
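The two figures quoted for the AVCHD recording describe the same data rate, since there are 8 bits in a byte. The sketch below shows that conversion and, for a sense of scale, estimates the data rate of an uncompressed stream. The 1920×1080, 10-bit, 4:2:2, 25 fps parameters are assumed purely for illustration and are not the settings used for the shot above.

```python
def mbps_to_megabytes_per_second(megabits_per_second: float) -> float:
    """Convert a bitrate in megabits per second to megabytes per second."""
    return megabits_per_second / 8.0


def uncompressed_mbps(
    width_px: int,
    height_px: int,
    bits_per_sample: int,
    samples_per_pixel: float,
    frames_per_second: float,
) -> float:
    """Data rate of an uncompressed video stream in megabits per second."""
    bits_per_frame = width_px * height_px * samples_per_pixel * bits_per_sample
    return bits_per_frame * frames_per_second / 1_000_000


avchd_mbps = 28.0
avchd_megabytes_per_s = mbps_to_megabytes_per_second(avchd_mbps)
avchd_megabytes_per_min = avchd_megabytes_per_s * 60

# Hypothetical uncompressed format: 4:2:2 chroma averages 2 samples per pixel.
raw_mbps = uncompressed_mbps(1920, 1080, 10, 2.0, 25.0)

print(f"AVCHD: {avchd_megabytes_per_s} MB/s, {avchd_megabytes_per_min:.0f} MB per minute")
print(f"Uncompressed (assumed format): {raw_mbps:.1f} Mbps")
print(f"Approximate compression ratio: {raw_mbps / avchd_mbps:.0f}:1")
```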
How does this affect matchmoving?
Camera movement is everything in matchmoving and to give the software the best chance of finding an accurate solution we will want to give it the highest quality footage. Unfortunately, camera movement or any kind of movement is the worst enemy of compression.
This will present itself as mosquito noise around fine detail and macroblocking around areas of movement, as we have seen in the example above. Some video codecs group frames together, comparing them with one another, storing and interpolating only the information that has changed between frames and averaging any detail that hasn’t. Matchmoving with compressed footage is still possible and will still provide adequate results but can take a lot longer due to errors created from false detail caused by interpolation and compression artefacts. In any situation, RAW data recording is always preferable to compression.
To find out how RAW recording can improve some of these issues, why not read my article on working with RAW here.
Spherical 360 video
360 video consists of real-world video shot with a 360-degree camera that allows the viewer to change their viewing angle at any point during playback. These videos can be enhanced further with computer-generated images (CGI) in the postproduction process in the same way we would a conventional 2D production. However, this does require some specialist matchmoving software and toolsets like The Pixel Farm’s PFTrack.
VR 360 cameras commonly involve two or more cameras recording at least HD. The clips from each of the cameras are then stitched together, either internally or in post, to form a 360-degree spherical panorama that can be viewed in a desktop viewer or VR headset.
The two main types of VR camera system are back-to-back rigs and multi-camera rigs.
Back-to-back rigs are simply two optics and sensors in one housing, or two separate cameras placed back to back, with combined optics that cover 360 degrees. The benefits of these systems are low parallax, ease of use and a small footprint, making them perfect for situations where a larger 360 rig would not be practical. The downside is that the somewhat limited resolution, combined with the extreme nature of the optics, can lead to aberrations and fairly soft results.
Multi-camera rigs share many of the same principles as the back-to-back systems but add more cameras to ultimately achieve better quality results. These rigs can be made up of multiple cinema cameras or built as a single housing with many integrated sensors and optics. The distinct advantage multi-camera systems offer comes from having a larger number of higher quality cameras. The optics don’t have to cover such an extreme angle of view, which makes them less susceptible to complex distortions, aberrations, flaring and softening towards the extreme edges. Clearer, higher-resolution images with greater dynamic range will always have the potential to provide better results during the matchmoving process.
Unique factors with 360 video
360 camera systems can run into the same issues we discussed above but also have a few unique problems.
Parallax is a common problem shared by both back-to-back and multi-camera systems. This presents itself as errors of overlapping detail along the stitch line, with objects closer to the camera rig worst affected. In order to achieve a perfect stitch line, all cameras must rotate around the entrance pupil of the optics. Unfortunately, this would be physically impossible as all cameras would have to occupy the same space at the same time. We can see the effect of parallax in the frame below where the wall is close enough to the rig for parallax to be an issue. This presents itself as misregistered detail on the wall along the stitch line.
The effects of parallax can be minimised by making sure the cameras are as close to the central axis plane as possible and the rig is not too close to the subject you wish to track. This is achieved very successfully in systems where both optics and sensors are built into the same unit. However, image quality compromises have to be made in order to shrink the cameras and sensors enough to do this. Parallax errors can be a problem as they can cause camera registration errors and create accuracy problems when positioning tracking points in 3D space.
Camera synchronisation is a big problem with some VR 360 camera rigs. During our testing, we used a back-to-back VR system comprising two separate cameras. Despite large amounts of experimentation, we struggled to get sufficient synchronisation between the front and rear cameras. While it was still possible to track the clip, we could never get a perfect sync between the stitched clips due to slight variances in the sensor timing. This ultimately led to errors in accuracy during the tracking process due to independent movement between cameras. In the example below we can see a 360 clip manually adjusted for correct sync on the left and the recorded incorrect sync seen along the stitch line on the right.
Larger single-housing multi-camera rigs and rigs made up of professional cinema cameras solve this problem by using a locking signal and timecode to sync the clips together during recording, but they do on occasion still fall out of sync.
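Even a small timing offset becomes visible once the rig is moving, because each camera's portion of the stitched frame then comes from a slightly different moment. As a rough sketch, the misalignment at the stitch line can be estimated as the rig's rotation during the offset, converted to panorama pixels. The 45 degrees per second rotation, the 20 ms offset (half a frame at 25 fps) and the 3840-pixel panorama width are all assumed values for illustration, not figures from our tests.

```python
def sync_error_px(
    rig_rotation_deg_per_s: float,
    sync_offset_s: float,
    pano_width_px: int,
) -> float:
    """Approximate stitch-line misalignment caused by a timing offset between cameras."""
    # How far the rig rotates between the two cameras' exposures.
    misalignment_deg = rig_rotation_deg_per_s * sync_offset_s
    # An equirectangular panorama maps 360 degrees across its full width.
    return misalignment_deg * pano_width_px / 360.0


error_px = sync_error_px(45.0, 0.020, 3840)
print(f"Approximate misalignment at the stitch line: {error_px:.1f} px")
```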