Choosing a GPU for PFClean
Traditionally, many applications have processed data on the CPU as a series of discrete actions performed sequentially. The low core counts and clock speeds of older processors turned this into a bottleneck for processing times, especially in tools that require multiple stages of motion estimation, detection and fixing.
PFClean was the first commercially available application in any market to use general-purpose computing on the GPU (GPGPU). By offloading compute-intensive tasks to the Graphics Processing Unit, PFClean uses the GPU's massively parallel architecture to dramatically speed up your restoration across a number of tools.
Following on from our guide to building a workstation for PFClean, this post discusses the aspects to consider when selecting a GPU for your workstation in order to boost performance in PFClean.
01. GP-GPU in PFClean
In PFClean you can use one GPU for display and multiple GPUs for Batch processing and for processing over a network, so you can export your restored material rapidly. PFClean continues to be at the forefront of this technology with its Telerack video processing engine and Digital Wet Gate, which makes heavy use of the strengths of modern GPUs. Workbench effects that require motion estimation, such as Fix Frame and De Warp, benefit greatly and run significantly faster on the GPU than on multi-core CPUs. Most effects also use the GPU for rendering, and the speed advantage over CPUs is most pronounced when working with high-resolution material such as 4K.
02. Card Interface
The GPU hardware interface, or PCIe connection, is how your GPU physically connects to the rest of your system. Check whether your motherboard supports PCIe 2.0 or PCIe 3.0: the generation is fixed by the board and is one of the factors that dictates how quickly image data can be transferred to and from your GPU. PCIe 3.0 roughly doubles the per-lane bandwidth of PCIe 2.0, which means faster transfers and therefore higher throughput when processing image data. PCIe 4.0 motherboards and GPUs are on the horizon and double the per-lane bandwidth again, potentially allowing much greater image processing performance. Approximate usable bandwidth per lane, in each direction, for each generation:
- PCIe 1.x: 250 MB/s
- PCIe 2.x: 500 MB/s
- PCIe 3.0: 985 MB/s
- PCIe 4.0: 1969 MB/s
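To put these per-lane figures in context, here is a small Python sketch that estimates the best-case time to move one uncompressed frame across a PCIe link in one direction. The frame format (DCI 4K, 4096×2160, four channels at 16 bits each) is an assumption chosen for illustration, not PFClean's documented internal format, and real transfers will be somewhat slower than these theoretical numbers because of packet and driver overhead.

```python
# Approximate usable bandwidth per lane, in each direction, in MB/s (1 MB = 10**6 bytes),
# taken from the PCIe list above.
PCIE_MB_PER_LANE = {"1.x": 250, "2.x": 500, "3.0": 985, "4.0": 1969}


def link_bandwidth_mb_s(gen: str, lanes: int) -> float:
    """Theoretical one-way bandwidth of a PCIe link in MB/s."""
    return PCIE_MB_PER_LANE[gen] * lanes


def frame_bytes(width: int, height: int, channels: int = 4, bytes_per_channel: int = 2) -> int:
    """Size of one uncompressed frame in bytes (default assumption: RGBA, 16 bits per channel)."""
    return width * height * channels * bytes_per_channel


def transfer_ms(gen: str, lanes: int, nbytes: int) -> float:
    """Best-case time in milliseconds to move nbytes across the link in one direction."""
    megabytes = nbytes / 1_000_000
    return megabytes / link_bandwidth_mb_s(gen, lanes) * 1000


if __name__ == "__main__":
    frame = frame_bytes(4096, 2160)  # assumed DCI 4K frame, RGBA half float
    for gen in ("2.x", "3.0"):
        for lanes in (4, 16):
            print(f"PCIe {gen} {lanes}x: {transfer_ms(gen, lanes, frame):.2f} ms per frame")
```

Under these assumptions a 4K frame takes roughly 4.5 ms over a PCIe 3.0 16x link but roughly 18 ms over a PCIe 3.0 slot wired for only 4x lanes, which is why the lane wiring of each slot matters as much as the PCIe generation.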
03. PCIe Lanes
A CPU has a limited number of channels, known as PCIe lanes, which must be shared between the devices attached to the motherboard. Modern Xeon socket boards provide up to 40 of these lanes per CPU. Desktop processors such as the i7 series normally have around 16, and use switching technology to connect more devices to those same 16 lanes, which reduces performance when several attached devices are transferring data at the same time. Increasingly, modern GPUs require a 16x slot to run at full performance; with fewer lanes the card will be bottlenecked and may not even work correctly. This is why it's important to check how PCIe lanes are assigned to the slots on your motherboard. Each slot is labelled as 1x, 4x, 8x or 16x. Some slots are physically 16x but electrically wired for only 4x, so consult your motherboard manual to make sure the GPU is in the correct slot for your system.
The cards we have chosen for our system build each require a 16x PCIe 3.0 slot. With the dual Xeon CPUs we are using, we have up to 80 PCIe lanes available, which provides more than enough bandwidth for the multi-GPU configurations we are considering.
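A quick way to sanity-check a build is a lane budget: add up the lanes each device needs at full width and compare the total with what the CPUs provide. The Python sketch below does exactly that; the device list is hypothetical, and the model is deliberately simple. It ignores chipset-provided lanes and the fact that in a dual-socket system each slot is wired to one specific CPU, so always confirm slot assignments in the motherboard manual.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    lanes: int  # PCIe lanes the device needs to run at full speed


def spare_lanes(cpu_count: int, lanes_per_cpu: int, devices: list[Device]) -> int:
    """Lanes left over after every device gets its full width; negative means oversubscribed."""
    available = cpu_count * lanes_per_cpu
    required = sum(device.lanes for device in devices)
    return available - required


if __name__ == "__main__":
    # Hypothetical device list, for illustration only.
    build = [Device("GPU 1", 16), Device("GPU 2", 16), Device("GPU 3", 16),
             Device("NVMe SSD", 4), Device("10GbE NIC", 8)]
    print("dual Xeon:", spare_lanes(2, 40, build))                 # 80 - 60
    print("desktop i7, 2 GPUs:", spare_lanes(1, 16, build[:2]))    # 16 - 32
```

A negative result, as in the single desktop CPU case, means some devices must share or switch lanes, which is the situation described above where simultaneous processing performance drops.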
04. GPU Memory
The GPU's onboard RAM acts as a fast-access, temporary store for your image data while it waits to be processed by the GPU cores. Common memory types are GDDR3, GDDR5 and HBM. Both the speed and the amount of this memory are major factors when processing images in PFClean, especially when handling large 4K files. GDDR5 is the most common memory type and offers good speed while allowing large capacities, as on cards such as the AMD FirePro W9100 with 16GB.
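To get a feel for what a given amount of GPU memory means in practice, the Python sketch below counts how many uncompressed frames fit into it. It assumes a 32-bit float RGBA working format and treats 1GB of VRAM as 2**30 bytes; both are assumptions for illustration, since PFClean's internal buffer layout is not described here.

```python
def frame_size_bytes(width: int, height: int, channels: int = 4, bytes_per_channel: int = 4) -> int:
    """Uncompressed frame size; defaults assume RGBA at 32-bit float per channel."""
    return width * height * channels * bytes_per_channel


def frames_that_fit(vram_gb: int, width: int, height: int) -> int:
    """Whole frames that fit in vram_gb of GPU memory, ignoring every other allocation."""
    vram_bytes = vram_gb * 1024**3  # treat 1 GB of VRAM as 2**30 bytes
    return vram_bytes // frame_size_bytes(width, height)


if __name__ == "__main__":
    for vram in (4, 8, 16):
        print(f"{vram}GB: {frames_that_fit(vram, 4096, 2160)} DCI 4K frames, "
              f"{frames_that_fit(vram, 1920, 1080)} HD frames")
```

Under these assumptions 4GB holds about 30 DCI 4K frames and 8GB about 60. A motion-estimation tool that keeps several neighbouring frames and intermediate buffers resident at once uses that headroom up quickly, which is consistent with the recommendation of more memory for 4K work.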
GPU memory bandwidth is determined by the memory clock speed, the bus width and the memory type. Common memory bus widths are 128-bit, 256-bit and 512-bit. Most mid- to high-end workstation cards have 256-bit or 512-bit buses that are usually well matched to the GPU's processing cores. However, some low- to mid-range consumer GPUs, while offering a healthy amount of RAM, are bottlenecked by slow memory clocks and a narrow memory bus: even if the GPU has ample processing power, it is left idle waiting for data to be delivered before it can continue. 4GB is considered the minimum for PFClean, and 8GB the minimum for handling 4K content.
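The relationship above can be written as peak bandwidth = per-pin data rate × bus width / 8. The Python sketch below applies it, deriving the data rate from the quoted memory clock and the number of transfers per clock for each memory type. Those transfers-per-clock factors are a simplification (vendors do not all quote memory clocks the same way), and the second card is hypothetical; the W9100 figures (1250 MHz GDDR5 on a 512-bit bus) are used as a worked example.

```python
# Data transfers per quoted memory clock cycle for each memory type
# (a simplification; quoting conventions vary between vendors).
TRANSFERS_PER_CLOCK = {"GDDR3": 2, "GDDR5": 4, "HBM": 2}


def memory_bandwidth_gb_s(memory_clock_mhz: float, bus_width_bits: int, memory_type: str) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbit/s) x bus width (bits) / 8."""
    data_rate_gbps = memory_clock_mhz * TRANSFERS_PER_CLOCK[memory_type] / 1000
    return data_rate_gbps * bus_width_bits / 8


if __name__ == "__main__":
    # AMD FirePro W9100: 1250 MHz GDDR5 on a 512-bit bus.
    print(memory_bandwidth_gb_s(1250, 512, "GDDR5"))  # 320.0
    # Hypothetical consumer card: 1000 MHz GDDR5 on a 128-bit bus.
    print(memory_bandwidth_gb_s(1000, 128, "GDDR5"))  # 64.0
```

The hypothetical narrow-bus card ends up with a fifth of the workstation card's bandwidth, even though both use GDDR5, which illustrates how a card with plenty of RAM can still starve its cores.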
05. Consumer vs. Workstation Cards
Consumer cards are often supplied by multiple vendors who deviate from the reference design, making it hard to optimise for them and guarantee their performance. Workstation cards, however, are designed and produced to an exact, consistent specification, with drivers designed for professional software tools. Workstation chips are also highly binned, meaning they are the best chips selected from the silicon wafer.
Power draw is another key difference between workstation and consumer cards. An equivalent workstation card will typically draw much less power than its consumer counterpart. Lower power draw means less strain on your system's power supply, extending its life and saving money in the long term.
Higher power consumption also produces more heat. Consumer cards are thermally designed to run at full load for short stints, typically in a gaming environment. In a professional software environment, where the card may run at full load for hours on end, this can reduce the card's lifespan and performance. A consumer card may perform well, but at the cost of noise, heat, power and, in the long term, stability. Workstation cards are designed to run cooler and reliably for extended periods.
Workstation cards also have the advantage of memory capacities suited to accurate, high-resolution workflows. Crossover cards such as AMD's Radeon Pro Duo and Nvidia's Titan, which offer professional support and drivers, are becoming more popular and offer a desirable compromise.
06. Graphics Card Drivers
Because graphics card drivers change constantly, check the manufacturer's website regularly for updates. The Pixel Farm makes every effort to check driver updates but cannot test every release. Drivers for Apple hardware are provided via the operating system.
07. Software Support
PFClean uses OpenCL and requires an OpenCL 1.2 compliant card to run. Because of the large and varied range of graphics cards available, we can only assist with support requests for our software running on a limited number of GPUs, which are kindly provided by the hardware vendors themselves for testing. Currently these are:
- Nvidia Quadro K4000 (and newer)
- Nvidia Quadro K5000 (and newer)
- Nvidia Quadro K6000 (and newer)
- Nvidia Tesla K10 (Batch Processing Only)
- Nvidia Tesla K20 (Batch Processing Only)
- Nvidia Tesla K40 (Batch Processing Only)
- Nvidia GTX Titan cards
- AMD FirePro W5100 (and newer)
- AMD FirePro W7100 (and newer)
- AMD FirePro W8100 (and newer)
- AMD FirePro W9100 (and newer)
- AMD Radeon Pro Duo
- AMD FirePro S9100 (Batch Processing Only)
- AMD FirePro S9150 (Batch Processing Only)
- AMD FirePro S9170 (Batch Processing Only)
- AMD Radeon 395X 4GB
- Nvidia GeForce GT 755M
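Before raising a support request, it can help to confirm which OpenCL version each card in the machine reports. The Python sketch below is one way to do that, assuming the third-party pyopencl package is installed; it is not a PFClean tool, and passing the check does not mean a card is on the supported list above. It relies on the OpenCL specification's device version format, "OpenCL <major>.<minor> <vendor-specific information>"; the example strings in the comments are illustrative.

```python
import re


def parse_opencl_version(version_string: str) -> tuple[int, int]:
    """Parse a device version string such as 'OpenCL 1.2 CUDA' into (major, minor)."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string: str, minimum: tuple[int, int] = (1, 2)) -> bool:
    """True if the reported version is at least `minimum` (OpenCL 1.2 by default)."""
    return parse_opencl_version(version_string) >= minimum


def list_devices() -> None:
    """Print every OpenCL device and whether it reports at least OpenCL 1.2."""
    try:
        import pyopencl as cl  # third-party package, assumed to be installed
    except ImportError:
        print("pyopencl is not installed; skipping device query")
        return
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                status = "OK" if meets_minimum(device.version) else "below OpenCL 1.2"
                print(f"{platform.name} / {device.name}: {device.version} -> {status}")
    except cl.Error as exc:  # e.g. no OpenCL driver is installed
        print(f"OpenCL query failed: {exc}")


if __name__ == "__main__":
    list_devices()
```

Comparing (major, minor) tuples rather than the raw strings avoids mistakes such as treating "OpenCL 1.10" as older than "OpenCL 1.2".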