Building a Workstation for PFClean
Purchasing or upgrading a workstation for your restoration work can be both expensive and confusing. Manufacturers and machine builders often offer up a plethora of unnecessary and expensive options which won’t improve your productivity at all and are based on a very generalised performance goal. In this guide the aim is to show you how easy it is to configure a workstation or upgrade your existing system to maximise efficiency in PFClean, without spending a large amount of money.
For the purposes of illustrating the importance of choosing the appropriate components, the component choices have been broken down into sections including GPU, processor, chassis, drives and RAM and each section outlines the goals we want to achieve from each component choice.
– General Processing on The GPU
– What Do You Need to Boost Your Throughput?
– How Many Processors? How Many Cores? How Many GHz?
– The Choice of CPU
– Off The Shelf
– Too Many Choices
– From Firm Foundations
– PCIe Lanes
– How Fast and How Much Storage?
– What About Caching?
– Drive Choices
– Future Upgrades
General Processing on The GPU
PFClean was the first commercially available application in any market to use a technology called general-purpose processing on the GPU (GPGPU). By offloading compute intensive tasks to the Graphics Processing Unit, PFClean is able to utilise the massively parallel architecture of the GPU to increase the speed of your restoration dramatically. In the new version you are able to employ multiple GPUs for both Display and Batch processing, as well as multiple GPUs over a network to rapidly export your restored material. PFClean continues to be at the forefront of this technology with its Digital Wet Gate and Telerack video processing engine, which is designed from the ground up to be a GPU-only processing engine.
What Do You Need to Boost Your Throughput?
PFClean uses OpenCL and hence requires an OpenCL 1.2 compliant card to run. For the purposes of this blog post, AMD’s Radeon Pro WX7100 was chosen; each card has 8GB of GDDR5 memory and is 4K display enabled. These cards are the sweet spot in AMD’s line of professional cards and offer great performance, thermal characteristics, power consumption and price.
To learn more about GPU processing please follow the link here to our guide on choosing a GPU for PFClean.
Please note – When using dual GPUs in the setup above, 4G processing must be enabled in the BIOS.
In terms of performance you would have to spend considerably more for a dual socket card, for fairly modest performance improvements. As a rule for 4K content, a minimum of 8GB of VRAM per card is recommended for acceptable performance; the WX7100 with its 8GB of RAM fulfills this requirement, although for 4K AMD’s WX9100, with 16GB of RAM and greater memory bandwidth, might be a much better option.
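The two GPU guidelines above (OpenCL 1.2 or later, and at least 8GB of VRAM per card for 4K) can be expressed as a quick check. This is a minimal sketch rather than anything PFClean itself does: the function names are hypothetical, and the version strings in the usage lines are illustrative examples of the "OpenCL major.minor vendor-info" form that OpenCL drivers report.

```python
import re

MIN_OPENCL = (1, 2)      # OpenCL 1.2, the requirement stated above
MIN_VRAM_GB_4K = 8       # per-card VRAM guideline for 4K content


def parse_opencl_version(version_string):
    """Extract (major, minor) from a string such as 'OpenCL 2.0 AMD-APP (2442.8)'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def card_meets_guidelines(version_string, vram_gb, working_in_4k=False):
    """Return True if the card satisfies the OpenCL and (for 4K) VRAM guidelines."""
    if parse_opencl_version(version_string) < MIN_OPENCL:
        return False
    if working_in_4k and vram_gb < MIN_VRAM_GB_4K:
        return False
    return True


# Illustrative inputs only; real strings come from the installed driver.
print(card_meets_guidelines("OpenCL 2.0 AMD-APP (2442.8)", 8, working_in_4k=True))
print(card_meets_guidelines("OpenCL 1.1 Example Vendor", 8))
```

The first call passes both checks, while the second fails because the card only reports OpenCL 1.1.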
Gaming cards are typically a non-reference design and are intended for short stints at full load, so stable performance over time cannot be guaranteed. In addition, they run hot, have less RAM, and their drivers are not tested as thoroughly as those of their workstation siblings. In short, it could mean the difference between a large project rendering, or not.
How Many Processors? How Many Cores? How Many GHz?
It used to be the case that the processor with the most cores and the highest clock speed did the best job. This is no longer so; Intel has even started to segregate what it terms workstation processors and server processors. With portions of the processing being farmed out to the GPU, the CPU is dedicated to managing background tasks rather than doing the heavy lifting itself. Although PFClean is massively multithreaded, dual 18 core Xeons won’t yield large performance gains; lower core counts with very high clock speeds and a large memory bandwidth, on the other hand, will balance the GPU processing nicely. 6, 8 and 10 core CPUs are considered the sweet spot for workstation purposes.
The Choice of CPU
A minimum of 8 physical processing cores is recommended for good performance in PFClean. We have chosen dual 6 core E5 v3 Xeon processors (12 cores total) with a 2.4GHz base clock and 3.2GHz boost; they are a good balance between multi threaded performance, PCIe lanes and cost. Additionally, their relatively low power draw and low heat output make them ideally suited to running for extended periods of time without taxing either the power supply or the thermal management of the system. Performance increases can be had with dual 8 core Xeons, but the money is better spent on a more powerful GPU solution for your system and faster storage. The advantage of a dual socket design is more PCIe lanes and thereby more expansion room for GPUs, large PCIe SSD cards and Thunderbolt I/O.
Off The Shelf
Can you use desktop originated CPUs with PFClean? The short answer is yes. The long answer is that it’s not a good idea, as many lower end desktop CPUs, such as Core i7s and i5s, won’t necessarily have the multi threaded performance or the memory bandwidth for high resolution restoration work. When you factor in a dual GPU configuration and a PCIe SSD, you will quickly find your PCIe lanes being saturated and overall system performance diminished. In addition, desktop CPUs won’t have the level of stability and longevity which is expected from the professional line of Xeons. While many of the high end i7 processors work extremely well in practice, they can’t match the level of support offered by their Xeon counterparts.
Too Many Choices
PC chassis is a loose term for a box to put all your bits in; there are myriad options and the term can take on different definitions from manufacturer to manufacturer. They mainly fall into two camps: pre-configured bare bones systems that you just install your selected components into, and completely empty boxes that allow you to scratch build a system, leaving you to decide every single component. Pre-configured systems with a motherboard and power supply built in offer a speedy, reliable and warranty supported way to customise a system to your needs, whilst eliminating some of the more fiddly, time consuming installation and management processes.
From Firm Foundations
The main goal here is to select a chassis with room to expand your PFClean workstation and maintain a level of future proofing. The Supermicro SYS-7038A-I is a pre-configured chassis with motherboard and power supply that offers the flexibility of a modern dual socket board, with enough space in the chassis and overhead in the power supply to run multiple GPUs and expansion cards, and support for up to 18 processing cores per chip. It also has a relatively low entry cost and is easy to build compared with other comparable systems. I/O is another big factor: the system offers USB 3.0 front headers and a Thunderbolt 2 upgrade path in the form of a PCIe card in the PCH lane, giving you high transfer speeds when moving footage onto the system. Supermicro also has BIOS updates available to upgrade the socket to take Xeon E5 v4 processors, something that a domestic desktop motherboard in many cases will not be able to do. With this particular chassis it is possible to get your system up and running within a couple of hours.
PCIe Lanes
Every CPU will have a maximum number of PCIe lanes it uses for sending packets of information backwards and forwards to your GPU and other PCIe devices. A modern Xeon such as the one we have chosen for our workstation build (Xeon E5-2620 v3) will have up to 40 PCIe lanes at its disposal per CPU, providing more than adequate lanes for two high end GPUs and PCIe storage. Typically a dual GPU workstation with PCIe storage and network adapter will use up to 40 lanes, 20 per CPU on a Xeon dual socket system.
A high end consumer grade CPU such as the Intel Core i7-6700K has a maximum of only 16 PCIe lanes, meaning that once a GPU such as the AMD W9100 is installed there are zero lanes left for other devices such as network adapters and storage. Once those devices are added, the card will actually run using 8 PCIe lanes, hampering performance. This is one area that demonstrates the need for a professional grade solution, as PCIe lanes can become a serious bottleneck for restoration work.
Some manufacturers use PCIe switches via an extra chipset located on the motherboard to enable more PCIe lanes. Although this may allow for two 16 lane devices to be connected and used via a single 16 lane connection to the CPU, it will not provide full sustained throughput from both devices at the same time.
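The lane budget described in this section can be tallied in a few lines. This is a rough sketch under stated assumptions: the lane counts per CPU come from the text, but the link widths per device are illustrative guesses that should be checked against each card’s specification, and a simple sum ignores which slot is wired to which CPU.

```python
LANES_PER_XEON = 40   # from the text: up to 40 lanes per Xeon E5 CPU
LANES_CORE_I7 = 16    # from the text: 16 lanes on the consumer CPU

# Assumed link widths, for illustration only.
devices = {
    "GPU 1": 16,
    "GPU 2": 16,
    "PCIe NVMe SSD": 4,
    "network adapter": 4,
}


def lanes_short(device_lanes, lanes_available):
    """Return how many lanes are missing (0 if everything fits at full width)."""
    return max(0, sum(device_lanes.values()) - lanes_available)


print(sum(devices.values()))                     # total lanes requested: 40
print(lanes_short(devices, 2 * LANES_PER_XEON))  # dual Xeon: 0 lanes short
print(lanes_short(devices, LANES_CORE_I7))       # 16-lane CPU: 24 lanes short
```

With these assumed widths the total matches the "up to 40 lanes" figure quoted above, and the 16-lane CPU falls well short, which is why devices on such a system end up negotiating narrower links.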
04. Storage Options
How Fast and How Much Storage?
When building a system for PFClean, it is important to keep in mind the most common formats you are working with and how much material you have. A 2K DPX feature length film will take up roughly 1-1.5TB of storage and will require a minimum of 350MB/s plus overhead to play back in real time, whereas 4K will need 4 times this amount in both storage and speed. Other factors to consider are space for renders and a 30% overhead for throughput, so that you are not putting so much strain on the drives. As a rule, never put more storage into your system than you can safely and regularly back up.
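The figures above translate into a simple sizing calculation. The sketch below is one reading of the guidance, assuming the 30% overhead is applied on top of the minimum real-time rate; the function names are hypothetical.

```python
RATE_2K_MB_S = 350        # minimum real-time playback rate for 2K DPX (from the text)
SIZE_2K_TB = (1.0, 1.5)   # rough storage for a 2K DPX feature (from the text)
FACTOR_4K = 4             # 4K needs four times the storage and speed
OVERHEAD_PERCENT = 30     # throughput headroom suggested in the text


def required_throughput_mb_s(resolution):
    """Sustained read rate to aim for, including the 30% headroom."""
    factor = FACTOR_4K if resolution == "4K" else 1
    return RATE_2K_MB_S * factor * (100 + OVERHEAD_PERCENT) / 100


def feature_size_tb(resolution):
    """Approximate (low, high) storage for one feature length film, in TB."""
    factor = FACTOR_4K if resolution == "4K" else 1
    low, high = SIZE_2K_TB
    return (low * factor, high * factor)


print(required_throughput_mb_s("2K"))  # 455.0
print(required_throughput_mb_s("4K"))  # 1820.0
print(feature_size_tb("4K"))           # (4.0, 6.0)
```

Space for renders and cache files comes on top of the per-film figures, so treat the sizes as a lower bound.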
What About Caching?
Caching is a big part of the PFClean workflow and involves storing large amounts of temporary processing data so that it can be reused very quickly; the final output (export) is considered the render. What makes an SSD perfect for these operations compared with a traditional HDD is that access times are up to 100 or more times quicker. We have chosen an Intel PCIe NVMe 1.2TB SSD. Its high performance, low latency design makes it ideal for handling lots of small files, and with sequential writes at 1400MB/s and read performance of over 2400MB/s it will help when handling 4K restoration projects.
Drive Choices
In the system we have 4 x Seagate 3TB SATA HDDs. In a RAID 0 configuration this will give a theoretical storage capacity of 12TB and an actual capacity of 10.8TB. Disk speeds will be in excess of 800MB/s read and 700MB/s write, which provides plenty of throughput for the 350MB/s required for 2K DPX playback. This would be ideal for two full-length feature films (120 mins), including raw scans, cached files and renders, with plenty of overhead. HDDs make great storage drives for your footage due to their high capacity and low cost, whereas building the equivalent storage array using SSDs would cost thousands. It is possible to work with 4K on these drives with the assistance of the NVMe SSD for caching and more RAM, but greater performance can be had by replacing these drives with SSDs. We have also included a professional grade Samsung boot drive for the OS; these drives have a proven track record and are recommended by most machine builders.
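The array figures can be sanity checked with a little arithmetic. This is a sketch with hypothetical helper names; it assumes the gap between the theoretical and actual capacity is mostly the difference between decimal terabytes (as marketed) and binary tebibytes (as many operating systems report), with file system overhead presumably accounting for the rest.

```python
def raid0_capacity_tb(drive_count, drive_size_tb):
    """RAID 0 stripes data across all drives, so capacities simply add up."""
    return drive_count * drive_size_tb


def tb_to_tib(terabytes):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


def headroom(array_mb_s, required_mb_s):
    """How many times over the array covers the required playback rate."""
    return array_mb_s / required_mb_s


capacity = raid0_capacity_tb(4, 3)
print(capacity)                       # 12
print(round(tb_to_tib(capacity), 1))  # 10.9
print(round(headroom(800, 350), 2))   # read headroom for 2K DPX playback
print(headroom(800, 4 * 350) >= 1)    # False: 4K real time exceeds the array
```

The last line illustrates why the text leans on the NVMe cache and extra RAM for 4K work. Note also that RAID 0 has no redundancy, so a single drive failure loses the whole array, which is one more reason to keep regular backups.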
Future Upgrades
Although the system is versatile, there is a limit in physical space inside the chassis. One possible upgrade for 4K could be to install a 10Gb/s network card into the system and have a 10Gb/s storage server attached. This would have the potential for expansion into the 100s of TBs and have the benefit of enabling you to utilise PFClean’s batch rendering across a network.
We have chosen 32GB of DDR4 system RAM, which will provide each of the 12 processing cores with roughly 2.7GB and will easily cache shots of normal length at 2K. As a guide there should be at least 2GB per physical core. The board can take up to 2TB of ECC 3DS LRDIMM and would ideally be upgraded to an absolute minimum of 64/128GB of RAM for 4K restoration.
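The per-core guideline is easy to check for any configuration. A minimal sketch, with hypothetical function names, using the 2GB per physical core figure from the paragraph above:

```python
GB_PER_CORE_GUIDE = 2  # guideline from the text: at least 2GB per physical core


def ram_per_core_gb(total_ram_gb, physical_cores):
    """RAM available to each physical core if shared evenly."""
    return total_ram_gb / physical_cores


def minimum_ram_gb(physical_cores, gb_per_core=GB_PER_CORE_GUIDE):
    """Smallest RAM total that satisfies the per-core guideline."""
    return physical_cores * gb_per_core


print(round(ram_per_core_gb(32, 12), 2))  # 2.67
print(minimum_ram_gb(12))                 # 24
print(round(ram_per_core_gb(64, 12), 2))  # 5.33
```

For this build, 32GB across 12 cores clears the 24GB floor, and the 64GB suggested for 4K work roughly doubles the per-core share.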
06. Video I/O
At the time of writing, PFClean supports the Blackmagic and AJA Kona video capture cards for tape capture. We have selected a Blackmagic DeckLink Studio 4K. This card offers a good balance between features and price, and it has full support for legacy analogue and modern SDI based formats. The capture can be monitored live by attaching a broadcast spec monitor to one of the SDI outputs, or via the HDMI breakout to a TV or suitable monitor. Additional VTR equipment may be required to stabilize a weak video signal on some legacy formats such as VHS.
07. Bill of Materials
Chassis: Supermicro SYS-7038A-I (x1)
Boot Drive: Samsung SSD 80 EVO 500GB (x1)
12TB RAID: 3TB Seagate ST3000DM001 (x4)
Cache Disk: Intel DC P3500 1.2TB NVMe AIC SSD (x1)
Video I/O: Blackmagic DeckLink Studio 4K (x1)