Rendering with a Graphics Processing Unit (GPU), instead of (or alongside) a Central Processing Unit (CPU), is nothing new: we’ve been covering it in DEVELOP3D since 2009. But it has suddenly become much more relevant to product development and CAD. So much so that it could change the way you approach your next workstation purchase.
In 2016 alone we have seen GPU rendering technologies appear in SolidWorks, Rhino, and Siemens NX.
Previously, much of the activity was in the Digital Content Creation (DCC) sector with applications like 3ds Max and Maya.
For CAD, GPU rendering is all about ease of use. The push button approach, championed by CPU-based renderers like Luxion KeyShot, means engineers and designers don’t have to be experts in rendering in order to produce decent images.
The rendering tools are ‘physically based’, which means they are designed to simulate the real physical interactions between light and materials.
Rays of light are traced within a scene, which is very computationally intensive, but, as the rays are not dependent on each other, the process is extremely well suited to parallel compute architectures. And, with 1,000s of cores, GPUs are certainly that.
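To make the point about independence concrete, here is a minimal Python sketch. It is not how any of the renderers discussed here work internally; the single-sphere scene, the function names and the use of a thread pool are all illustrative assumptions. Each ray's result depends only on that ray, so the work can be handed out to workers in any order.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical scene: a single sphere sitting on the z axis.
SPHERE_CENTRE = (0.0, 0.0, 5.0)
SPHERE_RADIUS = 1.0


def ray_hits_sphere(direction):
    """Return True if a ray from the origin along `direction` hits the sphere."""
    dx, dy, dz = direction
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / length, dy / length, dz / length
    cx, cy, cz = SPHERE_CENTRE
    # Distance along the ray to the point closest to the sphere centre.
    t = dx * cx + dy * cy + dz * cz
    if t < 0:
        return False  # the sphere is behind the ray origin
    # Squared distance from the sphere centre to that closest point.
    closest_sq = (cx * cx + cy * cy + cz * cz) - t * t
    return closest_sq <= SPHERE_RADIUS ** 2


def trace_serial(directions):
    """Test the rays one after another."""
    return [ray_hits_sphere(d) for d in directions]


def trace_parallel(directions, workers=4):
    """Test the rays through a pool of workers.

    No ray needs another ray's result, so the pool may process them in any
    order; map() still returns the results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ray_hits_sphere, directions))


rays = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.1, 0.0, 1.0), (0.5, 0.5, 1.0)]
```

Python threads only illustrate the structure here and will not speed up this calculation. A GPU renderer applies the same idea across thousands of hardware cores.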
CUDA and OpenCL
Before we get into the details of GPU hardware it is important to understand the two underlying technologies that make rendering on a GPU possible. These are CUDA and OpenCL.
Most GPU renderers support either CUDA or OpenCL, but some support both.
CUDA is a proprietary technology from GPU manufacturer Nvidia and is designed primarily to work with Nvidia GPUs.
Some CUDA-based renderers can also be accelerated by CPUs, but the performance is usually nowhere near as fast. CUDA does not run on AMD GPUs.
GPU renderers that are compatible with CUDA include Nvidia Iray, Lightwork Design Iray+ and Chaos Group V-Ray RT.
CUDA has the broadest application support, with renderers available for Siemens NX, Rhino, 3ds Max, modo, Maya, SketchUp, Revit and Cinema 4D (as well as SolidWorks and other CAD apps indirectly through the standalone GPU renderer, SolidWorks Visualize).
OpenCL is an open standard from the Khronos Group, a non-profit organisation whose members include AMD, Nvidia, Intel, Apple, ARM and others.
It can execute on all types of GPUs (Intel, AMD and Nvidia), CPUs and other processors, but OpenCL-based renderers tend to perform best on GPUs. AMD is a big champion of the technology.
Applications that support OpenCL renderers include Rhino, SolidWorks, 3ds Max, Revit, Maya, SketchUp and modo.
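The compatibility rules can be summarised in a small lookup, sketched below in Python. The tables only restate what this article says about three of the renderers it covers (Iray uses CUDA, V-Ray RT supports both, Radeon ProRender uses OpenCL). The function name and data layout are hypothetical and are not part of any vendor's software.

```python
# GPU vendors each compute API can run on, as described in this article.
API_VENDORS = {
    "CUDA": {"Nvidia"},
    "OpenCL": {"Nvidia", "AMD", "Intel"},
}

# Compute APIs used by three of the renderers named in this article.
RENDERER_APIS = {
    "Nvidia Iray": {"CUDA"},
    "Chaos Group V-Ray RT": {"CUDA", "OpenCL"},
    "AMD Radeon ProRender": {"OpenCL"},
}


def usable_apis(renderer, gpu_vendor):
    """Return a sorted list of the APIs this renderer could use on this vendor's GPU."""
    return sorted(
        api for api in RENDERER_APIS[renderer]
        if gpu_vendor in API_VENDORS[api]
    )
```

For example, `usable_apis("Nvidia Iray", "AMD")` returns an empty list, whereas `usable_apis("Chaos Group V-Ray RT", "AMD")` returns `["OpenCL"]`.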
Depending on your hardware, a physically-based renderer can take tens of minutes, often hours, to render a high-quality image.
With a CPU renderer, in order to reduce render times, you simply buy a CPU with more cores. However, if all of the CPU cores are used then the workstation can become sluggish, making it very hard to do any meaningful work until the render is finished.
Applications like Luxion KeyShot circumvent this by allowing the user to specify how many CPU cores the render should use, leaving some CPU cores free for other tasks, such as CAD modelling.
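The idea of holding cores back is simple enough to sketch in a few lines of Python. This is not KeyShot's interface, just an illustration of the arithmetic; the function name and the default of two reserved cores are assumptions.

```python
import os


def render_worker_count(reserved_cores=2, total_cores=None):
    """Number of CPU cores to hand to a renderer, keeping some free for other work."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # os.cpu_count() can return None
    # Always give the renderer at least one core.
    return max(1, total_cores - reserved_cores)
```

On an eight-core workstation, `render_worker_count(2, 8)` gives the renderer six cores and leaves two for CAD modelling.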
With GPU rendering, users don’t have granular control over what percentage of a GPU’s resources is used. However, some GPUs are architected with asynchronous compute engines, which allow both compute and graphics tasks to be performed at the same time. This means the GPU could be crunching through a ray trace render but still respond almost instantly when you start to spin a 3D CAD model in the viewport.
With GPUs that support asynchronous compute the big advantage is that you can use all of your workstation’s GPU resources all of the time, regardless of whether it’s for a graphics or compute task.
AMD was a pioneer of asynchronous compute and its AMD FirePro and AMD Radeon Pro GPUs are designed specifically to handle compute tasks and graphics tasks concurrently. Most importantly, AMD GPUs can switch dynamically between them.
We tested this with an AMD FirePro W9100 by setting it to work on a V-Ray RT render. As it crunched its way through the ray tracing we loaded up a large assembly in SolidWorks, turned on RealView and started moving the 3D model around on screen.
Impressively, the model responded instantly, and could be rotated very smoothly. When we measured the frame rates, they were only a fraction slower than when all of the GPU’s resources had been dedicated to interactive 3D graphics.
Nvidia’s new ‘Pascal’ Quadro GPUs, which are due to ship in October, will have a similar technology. ‘Async Compute / Dynamic load balancing’ will feature in the Quadro P5000 and Quadro P6000 and is said to deliver more efficient sharing of resources between graphics and compute tasks.
This technology, however, is not available with current generation Nvidia Quadro GPUs, which include Nvidia ‘Kepler’ and Nvidia ‘Maxwell’ GPUs, whose model numbers begin with a ‘K’ or an ‘M’.
With Kepler and Maxwell, if the GPU is working on a compute task, there would likely be a conflict if you sent it a graphics task at the same time. So, if you were in the middle of a GPU rendering and you suddenly wanted to reposition your 3D CAD model, the system would likely be sluggish, making it hard to orient your model quickly and accurately on screen.
To get round this limitation of the Kepler and Maxwell architectures, it is advisable to have one GPU dedicated to interactive graphics and one or more GPUs dedicated to GPU rendering.
However, the downside of this approach is it means you are not making the most out of your workstation’s GPU resources, as the GPU tasked with interactive graphics will sit idle when you are not moving your CAD model in the viewport.
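The two deployment strategies can be written down as a simple role assignment, sketched below in Python. The function and its arguments are hypothetical. It only encodes the rule of thumb described above: with asynchronous compute every GPU can render, and without it the display GPU is kept out of the render pool.

```python
def assign_gpu_roles(gpus, async_compute):
    """Map each GPU to the set of tasks it should handle.

    gpus: list of GPU names; the first entry drives the display.
    async_compute: True if the GPUs can run graphics and compute concurrently.
    """
    if not gpus:
        raise ValueError("at least one GPU is required")
    roles = {}
    for index, name in enumerate(gpus):
        key = f"{index}:{name}"
        if async_compute or len(gpus) == 1:
            # Every GPU renders; the first one also drives the viewport.
            roles[key] = {"render"} | ({"graphics"} if index == 0 else set())
        else:
            # Keep the display GPU free so the viewport stays responsive.
            roles[key] = {"graphics"} if index == 0 else {"render"}
    return roles
```

With `async_compute=False` and two cards, the first is reserved for graphics and only the second renders, which is exactly the idle capacity described above.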
Regardless of your GPU’s capabilities, if you want to cut rendering times you simply add more GPUs to your workstation.
A typical CAD workstation with one CPU can host up to two high-end GPUs on its PCIe x16 slots, whereas a dual CPU workstation can host three or four.
So how do you choose which GPU(s) will be best for you? To get a rough idea of relative performance, check out the single precision numbers, rated in TFLOPs (one trillion floating-point operations per second). The bigger, the better.
However, this is by no means gospel and you should ideally seek out benchmark figures from your renderer of choice (Iray, V-Ray RT or AMD Radeon ProRender). Better still, test with your own datasets.
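As a first pass, the single precision ratings quoted elsewhere in this article can be ranked and compared with a few lines of Python. The helper names are hypothetical, and the output is only the rough guide described above, not a substitute for renderer benchmarks.

```python
# Single precision ratings (TFLOPs) quoted in this article.
TFLOPS = {
    "AMD FirePro W5100": 1.43,
    "Nvidia Quadro M2000": 1.8,
    "Nvidia Quadro M4000": 2.5,
    "AMD FirePro W7100": 3.30,
    "Nvidia Quadro M5000": 4.2,
    "Nvidia Quadro M6000": 7.0,
    "Nvidia Quadro P5000": 8.9,
    "Nvidia Quadro P6000": 12.0,
}


def rank_by_tflops(ratings):
    """Return GPU names ordered from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


def relative_to(ratings, baseline):
    """Express every rating as a multiple of the baseline GPU's rating."""
    base = ratings[baseline]
    return {name: round(value / base, 2) for name, value in ratings.items()}
```

Calling `relative_to(TFLOPS, "Nvidia Quadro M4000")` shows, for instance, how many times the paper rating of a mid-range card each of the others offers. Real render times will not scale exactly with these ratios.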
GPU memory is another important consideration, both in terms of capacity (for storing geometry and textures) and memory bandwidth.
4GB or 8GB should be considered a minimum and the bigger the bandwidth, the quicker the data can be fed to the GPU.
GPUs are just starting to feature High Bandwidth Memory (HBM). The AMD Radeon Pro Duo, for example, boasts speeds up to 1,024 GB/s, three times faster than the best performing GPU with GDDR5 memory.
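To see why bandwidth matters, the following Python sketch estimates how long it would take to stream a dataset from GPU memory at a given peak rate. It is an idealised calculation: the function name is hypothetical and real-world throughput will be lower than the quoted peak.

```python
def transfer_time_ms(data_gb, bandwidth_gb_per_s):
    """Idealised time, in milliseconds, to read data_gb at the peak bandwidth."""
    return data_gb / bandwidth_gb_per_s * 1000.0
```

At the 1,024 GB/s quoted above, `transfer_time_ms(8, 1024)` works out to roughly 7.8 milliseconds for 8GB of geometry and textures. A card with a third of the bandwidth would need three times as long.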
Blowing your budget on an ultra high-end GPU isn’t always the best route to go.
Depending on how quickly you want your renders back, you may get more for your money from two mid- to high-end GPUs.
Finally, here’s some practical advice on adding GPUs to your existing workstation.
GPUs come on single height or dual height PCIe boards that slot into a workstation’s PCIe x16 slots on the motherboard. Dual height cards tend to be longer in length, so check they will physically fit inside your machine.
High-end GPUs can draw an incredible amount of power, as much as 350W, so you will also need to check that the Power Supply Unit (PSU) in your workstation can cope.
Cards rated at over 100W will also need to draw additional power direct from the PSU, via one or more 6-pin or 8-pin cables.
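A quick power budget check is easy to sketch in Python. The function below is hypothetical. The 100W threshold for auxiliary power is the figure used in this article, and the wattage for the rest of the system in the usage example is an invented placeholder, so check the specifications of your own workstation and cards.

```python
def psu_check(psu_watts, gpu_watts, other_watts, aux_threshold_watts=100):
    """Compare total power draw against the PSU rating.

    psu_watts: rated output of the power supply.
    gpu_watts: list with the rated draw of each GPU.
    other_watts: estimated draw of CPU, drives and everything else (an assumption).
    aux_threshold_watts: rating above which a card needs extra PSU cables
        (the article's figure, used here as a default).
    """
    total = sum(gpu_watts) + other_watts
    return {
        "total_watts": total,
        "headroom_watts": psu_watts - total,
        "psu_ok": total <= psu_watts,
        "cards_needing_aux_power": [w for w in gpu_watts if w > aux_threshold_watts],
    }
```

For example, `psu_check(1000, [350, 75], 300)` reports a total of 725W, leaving 275W of headroom, and flags the 350W card as needing auxiliary power cables.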
Most designers will choose to render locally on a workstation, but network rendering using GPUs is also possible.
This could be an attractive option for design teams who want a powerful shared resource, or for a CAD user who only has a mobile workstation with an entry-level GPU.
Most of the developments in this space are being done by Nvidia.
Nvidia Iray Server provides distributed Iray rendering across networked machines. This could be an ad hoc network of workstations with Quadro GPUs or a dedicated rack of GPU servers with Nvidia Quadro or Nvidia Tesla GPUs (Tesla is a specialist GPU designed specifically for compute).
To keep an eye on progress, Iray Server can also stream back the rendered image live to the desktop or mobile workstation that submitted the job.
For an out of the box solution, Nvidia has a dedicated network render appliance called the Nvidia Quadro VCA, which features eight high-end Quadro GPUs. Multiple appliances can be used in tandem to deliver ray traced images in a matter of seconds.
Later this year Nvidia will add Iray support to its Nvidia DGX-1, a supercomputer powered by multiple ‘Pascal’ Tesla P100 GPUs.
Chaos Group is also working on V-Ray Swarm, a new web-based distributed rendering system that can work with GPUs.
For those that want to dramatically cut rendering times but don’t want a big capital investment, GPU rendering can also be done in the cloud on a pay per use basis.
Migenius is one of a handful of service providers operating in this space.
GPU rendering has been threatening to take off for some years now but limited support for CAD applications has been a major barrier to widespread adoption.
Now, with GPU renderers suddenly coming online for Siemens NX, SolidWorks and Rhino, the foundations have been laid for greater market penetration.
Most importantly, with both Nvidia and AMD competing head on, this can only be good news for end users. GPU rendering software is not only cheap (or free) but, with the price of GPUs falling, you can now add an incredible amount of processing power to your CAD workstation without breaking the bank.
AMD professional graphics cards
All current AMD GPUs feature asynchronous compute technology, which means they are adept at handling graphics and compute tasks at the same time (see main article for more information).
This means AMD offers the most flexibility when it comes to the way in which GPUs are deployed inside a workstation and, as a result, the lowest cost of entry into the world of GPU rendering.
Users can have a single GPU for both interactive graphics and rendering, or scale up to two, three or four GPUs depending on their needs and the capabilities of their workstation.
AMD is currently undergoing a big transition as it re-brands its professional line of GPUs from AMD FirePro to AMD Radeon Pro.
Last month the company announced the AMD Radeon Pro WX Series of professional GPUs, which are due for release later this year.
All three cards (the WX 4100, WX 5100 and WX 7100) are single height GPUs and are very much focused on 3D CAD. We expect AMD to flesh out its AMD Radeon Pro WX Series later this year or next with one or two higher end cards.
The AMD Radeon Pro WX 4100 (4GB) is probably not powerful enough to be considered for GPU rendering, but the AMD Radeon Pro WX 5100 (8GB), which is rated at more than 4 teraflops of single-precision performance, should be an excellent entry-level card for V-Ray RT or AMD Radeon ProRender.
With over 5 teraflops of single-precision performance the AMD Radeon Pro WX 7100 (8GB) has similar compute numbers to AMD’s previous generation flagship card the AMD FirePro W9100.
However, with an estimated sub $1,000 price tag it will cost significantly less.
The AMD Radeon Pro Duo is from a slightly different family of GPUs but looks like the card to beat for those that take their GPU rendering seriously. With two on-board GPUs and up to 16.3 TFLOPs of single precision compute power, this water cooled beast of a card looks like a steal at £999 (ex VAT).
CAD users who have already invested in AMD FirePro GPUs can explore V-Ray RT or AMD Radeon ProRender with their existing card.
Rated at 1.43 TFLOPs the AMD FirePro W5100 should really be considered an absolute entry point to GPU rendering, with the FirePro W7100 offering a better option at 3.30 TFLOPs.
Finally, AMD has a very interesting technology that is designed to give a laptop or mobile workstation with an entry-level mobile GPU direct access to an extremely powerful desktop GPU.
With AMD XConnect technology users place a high-end GPU in a specially designed external GPU enclosure, then connect to it over Thunderbolt 3.
GPU rendering applications
It will come as no surprise that most of the recent developments in GPU rendering are being driven by the GPU manufacturers.
After all, what better reason is there to sell CAD users lots of powerful GPUs?
Nvidia has owned Iray for a number of years but it is only now making a big play for the CAD market.
The recently released Nvidia Iray for Rhino plug-in works with Rhino 5 and allows users to render directly within the Rhino Perspective viewport.
Users can add materials to the CAD model or move its orientation inside the viewport and the render will restart automatically giving continual feedback.
The software offers control over which processing resources are used, including CPU or specific GPUs. Network rendering is also available through Iray Server.
The software is available through the Iray store and costs $295 per year.
Iray plug-ins are also available for Autodesk 3ds Max, Maya and Cinema 4D.
Nvidia also played an important role in the development of SolidWorks Visualize, a standalone renderer that works with SolidWorks models as well as models from other CAD tools, including Rhino, Solid Edge, Inventor, Catia, Siemens NX and PTC Creo.
The product evolved from Bunkspeed, one of the first commercial GPU renderers, so it has a mature feature set. Users have full control over the rendering resources inside the workstation.
There are two versions of SolidWorks Visualize. ‘Standard’ comes free with SolidWorks Professional or Premium, whereas the ‘Professional’ version costs extra, adding support for animation, render queues and network rendering.
Iray is also embedded in CATIA Live rendering. Nvidia also has close links to Lightwork Design, a specialist in embedding rendering technology inside CAD applications.
The UK company was behind the implementation of a customised version of Iray into Siemens NX.
Iray+ is now in Siemens NX Ray Traced Studio (requires an NX Render licence) and Siemens NX Advanced Studio (requires an NX Studio Visualize licence).
At the end of July AMD officially launched Radeon ProRender (previously called FireRender). Beta versions of the physically-based renderer are now available for SolidWorks and Rhino.
The big news with AMD Radeon ProRender is that it is free. This has already got the attention of SolidWorks Standard users, who do not have access to the ray trace renderers, PhotoView 360 or SolidWorks Visualize.
The current beta version of Radeon ProRender for SolidWorks is very much a push button tool. The software automatically takes lighting and cameras from the scene, then maps SolidWorks materials to ProRender materials.
The render window monitors changes in the SolidWorks window and updates accordingly.
Users currently have little control over the process, except for output resolution and whether to render on the CPU or GPU. If GPU is selected, the software uses all GPUs within the workstation. AMD says more control and features will come over time but it is keen to emphasise that its most important feature is ease of use.
Radeon ProRender for Rhino is a more integrated product than the SolidWorks version and is available directly inside the Rhino viewport. During its development AMD worked very closely with Rhino developer McNeel, which was also heavily involved in the conversion of materials.
The plug-in is available for Rhino 6, which is currently in beta.
As Radeon ProRender uses OpenCL, it should work with Nvidia GPUs but it is unlikely that Nvidia will optimise its drivers to boost performance.
Chaos Group, the maker of V-Ray, is a pioneer of GPU rendering, with its first developments starting in 2008. The company’s GPU renderer, V-Ray RT, is now available alongside its CPU-based renderer.
Plug-ins are available for Rhino, Modo and 3ds Max as well as the AEC-focused CAD tools Revit and SketchUp.
V-Ray RT stands out from the other GPU renderers because it can support both CUDA and OpenCL. This means users can get the most out of both Nvidia and AMD GPUs.
Finally, it is worth mentioning that some Iray-based renderers (and Chaos Group V-Ray) support MDL materials, which can be shared between applications.
This could be important if you use multiple applications throughout your product development workflow.
Nvidia Quadro graphics cards
In October this year Nvidia will ship its first ‘Pascal’-based professional GPUs, the Quadro P5000 (16GB) and Quadro P6000 (24GB).
With single precision performance of 8.9 TFLOPs and 12 TFLOPs respectively, both models will be of most interest to those who take their GPU rendering seriously, but will likely come with a sizeable price tag.
Compared to the Quadro M5000 (8GB, 4.2 TFLOPs) and Quadro M6000 (12GB, 7 TFLOPs), which they will replace, the new Pascal GPUs will not only offer better performance and more memory but will also be the first Quadros to include ‘Async Compute / Dynamic load balancing’ for better sharing of resources between graphics and compute tasks.
This is an important development for Nvidia as it should allow designers to make much better use of their GPU resources.
Instead of having to invest in two GPUs and dedicate one to interactive graphics and one to GPU rendering (as is the case with the Maxwell and Kepler Quadros), one Pascal GPU should be able to handle both tasks at the same time (see body of text for more info).
This gives those looking to invest in Nvidia hardware for GPU rendering a bit of a dilemma. While the Quadro P5000 and P6000 will be available soon, we are unlikely to see Pascal-based replacements for the mid-range Maxwell Quadro M2000 (4GB, 1.8 TFLOPs) and Quadro M4000 (8GB, 2.5 TFLOPs) until well into 2017.
The comparative performance of Nvidia’s current generation Quadro GPUs makes for some interesting reading. We tested with SolidWorks Visualize running on a Scan 3XS Nanu Ultimate 2D Plus workstation.
Our conclusion is that the Quadro M4000 is the Nvidia card of choice for mainstream GPU rendering — both in terms of raw compute power and price / performance.
This could be coupled with a Quadro M2000 (for interactive graphics) or a second Quadro M4000, depending on your needs, but if your budget stretches and the Quadro P5000 delivers on its expected performance, a single GPU solution may end up being the better option.
Finally, those who already own a Quadro K2200 will be interested to learn that the Kepler-based GPU offers around the same Iray performance as its successor, the Quadro M2000.
However, the superior interactive graphics performance of the Quadro M2000 still makes it a better choice for entry-level workflows.
How a new wave of tools could change the way you buy workstation hardware