Virtualized Gaming: Nvidia Cards, Part 2: GeForce, Quadro, and GeForce Modified into a Quadro – Higher End Fermi Models

Following the success with QuadForce 2450 modification (GeForce GTS450 -> Quadro 2000), I went on to investigate whether the same modification will work on the GTX470 to turn it into a Quadro 5000 and on a GTX480 to turn it into a Quadro 6000. Modifying a GTX580 into a somewhat obscure Quadro 7000 was also undertaken.

GeForce GTX470448:56:405x1.25GB
GeForce GTX480480:60:486x1.50GB
Quadro 5000352:44:405x2.50GB
Quadro 6000448:56:486x6.00GB

In all three cases, the modifications were successful, and they all worked as expected – features like VGA passthrough work on the 5000 and 6000 models and gaming performance is excellent, as you would expect – I can play Crysis at 3840×2400 in a virtual machine. Again, the extra GL functions aren’t there (if you compare the output of glxinfo between a real Quadro and a QuadForce, you will find a number of GL primitives missing), so some aspects of OpenGL performance are still crippled. PhysX support is also a little hit-and-miss. In a VM, on Windows 7 it seems to work on Quadro cards; on XP it appears to not be working. On bare metal on Windows XP it works. This appears to be due to the Quadro driver itself, rather than due to the cards not being genuine Quadros.

Finally, the GF100 based cards (GTX470/480) also get an extra feature enabled by the modification – second DMA channel. Normally there is a unidirectional DMA channel between the host and the card. Following the modification, the second DMA channel in the other direction is activated. This has a relatively moderate impact on gaming performance, but it can have a very large impact on performance of I/O bound number crunching applications since it increases the memory bandwidth between the card and the system memory (you can read and write to/from the GPU memory at the same time). Compare the CUDA-Z Memory report for the GTX470 before and after modifying it into a Quadro 5000 – GTX470 only has a unidirectional async memory engine, but after modifying it the engine becomes bidirectional:

The same happens on the GTX480 – it’s async engine also becomes bidirectional after modification.

Quadro 7000 is a little different from the other two. It doesn’t have dual DMA channels, and Nvidia don’t list it as MultiOS capable. The drivers do not do the necessary adjustments to make it work with VGA passthrough. That means that, unfortunately, the gain from modifying a GTX580 is questionable in terms of what you will gain. Note, however, that the Quadro 7000 was never aimed at the virtualization market; it was only available as a part of the QuadroPlex 7000 product – an external GPU enclosure designed for driving multiple monitors for various visualisation work. Hence the lack of MultiOS support on it.

Here is how the QuadForce 5470 does in SPECviewperf (GTX470 = 100%):

Compared to the QuadForce 2450, the performance improvements are more modest – the only real difference is observable in the lightwave benchmark.

Unfortunately, my QuadForce 6480 is currently in use, so I cannot get measurements from it, but since the they are both based on the GF100 GPU, the results are expected to be very similar.

On the QuadForce 7580 there was no observed SPEC performance improvement.

I have since acquired a Kepler Based 4GB GTX680 and successfully modified it into Quadro K5000. Modifying it into a Grid K2 also works, but there don’t appear to be any obvious advantages from doing so at the moment (K5000 works fine for virtualization passthrough, even though it wasn’t listed as MultiOS last time I checked). This QuadForce K5680 is why my GTX470 became free for testing again. More on Quadrifying Keplers in the next article. I also have a GTX690 now (essentially two 680s on the same PCB), which will be replacing the QuadForce 6480, so this will also be written up in due time. Unfortunately, however, quadrifying Keplers in most cases requires some hardware as well as BIOS modifications. I will post more on all this soon, along with a tutorial on soft-modding.

Virtualized Gaming: Nvidia Cards, Part 1: GeForce, Quadro, and GeForce Modified into a Quadro

Recently I built a new system with the primary intention of running Linux the vast majority of the time and never having to stop what I am doing to reboot into Windows every time I wanted to play a game. That meant gaming in a VM, which in turn meant VGA passthrough. I am an Enterprise Linux 6 user, and Fedora is too bleeding edge for me. What I really wanted to run is KVM virtualization, but the support for VGA passthrough didn’t seem to work for me with EL6 packages, even after a selective update to much newer kernel, qemu and libvirt related packages. VMware ESX won’t work with PCI passthrough on my EVGA SR-2 motherboard because EVGA, in their infinite wisdom, decided to put all the PCIe slots behind Nvida NF200 routers/bridges which don’t support PCIe ACS functionality, which ESX requires for PCI passthrough. That left me with Xen as the only remaining option. I now mostly have Xen working the way I want – not without issues, but I will cover virtualized gaming and Xen details in another article. For now, what matters is that Xen VGA passthrough currently only works with ATI cards and Nvidia Quadro (but not GeForce) cards.

ATI cards are not an option for me due to various driver bugs (e.g. handling monitors on which refresh rate is dependant on resolution due to bandwidth limitations), lack of features (no option to use anything but EDID modes, to the extent of completely ignoring monitor driver .inf files; the custom mode feature used to exist in the drivers (the documentation for it can still be found on the AMD website) but has been removed at some point) and most importantly, lack of multiple DL-DVI outputs on cards more recent than the Radeon HD4xxx series (Radeon HD5xxx and later cards only come with a single DL-DVI port – on those that come with a second DVI port, even though it physically looks like a DL, it only provides a single link).

Nvidia GeForce cards don’t work in a virtual machine, at least not without unmaintained patches that don’t work with all cards and guest operating systems.

That leaves Nvidia Quadro cards. Unfortunately, those are eyewateringly expensive. But, on paper, the spec lists the same GPUs used on GeForce and Quadro cards. This got me looking into what makes a Quadro a Quadro and a few days of research and a weekend of experimentation yielded some interesting and very useful results. While it looks like some features such as certain GL functions are disabled in the chips (probably by laser cutting), some features are purely down to the driver deciding whether to enable them or not. It turns out, making cards work in a VM is one of the driver-depentant features.

Phase 1: Verify That Quadros Cards Work in a VM When GeForce Don’t

Looking at the specification and feature list of Quadro cards, Quadro 2000, 4000, 5000 and 6000 models support the “MultiOS” feature, which is what Nvidia calls VGA passthrough. So, the first thing I did was acquire a “cheap” second hand quadro Quadro 2000 on eBay. Cheap here being a relative term because a second hand Quadro costs between 3 and 8 times the amount the equivalent (and usually higher specification) GeForce card costs. The Quadro card proved to work flawlessly, but the Quadro 2000 is based on a GF106 chip with only 192 shaders, so gaming performance was unusable at 3840×2400 (I will let go of my T221 monitors when they are pried out of my cold, dead fingers). Gaming at 1920×1200 was just about bearable with some detail level reductions, but even so it was borderline.

Here is how the genuine Quadro 2000 shows up in GPU-Z and CUDA-Z:

And here are the genuine Quadro 2000 SPECviewperf11 results:


Phase 2: Get an Equivalent GeForce Card and Investigate What Makes a Quadro a Quadro

The next item on the acquisition list was a GeForce GTS450 card. On paper the spec for a GTS450 is identical to a Quadro 2000:
192 shaders
1GB of GDDR5
Note: There are some models that are different despite also being called GTS450. Specifically, there is an OEM model that only has 144 shaders, and there is a model with 192 shaders but with GDDR3 memory rather than GDDR5. The DDR3 model may be more difficult to modify due to various differences, and the 144 shader model may not work properly as a Quadro 2000.

Armed with the information I dug out, I set out to modify the GTS450 into a QuadForce (a splice between a Quadro and a GeForce – and Gedro just doesn’t sound right). This was successful, and the card now detected as a Quadro 2000, and everything seemed to work accordingly. The VGA passthrough worked, and since the GTS450 is clocked significantly higher than the Quadro 2000, the gaming performance was improved to the point where 1920×1200 performance was quite livable with. What didn’t improve to Quadro levels is OpenGL performance of certain functions that appear to have been disabled on the GeForce GPUs. Consequently, SPECviewperf11 results are much lower than on a real Quadro 2000 card, but the GeForce GTS450 scores higher on every gaming test since games don’t use the missing functionality, and the GeForce card is clocked higher. It is unclear at the moment whether the extra GL functionality was disabled on the GPU die by laser cutting or whether it is disabled externally to the GPU, e.g. by different hardware strapping or pin shorting via the PCB components – more research into this will need to be done by someone more interested in those features than me. Since the stamped-on GPU markings are different between the GTS450 (GF106-250, checked against 3 completely different GDDR5 GTS450 cards) and the Quadro 2000 (GF106-875 on the one I have), it seems likely the extra GL functionality is laser cut out of the GPU.

Here is how the GTS450 modified to Quadro 2000 shows up in GPU-Z and CUDA-Z:

CUDA-Z performance seems to scale with the clock speeds, so the faux-Quadro card wins.

Here are the SPECviewperf11 results for a GTS450 before and after modifying it into a Quadro 2000. As you can see, in this test those missing GL functions make a huge difference, but in some tests there is still a substantial improvement:



QuadForce 2450:


Here is the data in chart form (relative performance, real Quadro 2000 = 100%).

As you can see the real Quadro dominates in all tests except ensignt-04 where it gets soundly beaten by the GeForce card. Modification does seem to improve some aspects of performance. In particular, Maya results seem to improve by a whopping 44% following the modification.

If you are only interested in support and VGA passthrough for virtual machines, modifying a GeForce card to a Quadro can be an extremely cost effective solution (especially if your budget wouldn’t stretch to a real Quadro card anyway). If you are only interested in performance of the kind measured by SPECviewperf, then depending on the applications you use, a real Quadro is still a better option in most cases.

WQUXGA – IBM T221 3840×2400 204dpi Monitor – Part 5: When You Are Really Stuck With a SL-DVI

I recently had to make one of these beasts work bearably well with only a single SL-DVI cable. This was dictated by the fact that I needed to get it working on a graphics card with only a single DVI output, and my 2xDL-DVI -> 2xLFH-60 adapter was already in use. As I mentioned previously, I found the standard 1xSL-DVI’s worth 13Hz to be just too slow when it comes to a refresh rate (I could see the mouse pointer skipping along the screen), but the default 20Hz from 2xSL-DVI was just fine for practically any purpose.

So, faced with the need to run with just a single SL-DVI port, it was time to see if a bit of tweaking could be applied to reduce the blanking periods and squeeze a few more FPS out of the monitor. In the end, 17.1Hz turned out to be the limit of what could be achieved. And it turns out, this is sufficient for the mouse skipping to go away and make the monitor reasonably pleasant to use.

(Note: My wife disagrees – she claims she can see the mouse skipping at 17.1Hz. OTOH, she is unable to read my normal font size (MiscFixed 8-point) on this monitor at full resolution. So how you get along with this setup will largely depend on whether your eyes’ sensitivity is skewed toward high pixel density or high frame rates.)

The xorg.conf I used is here:

Section "Monitor"
  Identifier    "DVI-0"
  HorizSync    31.00 - 105.00
  VertRefresh    12.00 - 60.00
  Modeline "[email protected]"  165.00  3840 3848 3880 4008  2400 2402 2404 2406 +hsync +vsync

Section "Device"
  Identifier    "ATI"
  Driver        "radeon"

Section "Screen"
  Identifier    "Default Screen"
  Device        "ATI"
  Monitor        "DVI-0"
  DefaultDepth    24
  SubSection "Display"
    Modes    "[email protected]"

The Modeline could easily be used to create an equivalent setting in Windows using PowerStrip or a similar tool, or you could hand-craft a custom monitor .inf file.

In the process of this, however, I have discovered a major limitation of some of the Xorg drivers. Generic frame buffer (fbdev) and VESA (vesa) drivers do not support Modelines, and will in fact ignore them. ATI’s binary driver (fglrx) also doesn’t support modelines. Linux CCC application mentions a section for custom resolutions, but there is no such section in the program. So if you want to use a monitor in any mode other than what it’s EDID reports, you cannot use any of these drivers. This is a hugely frustrating limitation. In the case of fbdev driver, it is reasonably forgiveable because it relies on whatever modes the kernel frame buffer exposes. In the case of the VESA driver it is understandable that it only supports standard VESA modes. But ATI’s official binary driver lacking this feature is quite difficult to forgive – it has clearly be dumbed down too far.