Virtualized Gaming: Nvidia Cards, Part 1: GeForce, Quadro, and GeForce Modified into a Quadro

Recently I built a new system with the primary intention of running Linux the vast majority of the time and never having to stop what I am doing to reboot into Windows every time I wanted to play a game. That meant gaming in a VM, which in turn meant VGA passthrough. I am an Enterprise Linux 6 user, and Fedora is too bleeding edge for me. What I really wanted to run is KVM virtualization, but the support for VGA passthrough didn’t seem to work for me with EL6 packages, even after a selective update to much newer kernel, qemu and libvirt related packages. VMware ESX won’t work with PCI passthrough on my EVGA SR-2 motherboard because EVGA, in their infinite wisdom, decided to put all the PCIe slots behind Nvida NF200 routers/bridges which don’t support PCIe ACS functionality, which ESX requires for PCI passthrough. That left me with Xen as the only remaining option. I now mostly have Xen working the way I want – not without issues, but I will cover virtualized gaming and Xen details in another article. For now, what matters is that Xen VGA passthrough currently only works with ATI cards and Nvidia Quadro (but not GeForce) cards.

ATI cards are not an option for me due to various driver bugs (e.g. handling monitors on which refresh rate is dependant on resolution due to bandwidth limitations), lack of features (no option to use anything but EDID modes, to the extent of completely ignoring monitor driver .inf files; the custom mode feature used to exist in the drivers (the documentation for it can still be found on the AMD website) but has been removed at some point) and most importantly, lack of multiple DL-DVI outputs on cards more recent than the Radeon HD4xxx series (Radeon HD5xxx and later cards only come with a single DL-DVI port – on those that come with a second DVI port, even though it physically looks like a DL, it only provides a single link).

Nvidia GeForce cards don’t work in a virtual machine, at least not without unmaintained patches that don’t work with all cards and guest operating systems.

That leaves Nvidia Quadro cards. Unfortunately, those are eyewateringly expensive. But, on paper, the spec lists the same GPUs used on GeForce and Quadro cards. This got me looking into what makes a Quadro a Quadro and a few days of research and a weekend of experimentation yielded some interesting and very useful results. While it looks like some features such as certain GL functions are disabled in the chips (probably by laser cutting), some features are purely down to the driver deciding whether to enable them or not. It turns out, making cards work in a VM is one of the driver-depentant features.

Phase 1: Verify That Quadros Cards Work in a VM When GeForce Don’t

Looking at the specification and feature list of Quadro cards, Quadro 2000, 4000, 5000 and 6000 models support the “MultiOS” feature, which is what Nvidia calls VGA passthrough. So, the first thing I did was acquire a “cheap” second hand quadro Quadro 2000 on eBay. Cheap here being a relative term because a second hand Quadro costs between 3 and 8 times the amount the equivalent (and usually higher specification) GeForce card costs. The Quadro card proved to work flawlessly, but the Quadro 2000 is based on a GF106 chip with only 192 shaders, so gaming performance was unusable at 3840×2400 (I will let go of my T221 monitors when they are pried out of my cold, dead fingers). Gaming at 1920×1200 was just about bearable with some detail level reductions, but even so it was borderline.

Here is how the genuine Quadro 2000 shows up in GPU-Z and CUDA-Z:

And here are the genuine Quadro 2000 SPECviewperf11 results:

VIEWSETCOMPOSITE
catia-0323.86
ensight-0416.63
lightwave-0143.12
maya-0336.25
proe-057.07
sw-0232.21
tcvis-0218.82
snx-0117.50

Phase 2: Get an Equivalent GeForce Card and Investigate What Makes a Quadro a Quadro

The next item on the acquisition list was a GeForce GTS450 card. On paper the spec for a GTS450 is identical to a Quadro 2000:
GF106 GPU
192 shaders
1GB of GDDR5
Note: There are some models that are different despite also being called GTS450. Specifically, there is an OEM model that only has 144 shaders, and there is a model with 192 shaders but with GDDR3 memory rather than GDDR5. The DDR3 model may be more difficult to modify due to various differences, and the 144 shader model may not work properly as a Quadro 2000.

Armed with the information I dug out, I set out to modify the GTS450 into a QuadForce (a splice between a Quadro and a GeForce – and Gedro just doesn’t sound right). This was successful, and the card now detected as a Quadro 2000, and everything seemed to work accordingly. The VGA passthrough worked, and since the GTS450 is clocked significantly higher than the Quadro 2000, the gaming performance was improved to the point where 1920×1200 performance was quite livable with. What didn’t improve to Quadro levels is OpenGL performance of certain functions that appear to have been disabled on the GeForce GPUs. Consequently, SPECviewperf11 results are much lower than on a real Quadro 2000 card, but the GeForce GTS450 scores higher on every gaming test since games don’t use the missing functionality, and the GeForce card is clocked higher. It is unclear at the moment whether the extra GL functionality was disabled on the GPU die by laser cutting or whether it is disabled externally to the GPU, e.g. by different hardware strapping or pin shorting via the PCB components – more research into this will need to be done by someone more interested in those features than me. Since the stamped-on GPU markings are different between the GTS450 (GF106-250, checked against 3 completely different GDDR5 GTS450 cards) and the Quadro 2000 (GF106-875 on the one I have), it seems likely the extra GL functionality is laser cut out of the GPU.

Here is how the GTS450 modified to Quadro 2000 shows up in GPU-Z and CUDA-Z:

CUDA-Z performance seems to scale with the clock speeds, so the faux-Quadro card wins.

Here are the SPECviewperf11 results for a GTS450 before and after modifying it into a Quadro 2000. As you can see, in this test those missing GL functions make a huge difference, but in some tests there is still a substantial improvement:

GTS450:

VIEWSETCOMPOSITE
catia-033.33
ensight-0420.67
lightwave-0110.80
maya-035.38
proe-050.36
sw-026.75
tcvis-020.35
snx-012.37

QuadForce 2450:

VIEWSETCOMPOSITE
catia-033.24
ensight-0417.83
lightwave-0110.72
maya-037.75
proe-050.37
sw-026.87
tcvis-020.35
snx-012.35

Here is the data in chart form (relative performance, real Quadro 2000 = 100%).

As you can see the real Quadro dominates in all tests except ensignt-04 where it gets soundly beaten by the GeForce card. Modification does seem to improve some aspects of performance. In particular, Maya results seem to improve by a whopping 44% following the modification.

If you are only interested in support and VGA passthrough for virtual machines, modifying a GeForce card to a Quadro can be an extremely cost effective solution (especially if your budget wouldn’t stretch to a real Quadro card anyway). If you are only interested in performance of the kind measured by SPECviewperf, then depending on the applications you use, a real Quadro is still a better option in most cases.

Note: I am selling one of my Quadrified GTS450 cards. I bought several fully expecting to brick a few in the process of attempting to modify them, but the success rate was 100% so I now have more of them than I need.

WQUXGA – IBM T221 3840×2400 204dpi Monitor – Part 5: When You Are Really Stuck With a SL-DVI

I recently had to make one of these beasts work bearably well with only a single SL-DVI cable. This was dictated by the fact that I needed to get it working on a graphics card with only a single DVI output, and my 2xDL-DVI -> 2xLFH-60 adapter was already in use. As I mentioned previously, I found the standard 1xSL-DVI’s worth 13Hz to be just too slow when it comes to a refresh rate (I could see the mouse pointer skipping along the screen), but the default 20Hz from 2xSL-DVI was just fine for practically any purpose.

So, faced with the need to run with just a single SL-DVI port, it was time to see if a bit of tweaking could be applied to reduce the blanking periods and squeeze a few more FPS out of the monitor. In the end, 17.1Hz turned out to be the limit of what could be achieved. And it turns out, this is sufficient for the mouse skipping to go away and make the monitor reasonably pleasant to use.

(Note: My wife disagrees – she claims she can see the mouse skipping at 17.1Hz. OTOH, she is unable to read my normal font size (MiscFixed 8-point) on this monitor at full resolution. So how you get along with this setup will largely depend on whether your eyes’ sensitivity is skewed toward high pixel density or high frame rates.)

The xorg.conf I used is here:

Section "Monitor"
  Identifier    "DVI-0"
  HorizSync    31.00 - 105.00
  VertRefresh    12.00 - 60.00
  Modeline "3840x2400@17.1"  165.00  3840 3848 3880 4008  2400 2402 2404 2406 +hsync +vsync
EndSection

Section "Device"
  Identifier    "ATI"
  Driver        "radeon"
EndSection

Section "Screen"
  Identifier    "Default Screen"
  Device        "ATI"
  Monitor        "DVI-0"
  DefaultDepth    24
  SubSection "Display"
    Modes    "3840x2400@17.1"
  EndSubSection
EndSection

The Modeline could easily be used to create an equivalent setting in Windows using PowerStrip or a similar tool, or you could hand-craft a custom monitor .inf file.

In the process of this, however, I have discovered a major limitation of some of the Xorg drivers. Generic frame buffer (fbdev) and VESA (vesa) drivers do not support Modelines, and will in fact ignore them. ATI’s binary driver (fglrx) also doesn’t support modelines. Linux CCC application mentions a section for custom resolutions, but there is no such section in the program. So if you want to use a monitor in any mode other than what it’s EDID reports, you cannot use any of these drivers. This is a hugely frustrating limitation. In the case of fbdev driver, it is reasonably forgiveable because it relies on whatever modes the kernel frame buffer exposes. In the case of the VESA driver it is understandable that it only supports standard VESA modes. But ATI’s official binary driver lacking this feature is quite difficult to forgive – it has clearly be dumbed down too far.

More/Better Internal Storage on the Toshiba AC100 – Part 2

Following my research for the previous article about the performance of SD/CF/USB flash modules, the only conclusion I could reach is that most of them are pretty dire. The only notable exception among the SD cards seems to be the latest generation of the SanDisk Extreme Pro (95MB/s) cards that just about managed to squeeze out enough performance on random writes to match a 7200rpm disk. Still, this is pretty dire compared to any reasonable SSD, so I wanted to see what else could be done about installing extra storage with good performance into an AC100.

What I came across is this: SuperTalent RC8 USB stick. It may look like a USB stick, but it is actually a full-on SSD, featuring a SandForce 1200 flash controller. I figured this was worth a shot, even though the4 specifications indicate it is rather large (far too large to fit inside an AC100 in it’s standard form). Stripped out of the casing, however, it looks like RC8 might just be fittable inside the AC100.

This is what I ended up with. There appears to be only one place inside an AC100 where a bare RC8 circuit board could be fitted. You will need the following:

1) P3MU mini-PCIe USB break-out module

2) SuperTalent RC8 USB stick

3) Custom made USB cable (male and female type A USB connectors, some single core wire, and some skill with a soldering iron)

Measure out exactly how long you need the cable to be – there is no room to tuck away excess able inside an AC100. Here is what my cable layout ended up looking like.

AC100 motherboard with P3MU and custom USB cable fitted

AC100 motherboard with P3MU and custom USB cable fitted

This is what it looks like with the top panel fitted. Note the large cut-out that has been made below the mini-PCIe slot access hole.

AC100 modified to receive RC8 USB SSD

AC100 modified to receive RC8 USB SSD

And again with the screws fitted. Note that one of the screw holes is in the area that had to be cut out. This shouldn’t affect the structural integrity of the AC100, though. Also note that the right speaker cable has been re-routed slightly to now go over the LED ribbon cable.

AC100 modified to receive RC8 SSD

AC100 modified to receive RC8 SSD

This is what it looks like with the RC8 attached. Now you can see why the cut-out in the top panel was exactly the shape it was – I specifically cut out the minimum possible amount to allow the RC8 to fit.

Toshiba AC100 with the SuperTalent RC8 USB SSD installed

Toshiba AC100 with the SuperTalent RC8 USB SSD installed

I also put a piece of thin transparent sticky tape over it to hold in in place, just to make sure nothing can short out against the underside of the keyboard.

Toshiba AC100 with the SuperTalent RC8 SSD

Toshiba AC100 with the SuperTalent RC8 SSD

And that is pretty much it. Put the keyboard back in and bolt it all together. The metal part of the USB connector will sit a tiny bit above the line of the panel, but the only way you’ll notice it once you put the keyboard back on is by knowing that there is a tiny bulge there.

Your AC100 should now be able to handle ~ 2000 IOPS on both random reads and random writes, along with much better life expectancy that having proper flash management brings.

At this point I would like to point out just how impressed I am with the SuperTalent RC8 USB SSD. Not only is the performance fenomenal (for a USB stick at least), but it really behaves like a SATA SSD – to the point where you can use tools like hdparm and smartctl on it (yes, it even supports SMART).