QEMU supports two x86 chipsets it can emulate. The very ancient (1996) i440FX chipset and the more recent (2007) Q35 chipset. For a long time, Q35 was advised against for GPU passthrough because some parts weren’t quite fully addressed. On the other hand, the advantage of Q35 is that it supports PCIe natively. i440FX only supports PCI and all passed through hardware had to pretend to be PCI rather than PCIe. In theory, this means that Q35 should lead to a slightly lower emulation overhead, so I decided to test it between different workstation seats on my virtualized workstation server.
Initial results are quite interesting. I left both VMs idle after booting with as close to identical setup as there can be (other than Q35 vs i440FX). After around 3 hours, I measured the total CPU time spent while idling. These are two otherwise identical VMs sitting at their login screens after booting:
So it looks like the VM based on Q35 emulated chipset used about 23% less CPU while idling. I haven’t noticed any performance difference by the seat of my pants after conversion to Q35. I haven’t actually run any other benchmarks yet, but a 23% CPU saving with Q35 when idling seems significant. Especially if you have a lot of Windows VMs running that are sitting idle most of the time.
I’ve recently been trying to put together a remote gaming setup, and while I got it to work with Windows 10, the hardware H.264 encoding was a little flakey – it would drop out all the time at higher resolutions, and only one user could be logged in at the same time. So I started looking at Windows Server 2016.
One interesting obstacle is that the latest Nvidia drivers (49x+) don’t work on Windows 2016 server. But I randomly tried the latest 3xx series drivers which support the GeForce GTX 10xx cards I use, and those worked. Eventually after a lot of bisecting updates, I found that the very latest series of drivers before 49x+ – the 47x series “gaming ready” does in fact work. The studio driver of the exact same version, unfortunately, refuses to install.
The experience so far is pretty good, and I haven’t noticed any hardware encoding dropouts. There is an easy ways to tell when hardware encoding drops out: The image quality will improve dramatically (Nvidia hardware encoding is is a bit blocky under a gaming workload), but the frame rate will tank because all of the CPU gets eaten by Windows’ built in H.264 encoding.
Overall, this works much better with Windows Server 2016 than it did with Windows 10, but the process to get it set up is the same.
There is one major limitation to this – RDP only supports absolute rather than relative mouse movements, which means it is an unsuitable solution for some types of games, such FPS. So it’d be really handy to get this working with Steam streaming. Unfortunately, for Steam streaming to work from an RDP server, the session has to be promoted to a console session. Here is a handy script to turn your RDP session into a console session:
for /f "tokens=3" %%a in ('c:\windows\system32\qwinsta my_username ^| findstr /v "ID"') do tscon %%a /password:my_password /dest:console
Replace my_username and my_password with your Windows username and password. So start Steam and run a batch file with the above as administrator. This will take your current session and move it to the console. RDP session will be closed on you, and once that happens, you will be able to stream from that Steam session via Steam streaming.
I recently came across a rather interesting issue that seems to be relatively unrecognised – since 18xx updates, the idling Windows guest VMs seem to be consuming about 30% of CPU on the Linux KVM host. This took me a little while to get to the bottom off, and after excluding the possibility of it being caused by any active processes from inside the VM, I eventually pinned it down to the way system timers are used.
What seems to be happening is that the Windows kernel keeps polling the CPU timer all the time at a rather aggressive rate, which manifests as rather high CPU usage on the host even though the guest is not doing any productive work. On large virtualization server, this is obviously going to pointlessly burn through a huge amount of CPU for no benefit.
The solution is to expose an emulated Hyper-V clock. For all other clocks, the kernel seems to incessantly poll the timers, but for Hyper-V it recognises that this is a bad idea in a virtual machine, and starts to behave in a more sensible way.
To achieve this, add this to your libvirt XML guest definition:
<spinlocks state='on' retries='8191'/>
This gets the VM’s idle CPU usage from 30%+ down to a much more reasonable 1%.