Hardware Accelerated SSL on ARM – Redux

A long time ago, I posted an article about the advantages of hardware accelerated SSL encryption, and how to get it working on Fedora Linux. Since then, some things have improved, and some things have regressed.

Improvements:

Regressions:

  • Red Hat have broken OpenSSH's hardware crypto support with their audit patch. This is particularly inconsistent given that the distro-supplied openssh package in EL6 is built with the --with-ssl-engine option to enable hardware crypto acceleration, yet this is clearly completely untested, which raises the question of what the point of it is.

Thankfully, the regression mentioned above can be fixed to make sshd work properly with hardware crypto offload.

Here are links to patched OpenSSL and OpenSSH packages for EL6, current at the time of writing this article:

http://ftp.redsleeve.org/pub/el6/packages/soc/SRPMS/openssl-1.0.1e-30.el6.11.cryptodev.src.rpm

http://ftp.redsleeve.org/pub/el6/packages/soc/SRPMS/openssh-5.3p1-104.el6.1.cryptodev.src.rpm

While ssh using the blowfish cipher in software is very fast and good enough for general purpose use, for some operations, such as transferring ZFS snapshots over ssh, hardware offloaded AES provides a very welcome performance boost, because it leaves more CPU available for other processes.
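As an illustration of the snapshot transfer case, here is a minimal sketch that forces an AES cipher for the ssh leg of a zfs send. The send_snapshot name, the dataset and host names, and the choice of aes128-cbc are all assumptions for the example; use whichever AES mode your crypto hardware actually accelerates and both ends of the connection offer.

```shell
#!/bin/bash
# Stream a ZFS snapshot to another host over ssh using an AES cipher.
# Assumptions: aes128-cbc is offered by both ends and is a cipher your
# crypto hardware accelerates; dataset and host names are placeholders.
send_snapshot() {
    local snapshot=$1 host=$2 target=$3 cipher=${4:-aes128-cbc}
    zfs send "$snapshot" | ssh -c "$cipher" "$host" zfs receive "$target"
}

# Example:
#   send_snapshot tank/data@nightly backuphost backup/data
```

Watching CPU usage of the ssh and sshd processes during a transfer like this is a simple way to see whether the offload is actually being used.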

ZFS-FUSE 0.7.1 Released

The last official release of zfs-fuse was years ago, and it was seriously starting to fall behind other implementations. It was effectively abandoned, which is quite inconvenient considering it is still the only viable option on 32-bit Linux installations (e.g. on ARM, or for those who are still tied to i686 for legacy reasons).

Since I use Linux on ARM heavily, I have been working on changing this for the past few weeks. The last official release, 0.7.0, was made by Seth Heeren a few years ago, and it supported ZFS pool versions up to v23. Emmanuel Anne was maintaining an unofficial post-0.7.0 branch that added support for pool versions up to v26. Over the past couple of years, other people have contributed a few patches here and there (manual ashift setting at boot time, some patches to add support for ARM, a couple of out-of-tree patches shipped with the Fedora package). Over the past few weeks I needed a few additional features that exist in other implementations, particularly for running a root file system on it (mount.zfs for legacy mount points, and better systemd/initramfs support), so I added them. It also transpired that a few of the patches that made it into the official 0.7.0 release weren’t in Emmanuel’s code tree, since it was forked before that release. I located and backported those from Seth’s maint branch on GitHub.

With all this done, and with no other volunteers showing any interest in further maintaining zfs-fuse, it seems to have fallen to me to make the 0.7.1 release. I have tested this extensively on my ARM systems with pools of various sizes (16GB to 16TB) and complexities (single disk to RAIDZ2) and it has been very stable.

If you are stuck on a 32-bit Linux platform and would love the features of ZFS, you can find the latest release of zfs-fuse on GitHub:

https://github.com/gordan-bobic/zfs-fuse

Future work will include adding support for additional pool versions. I have already created branches for those, but this will need extensive testing before I deem it stable enough for a release. If you are interested in helping with either development or testing of zfs-fuse, please do get in touch.

EVGA SR-2 – Long Term Review

I have used EVGA’s one-time flagship, and possibly their most hyped motherboard ever, for the past two and a half years. Over that period I have fought its many bugs and quirks extensively, across many uses that it was in theory capable of but was clearly never tested against. It seemed like a good idea to collate all the issues and workarounds into a single article. These findings have been cross-checked against multiple SR-2 motherboards.

Hardware / BIOS / POST

While there are various minor annoying bugs in the BIOS itself, I will not go into details of those and will instead focus on the issues of real practical consequence.

96GB of RAM

The Xeon X5xxx series CPU specification states that each CPU is capable of addressing 192GB of RAM. Unfortunately, the EVGA SR-2 specification only states that the board can handle up to 48GB of RAM. This is more than a little disappointing, but there is a way to persuade it to complete the POST with 96GB. You will need 12 8GB x4 dual-ranked registered DDR3 DIMMs. Insert 6 of them into the red memory slots, and boot up. Set the following:

  • MCH strap: 1600MHz
  • Memory speed: 1333
  • Manually set all the memory timings to what they were auto-detected to be
  • Set the command rate to 2T
  • No voltage increases are required just because you have 96GB – if your DIMMs are rated at 1.35V, then there is no need to set DIMM voltages higher than 1.35V.

Insert the remaining 6 DIMMs and it should now be able to boot with 96GB. The POST may take 2-3 cycles to complete, but within 30 seconds or so you should see the BIOS splash screen. Once it has booted up, a soft reboot will complete without delay. It only takes a little while on a cold boot.

Don’t expect 96GB to POST at much over 167MHz BCLK.

Unfortunately, more than 96GB will not work.

Watch out for SpeedStep Side Effects

If you enable SpeedStep but disable TurboBoost, the CPU will still boost to +1 multiplier. This is not intuitive and can cause you problems during stability testing.

Clock Generator Stability

Above 180MHz BCLK, expect to see very noisy clock signals. If you watch the clock speeds on a monitoring application, you will notice that the clock speeds will regularly spike very high and very low. This means that the stability above 180MHz BCLK is not going to be appropriate for any serious use.

Virtualization With VT-d / IOMMU

All the PCIe slots on the SR-2 are behind Nvidia NF200 PCIe bridges. Unfortunately, these have a bug in that they do not route all DMA via the upstream PCIe root hub. The consequence is that when a virtual machine with PCI passthrough tries to access memory at an address within its virtual sandbox that overlaps with the physical range of a PCI IOMEM area mapped to any physical device, the access will be routed to the physical device rather than remapped out of the way. When this happens, at best it will result in a host crash when a physical card crashes and takes the PCIe bus down with it. At worst, the memory access will trample the region mapped to a disk controller, which can easily result in garbage being written to disk, and then the host will crash anyway.

The workaround is to make sure, by whatever means are available, that the virtual machine does not access the area between 1GB and 4GB, which is the area reserved for mapping PCI I/O memory. Two years ago the only solution available to me was to write a patch for Xen’s hvmloader that marked that entire memory area as reserved. In theory you could also tell your guest OS to simply not use that memory (e.g. using bcdedit in Windows 7 and later to mark the area as badmem, or using mem= parameters to the Linux kernel). Today, with the latest version of QEMU for Xen and KVM, you can instead use the max-ram-below-4g=1G parameter to the -machine option, which will achieve the same thing much more cleanly and with no ill side effects (such as 3GB of RAM going missing in the guest).

Note that even with this workaround, there will still be weird, seemingly DMA related crashes on the SR-2 when you have VT-d enabled and you use SAS controllers. For some reason this motherboard really does not play well with them (I tried three different generations of LSI, an Adaptec and a 3Ware). Some controllers will simply have no disks show up when you boot the kernel with intel_iommu=on (older LSI, Adaptec), others will seem to work but randomly crash when a VM with PCI passthrough is running (3Ware). Simple SATA controllers do not seem to suffer from this problem.

Marvell 88SE9123 SATA-3 6 GBit controller

This may nominally be a 6GBit/s SATA controller, but you should be aware that its physical upstream connection is a single PCIe 2.0 lane, which signals at 5GT/s and carries at most about 500MB/s of data after 8b/10b encoding overhead. That means the maximum throughput you can possibly get from both of these SATA ports (the red ones on the board) combined is about 450-500MB/s. This is something to bear in mind if you are planning to connect a pair of SSDs. You will achieve higher overall throughput by connecting the second SSD to the ICH10 SATA-2 controller (the black ports on the board), even though the latter only supports up to 3GBit/s.
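As a rough back-of-the-envelope check, the following sketch compares the two ways of attaching a pair of SATA-3 SSDs. It assumes 8b/10b encoding on both the SATA and PCIe 2.0 links (10 line bits per data byte), and it ignores protocol overhead and the ICH10's own uplink, so the results are theoretical ceilings rather than measurements. The function names are made up for the example.

```shell
#!/bin/bash
# Payload bandwidth of an 8b/10b encoded serial link: every 10 line bits
# carry 8 data bits, so MB/s = Mbit/s * 8/10 / 8.
payload_mb_per_s() {
    echo $(( $1 * 8 / 10 / 8 ))
}

# Smaller of two integers.
min() {
    echo $(( $1 < $2 ? $1 : $2 ))
}

pcie2_x1=$(payload_mb_per_s 5000)   # Marvell controller's uplink
sata3=$(payload_mb_per_s 6000)      # one 6GBit/s port
sata2=$(payload_mb_per_s 3000)      # one 3GBit/s ICH10 port

# Both SSDs on the Marvell ports share the single PCIe lane.
both_on_marvell=$(min $(( 2 * sata3 )) "$pcie2_x1")
# One SSD on a Marvell port, the other on an ICH10 port.
split=$(( $(min "$sata3" "$pcie2_x1") + sata2 ))

echo "both SSDs on Marvell ports:   ${both_on_marvell} MB/s"
echo "one on Marvell, one on ICH10: ${split} MB/s"
```

Under these assumptions the split configuration has a ceiling of 800MB/s against 500MB/s for both drives behind the Marvell controller.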

Overclocking with Westmere Xeons

The settings I have used with great success for the past 2.5 years, in addition to those mentioned above required for operation with 96GB of RAM are:

  • CPU Core Voltage: 1.300V. This is sufficient for up to 4GHz. You may need to go as far as 1.350V for 4.15GHz, but beyond that no voltage increase will keep things stable.
  • VTT Voltage: 1.325V. This is sufficient up to about 3.33GHz uncore speeds, which is about as far as you can realistically expect to get out of Westmere Xeons. Do not under any circumstances push this past 1.350V as it is almost guaranteed to damage the CPU regardless of how good the cooling is.
  • BCLK: <= 180MHz. My experience is that this is as far as you can go before clock frequencies start to spike all over the place. In the interest of stability, I would recommend not exceeding 177MHz, as this is where 4.8GT/s QPI setting actually equals 6.4GT/s that all the components are rated at – and there seems to be almost no headroom at all for QPI overclocking on components of this generation.
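The 177MHz figure can be checked with a little arithmetic. The sketch below assumes that the QPI transfer rate is a fixed multiple of BCLK, and that the 4.8GT/s BIOS setting corresponds to a multiplier of 36 at the stock 133.33MHz BCLK. Neither number comes from EVGA documentation, so treat both as assumptions; the function names are likewise just for illustration.

```shell
#!/bin/bash
# Assumption: QPI rate (MT/s) = BCLK (MHz) x multiplier, with the "4.8GT/s"
# BIOS setting meaning a multiplier of 36 (36 x 133.33MHz = 4800MT/s).
QPI_MULTIPLIER=36

# QPI transfer rate in MT/s for a BCLK given in kHz, rounded to nearest.
qpi_mt_per_s() {
    echo $(( ($1 * QPI_MULTIPLIER + 500) / 1000 ))
}

# Highest BCLK (kHz) that keeps QPI at or below a rated MT/s figure.
max_bclk_khz() {
    echo $(( $1 * 1000 / QPI_MULTIPLIER ))
}

echo "stock 133.333MHz BCLK: $(qpi_mt_per_s 133333) MT/s"
echo "at 177.778MHz BCLK:    $(qpi_mt_per_s 177778) MT/s"
echo "at 180.000MHz BCLK:    $(qpi_mt_per_s 180000) MT/s"
echo "BCLK limit for 6400 MT/s: $(max_bclk_khz 6400) kHz"
```

With these assumptions, anything above roughly 177.7MHz BCLK pushes QPI past its 6.4GT/s rating, which matches the recommendation above.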

Motherboard Heatsink Fan

As far as I have been able to establish, this only seems to make any appreciable difference in cases of combined extreme BCLK overclocking, IOH over-volting, and using most if not all of the 64 PCIe lanes available through the PCIe slots. In more typical use (two PCIe x16 GPUs, 166MHz BCLK, relatively low 1.250V on the IOH), the difference between the fan being full on (approx. 5000 rpm) and completely off is around 9C (46C fully on, 55C completely off). Consequently, it may be preferable in some cases to remove the aluminium duct plate surrounding the fan, disconnect the fan, and leave the heatsink to passively cool the Intel 5520 I/O Hub, Intel ICH10 South Bridge, and Nvidia NF200 PCIe bridges. The airflow through the case caused by the case fans is likely to be more than sufficient in most if not all installations. This also avoids the sometimes extreme yet invisible dust build-up in the heatsink fins under the aluminium duct plate, which can push temperatures higher than they would be with no active fan or duct plate at all.

Linux

Hot-plug Flapping

This will show up as soon as you start the installer for any distribution you choose: a flood of messages to the console makes the system grind to a halt. The cause is an on-board device that is erroneously marked as hot-pluggable, which ASPM causes to flap between plugged and unplugged states. Disabling ASPM in the BIOS is not sufficient to fix this. The workaround is to add pcie_ports=compat to the list of kernel boot parameters.
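For the installer itself, the parameter has to be typed in at the boot prompt. Once the system is installed, one way to make it persistent on EL-style distributions is with grubby. This is a minimal sketch, assuming grubby is installed and that you want the parameter on every installed kernel; add_kernel_arg and has_kernel_arg are hypothetical helper names, the latter only there to confirm after a reboot that the parameter took effect.

```shell
#!/bin/bash
# Append a parameter to the boot entry of every installed kernel.
# Needs root. Assumes grubby is available (it ships with EL and Fedora);
# other distributions will need their own bootloader tooling.
add_kernel_arg() {
    grubby --update-kernel=ALL --args="$1"
}

# Succeed if the given parameter is present on the kernel command line.
# The optional second argument lets you check a file other than /proc/cmdline.
has_kernel_arg() {
    local arg=$1 cmdline_file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$arg"
}

# Example (as root):
#   add_kernel_arg pcie_ports=compat
# and after a reboot:
#   has_kernel_arg pcie_ports=compat && echo "compat mode active"
```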

Intel HD Audio Line Mapping

This took me a while to work out, and had me thinking I had a failed audio port. The front panel connector is using an unusual port, resulting in it not producing output, and not even emitting ACPI events when something is connected and disconnected. The solution is to produce a correct map and supply it to the driver (it turns out problems like this are so common that the snd-hda-intel driver can load such a map at startup).

Simply put this in /lib/firmware/hda-jack-retask.fw:

[codec]
0x10ec0889 0x00000000 2

[pincfg]
0x11 0x411111f0
0x12 0x59a3112e
0x14 0x01014c10
0x15 0x01011c12
0x16 0x01016c11
0x17 0x01012c14
0x18 0x01a19c40
0x19 0x02a19c50
0x1a 0x01813c4f
0x1b 0x0321403f
0x1c 0x411111f0
0x1d 0x4015e601
0x1e 0x01441130
0x1f 0x01c46160

And put this in /etc/modprobe.d/hda-jack-retask.conf:

options snd-hda-intel patch=hda-jack-retask.fw,hda-jack-retask.fw,hda-jack-retask.fw,hda-jack-retask.fw

That should solve the problem.

Final Words

Unfortunately, it took many man-days over the past two years to work all this out and find the solutions. It is not acceptable that a high-end flagship product of the sort that the SR-2 was presented to be is so buggy and requires so much troubleshooting from the end customer. The SR-2 has its place in history as the board that allowed overclocking of Xeons, alongside gems from a long time ago such as the A-bit BP6, which allowed dual socket operation with Celerons. However, in the time it took to work around all of its bugs it has become deprecated, discontinued, and unsupported. Top of the line Xeon X5690 processors now sell for little enough on the second hand market that the gains simply do not justify the effort, as they appeared to 2-3 years ago when starting with the several times cheaper X5650 processors.

In retrospect, when the effort is accounted for, a similar build using a pair of X5690 Xeons and a Supermicro X8DTH-6F motherboard would have almost certainly been a cheaper and less problematic experience. It might not have any overclocking functionality, but while offering the same number of PCIe x16 slots (7) and memory sockets (12), it does support 192GB of RAM (4x more than the SR-2 in the same number of sockets) without any special undocumented approaches required to make it work, and comes with an 8-port SAS controller on-board, while suffering from none of the problems above. Something that just works is usually much more economical than something that ends up requiring many days of troubleshooting effort.

Virtually Gaming, Part 2: Evolution – Consolidation and Move to KVM

In the previous article in this series, I detailed the journey to my original configuration with a single host providing multiple gaming capable virtual machines as a multi-seat workstation. But things have changed since then – many game distribution platforms such as Steam, GOG and Desura have native Linux versions, and many games have been ported to run natively on Linux. The vast majority of the ones that haven’t now work perfectly under WINE.

Consequently, the ideal solution has changed as well. In the original configuration, there were 3 seats on the system: two Windows VMs for gaming and one Linux VM for more serious use. At least one of the Windows VMs could now be removed, and its functionality replaced with WINE and native ports.

At the same time KVM has advanced greatly in features and stability, and is now much better aligned with the requirements of this multi-seat workstation project. Perhaps most importantly, the latest QEMU offers a much better workaround for the issue I had to patch Xen’s hvmloader for: max-ram-below-4g (an option to the -machine parameter). Setting this to 1GB comprehensively works around the IOMMU compatibility bug of the Nvidia NF200 PCIe bridges on the EVGA SR-2, without any negative side effects.

Even better, KVM also includes patches that neuter the Nvidia driver’s ability to detect it is running in the VM (add kvm=off to the list of options passed to the -cpu parameter). That means that modifying the GPU firmware or hardware to make it appear as a Quadro or Tesla card is no longer required for using it in a virtual machine. This is a massive advantage over the original Xen solution for most people.

Summary of the most significant changes:

  • Host system updated to EL7 (CentOS)
    Required to facilitate easier running of more recent kernels and Steam (no more need to build and update an additional package set to support Steam as on EL6, including glibc). On the downside – this necessitates putting up with systemd.
  • Xen replaced by KVM
  • Windows 7 VM now uses UEFI instead of legacy BIOS
    This does away with all of the legacy VGA complications such as VGA arbitration. The UEFI OVMF firmware even loads and executes the PCI devices’ BIOS during the VM’s POST, so the full splash screen and even the UEFI BIOS configuration menus are available on the external console during VM boot.
  • XP x64 VM removed
    Superseded by using native Linux game ports and WINE for the rest (so far every XP compatible game I have tried works)

Some of the extra repositories I used for this are:

OVMF UEFI and SeaBIOS Firmware repository from here: https://www.kraxel.org/repos/

Mainline kernel from elrepo repository: http://elrepo.org/tiki/tiki-index.php

Bleeding edge QEMU (needed for the max-ram-below-4g option): https://repos.fedorapeople.org/repos/openstack/.virt-upstream-el7/

The full libvirt xml configuration file I use for QEMU is here:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>edi</name>
  <uuid>11111111-1111-1111-1111-111111111111</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <sysinfo type='smbios'>
    <bios>
      <entry name='vendor'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='date'>01/01/2014</entry>
      <entry name='release'>0.91</entry>
    </bios>
    <system>
      <entry name='manufacturer'>GENERIC</entry>
      <entry name='product'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='serial'>1</entry>
      <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
      <entry name='sku'>GENERIC</entry>
      <entry name='family'>GENERIC</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='native'/>
      <source dev='/dev/zvol/normandy/edi'/>
      <target dev='vda' bus='virtio'/>
      <serial>1</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:11:22:33'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='if=pflash,format=raw,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd'/>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,kvm=off'/>
    <qemu:arg value='-machine'/>
    <qemu:arg value='pc-i440fx-2.2,max-ram-below-4g=1G,accel=kvm,usb=off'/>
  </qemu:commandline>
</domain>

The reason for the qemu:commandline section is that libvirt and especially virt-manager do not actually understand all possible QEMU parameters. The ones that they don’t support directly are in this section to avoid errors and complaints from virsh and virt-manager in normal use.

You may also notice that there are some unusual sections and values in there, so let me touch upon them in groups.

Windows Activation and Associated Checks

When you first activate Windows with a key, it keeps track of several important details of the hardware in order to detect whether the same installation has been moved into another machine. Most licenses (e.g. OEM ones) are not transferable to another machine. So in order to ensure that our installation is portable (e.g. if we upgrade to a different hypervisor at a later date), we set the various values to something static, easily memorable and predictable, so that if we ever need to migrate the VM to another host, it will not cause deactivation issues. The important settings are here (these are not in all cases complete sections, only the fragments required for this purpose, see above for the full configuration):

<uuid>11111111-1111-1111-1111-111111111111</uuid>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='date'>01/01/2014</entry>
    <entry name='release'>0.91</entry>
  </bios>
  <system>
    <entry name='manufacturer'>GENERIC</entry>
    <entry name='product'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='serial'>1</entry>
    <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
    <entry name='sku'>GENERIC</entry>
    <entry name='family'>GENERIC</entry>
  </system>
</sysinfo>
<os>
  <smbios mode='sysinfo'/>
</os>
<devices>
  <disk type='block' device='disk'>
    <serial>1</serial>
  </disk>
</devices>

Nvidia Bugs/Features Workarounds

The following sections are required in order to work around the NF200 PCIe bridge bugs (max-ram-below-4g=1G) and the Nvidia driver feature that disables GeForce GPUs in virtual machines (kvm=off):

<qemu:commandline>
  <qemu:arg value='-cpu'/>
  <qemu:arg value='host,kvm=off'/>
  <qemu:arg value='-machine'/>
  <qemu:arg value='pc-i440fx-2.2,max-ram-below-4g=1G,accel=kvm,usb=off'/>
</qemu:commandline>

CPU Configuration

<cpu>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>

This is important because most non-server editions of Windows only allow up to two CPU sockets. By default QEMU presents each CPU core as being on a separate socket. That means that no matter how many CPUs you pass to your Windows VM, while they will all show up in Device Manager, only a maximum of two will be used (you can verify this using Task Manager). The above configuration block instructs libvirt to tell QEMU to present four cores in a single CPU socket, so that all are usable in the Windows VM.

VFIO and Kernel Drivers

In my system I have two identical Nvidia GPUs. Numerically, the second one is primary (host), and the first one is the one I am passing to a virtual machine. I am also passing the NEC USB 3.0 controller to the VM. This is the script I wrote (in /etc/sysconfig/modules/) to bind the devices intended for the VM to the VFIO driver:

#!/bin/bash
# PCI addresses (bus:slot.function) as reported by lspci. Each GPU's HDMI
# audio device is function 1 of the same slot as the GPU itself.
nvidia1=$(lspci | grep "GTX 780 Ti" | head -1 | awk '{print $1;}')
hda1=$(echo "$nvidia1" | sed -e 's/\.0$/.1/')
nvidia2=$(lspci | grep "GTX 780 Ti" | tail -1 | awk '{print $1;}')
hda2=$(echo "$nvidia2" | sed -e 's/\.0$/.1/')
nec=$(lspci | grep "NEC" | awk '{print $1;}')
# The second GPU stays with the host drivers; the first GPU, its audio
# function and the NEC USB controller are reserved for vfio-pci.
echo nvidia        > /sys/bus/pci/devices/0000:$nvidia2/driver_override
echo snd-hda-intel > /sys/bus/pci/devices/0000:$hda2/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$nvidia1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$hda1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$nec/driver_override
modprobe vfio-pci
echo 10de 1284     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 10de 0e0f     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1033 0194     > /sys/bus/pci/drivers/vfio-pci/new_id
# Detach the VM devices from whatever driver currently holds them, then
# bind them to vfio-pci.
echo 0000:$nvidia1 > /sys/bus/pci/devices/0000:$nvidia1/driver/unbind
echo 0000:$hda1    > /sys/bus/pci/devices/0000:$hda1/driver/unbind
echo 0000:$nec     > /sys/bus/pci/devices/0000:$nec/driver/unbind
echo 0000:$nvidia1 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$hda1    > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$nec     > /sys/bus/pci/drivers/vfio-pci/bind
modprobe nvidia

Note that the PCI bus IDs will change if you add more hardware to the machine – that is why I wrote this script, rather than assigning the devices statically by ID. The above script works for me on my hardware – you will almost certainly need to modify it for your configuration, but it should at least give you a reasonable idea of the approach that works.

Important: The devices this identifies have to match what is in your libvirt XML config file in the relevant hostdev sections. You will have to adjust that manually for your configuration, either using virsh edit or virt-manager.
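To make the mapping between lspci output and the hostdev sections concrete, here is a small sketch that turns an address in lspci's default short form (bus:slot.function, e.g. 07:00.0) into the source address part of a hostdev block. The hostdev_xml function name is made up for this example, and the PCI domain is assumed to be 0000; the guest-side address element is left out so that libvirt can assign it.

```shell
#!/bin/bash
# Print a libvirt hostdev fragment for a PCI address given in lspci's
# short form, e.g. "07:00.0". Assumes PCI domain 0000.
hostdev_xml() {
    local addr=$1
    local bus=${addr%%:*}     # text before the colon
    local rest=${addr#*:}     # slot.function
    local slot=${rest%%.*}
    local func=${rest#*.}
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x${bus}' slot='0x${slot}' function='0x${func}'/>
  </source>
</hostdev>
EOF
}

# Example:
#   hostdev_xml "$(lspci | grep "NEC" | awk '{print $1;}')"
```

The output can then be compared against, or pasted into, the configuration opened with virsh edit.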

Also depending on your hardware, you may need to do the initial Windows installation on the emulated GPU rather than the real one (e.g. if you are using a USB controller for the VM that requires additional drivers, as is the case with the USB 3.0 controller I am using for my VM). Otherwise you will get display output but be unable to use your keyboard/mouse during the installation.

Gaming on Linux: Steam

A pre-packaged Steam binary used to be available from the rpmfusion repository, but it no longer appears to be there. Thankfully, negativo17 maintains a Steam repository for Fedora 20+, which installs and runs fine on EL7. You may also need to grab a few RPMs from Fedora 19 because EL7 doesn’t ship with a full complement of 32-bit libraries. The ones I found I needed are these:

libbsd-0.6.0-3.fc19.i686
libtxc_dxtn-1.0.0-3.fc19.i686
libxkbcommon-0.3.0-1.fc19.i686
openal-soft-1.16.0-2.fc19.i686
SDL2-2.0.3-1.fc19.i686
SDL2_image-2.0.0-4.fc19.i686

The reason these are from Fedora 19 is because F19 is virtually identical in terms of package versions to EL7.

Typically, the Steam RPM installation is a one-off, mostly to bootstrap the initial run, and install the dependencies. After that, a local version of Steam will be installed in the user’s home directory in ~/.local/share/Steam/. In light of the recent Steam bug resulting in deletion of the user’s entire home directory, I implemented a solution that runs Steam as a separate steam user, from that user’s own home directory. That way should anything similar to this ever happen, the only thing that would be deleted is the steam user’s home directory rather than any important files not related to running Steam games.

To do this, you will need to add a steam user, and give it necessary permissions:

$ sudo adduser steam
$ sudo usermod -a -G audio,games,pulse-access,video steam

Add the following to /etc/sudoers.d/steam:

%games ALL = (steam) NOPASSWD: /bin/steam

Create the following script (e.g. /usr/local/bin/steam.sh):

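#!/bin/bash
# Allow the local steam user to connect to this X session.
xhost +SI:localuser:steam
# Give the audio group (which includes steam) access to the PulseAudio socket.
chgrp audio /run/user/$UID /run/user/$UID/pulse
chmod 750 /run/user/$UID /run/user/$UID/pulse
sudo -u steam /usr/bin/steam
# Clean up any dbus-launch process still running as the steam user.
sudo -u steam pkill dbus-launch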

From there on, when you invoke steam.sh, it will launch steam as the steam user, and pass the graphical output to the Xorg session of the logged in user. The net result is that any potentially damaging bug in Steam or associated games can only do damage to the files owned by the steam user. This security model is not dissimilar to the Android security model where every application runs under its own user, for similar security reasons.

Gaming on Linux: WINE

There are two obvious options for this:

1) PlayOnLinux

2) More traditional WINE (I use the one from DarkPlayer’s repository)

I only had to make one configuration change to WINE, and that is to disable the dwrite.dll library in WINE (to disable it, run winecfg, go to Libraries -> add dwrite.dll, edit dwrite.dll entry and set it to disabled). I am using XP version emulation, which isn’t even supposed to include dwrite.dll, and the problem it causes is that fonts are invisible in Steam and some other applications.

End Result

The end result is a much cleaner virtual machine configuration: e.g. no missing RAM like before with Xen, due to the NF200 bug workaround, and no need for hardware modification of my GeForce cards. The performance seems very smooth, and so far the entire setup has been completely trouble free.

There is also one fewer virtual machine and one fewer GPU in the system without any loss of functionality. Should I require an additional seat in the future, it will most likely be a Linux one, and implemented using a Xorg multi-seat configuration.

Microsoft Security Essentials on 64-bit XP

Yet another Windows related article – this detour from more typical content is expected to be short lived.

Microsoft Security Essentials was never officially supported on 64-bit Windows XP, but version 2 nevertheless installed on it and worked fine. Version 4 (version 3 never existed) refuses to install directly, saying that the version of Windows is unsupported. However, if you install version 2, the version 4 installer will happily run and install version 4 as an upgrade. It will pop up a message every time you log in warning that XP64 is EOL, but otherwise it will work just fine. So the trick is to install version 2 and then upgrade to version 4.

You may be wondering why this is relevant. My findings are that most realtime anti-malware programs thoroughly cripple performance. I used to run ClamWin+ClamSentinel as one of the least bad options, but even this was quite crippling. MSSE, on the other hand, is much more lightweight, and has thus far proved itself to be as effective in tests as most of the alternatives. The overall performance of the system is now much more acceptable.

Chrome Installer Error 0xc0000005 on Windows XP

I don’t tend to write much about Windows because its usefulness to me is limited to functioning as a Steam boot loader, and even that usefulness is somewhat diminished with Steam and an increasing number of games being available for Linux. Unfortunately, I recently had to do some testing that needed to be carried out using a Windows application, and I noticed that Chrome reported the above error when attempting to update itself.

The Chrome installer crashes with the opaque 0xc0000005 error code on XP64 (Chrome is still supported on XP, even though MS is treating XP as EOL). Googling the problem suggested disabling the sandbox might help, but this isn’t really applicable since the problem occurs with the installer, not once Chrome is running (it runs just fine, it’s updating it that triggers the error).

A quick look at the crash dump revealed that one of the libraries dynamically linked at crash time was the MS Application Verifier, used for debugging programs and sending them fake information on what version of Windows they are running on. Uninstalling the MS Application Verifier cured the problem.

Steam on EL6 (RHEL6 / Scientific Linux 6 / CentOS 6)

The fact that Steam have decided to only officially support .deb based distributions, and only relatively recent ones at that, has been a pet peeve of mine for quite some time. While there are ways around the .deb only official package availability (e.g. alien), the library requirements are somewhat more difficult to reconcile. I have finally managed to get Steam working on EL6 and I figure I’m probably not the only one interested in this, so I thought I’d document it.

Different packages required to do this have been sourced from different locations (e.g. glibc from fuduntu project, steam src.rpm from steam.48.io (not really a source rpm, it just packages the steam binary in a rpm), most of the rest from more recent Fedoras, etc.). I have rebuilt them all and made them available in one place.

You won’t need all of them, but you will need at least the following:

glibc-2.15-60.el6.i686.rpm
glibc-2.15-60.el6.x86_64.rpm
glibc-common-2.15-60.el6.x86_64.rpm
glibc-devel-2.15-60.el6.x86_64.rpm
glibc-headers-2.15-60.el6.x86_64.rpm
libtxc_dxtn-1.0.0-2.1.i686.rpm
SDL2-2.0.3-2.el6.i686.rpm
steam-1.0.0.39-2.i686.rpm
xz-5.0.5-1.el6.x86_64.rpm
xz-compat-libs-5.0.5-1.el6.x86_64.rpm
xz-libs-5.0.5-1.el6.x86_64.rpm
xz-lzma-compat-5.0.5-1.el6.x86_64.rpm

First install some of the dependencies from the standard distribution packages:

yum install gtk2-engines.i686 \
            openal-soft.i686 \
            alsa-plugins-pulseaudio.i686 \
            gtk+.i686

Then install the updated packages:

rpm -Uvh glibc-2.15-60.el6.i686.rpm \
         glibc-2.15-60.el6.x86_64.rpm \
         glibc-common-2.15-60.el6.x86_64.rpm \
         glibc-devel-2.15-60.el6.x86_64.rpm \
         glibc-headers-2.15-60.el6.x86_64.rpm \
         libtxc_dxtn-1.0.0-2.1.i686.rpm \
         SDL2-2.0.3-2.el6.i686.rpm \
         steam-1.0.0.39-2.i686.rpm \
         xz-5.0.5-1.el6.x86_64.rpm \
         xz-compat-libs-5.0.5-1.el6.x86_64.rpm \
         xz-libs-5.0.5-1.el6.x86_64.rpm \
         xz-lzma-compat-5.0.5-1.el6.x86_64.rpm

If you have pyliblzma from EPEL installed (required by, e.g. mock), the updated xz-lzma-compat package will trigger a python bug that causes a segfault. This will incapacitate some python programs (yum being an important one). If you encounter this issue and you must have pyliblzma for other dependencies, reinstall the original xz package versions after you run steam for the first time. The updated xz only seems to be required when the steam executable downloads updates for itself.

Finally, run steam, log in, and let it update itself.

One of the popular games that is available on Linux is Left 4 Dead 2. I found that on ATI and Nvidia cards it doesn’t work properly in full screen mode (blank screen, impossible to Alt-Tab out), but it does work on Intel GPUs. It works on all GPU types in windowed mode. Unfortunately, it runs in full screen mode by default, so if you run it without adjusting its startup parameters you may have to ssh into the machine and forcefully kill the hl2_linux process. To work around the problem, right click on the game in your library, and go to properties:

Click on the “SET LAUNCH OPTIONS…” button:

You will probably want to specify the default resolution as well as the windowed mode to ensure the game comes up in a sensible mode when you launch it.
Add “-windowed -w 1280 -h 720” to the options, which will tell L4D2 to start in windowed mode with 1280×720 resolution. The resolution you select should be lower than your monitor’s resolution.

If you did all that, you should be able to hit the play button and be greeted with something resembling this:

ATI cards using the open source Radeon driver (at least with the version 7.1.0 that ships with EL6) seem to exhibit some rendering corruption, specifically some textures are intermittently invisible. This leads to invisible party members, enemies, and doors, and while it is entertaining for the first few seconds it renders the game completely unplayable. I have not tested the ATI binary driver (ATI themselves recommend the open source driver on Linux for older cards and I am using a HD6450).

Nvidia cards work fine with the closed source binary driver in windowed mode, and performance with a GT630 constantly saturates 1080p resolutions with everything turned up to maximum. I have not tested with the nouveau open source driver.

With Intel GPUs using the open source driver, everything works correctly in both windowed and full screen mode, but the performance is nowhere near as good as with the Nvidia card. With all the settings set to maximum, the performance with the Intel HD 4000 graphics (Chromebook Pixel) is roughly the same at 1920×1200 resolution as with the Radeon HD6450, producing approximately 30fps. The only problem with playing it on the Chromebook Pixel is that the whole laptop gets too hot to touch, even with the fan going at full speed. Not only does the aluminium casing get too hot to touch, the plastic keys on the keyboard themselves get painfully hot. But that story is for another article.

QNAP TS-421 – Review, Modification and RedSleeve Linux

Requirement

With the RedSleeve Linux release rapidly approaching, I needed a new server. The current one is a DreamPlug with an SSD and although it has so far worked valiantly with perfect reliability, it doesn’t have enough space to contain all of the newly built RPM packages (over 10,000 of them, including multiple versions the upstream distribution contains), and is a little lower on CPU (1.2GHz single core) and RAM (512MB) than ideal to handle the load spike that will inevitably happen once the new release becomes available. I also wanted a self contained system that doesn’t require special handling with many cables hanging off of it (like SATA or USB external disks). I briefly considered the Tonido2 Plug, but between the slower CPU (800MHz) and the US plug, it seemed like a step backward just for the added tidiness of having an internal disk.

Specification

The requirements I had in mind needed to cover at least the following:
1) ARM CPU
2) SATA
3) At least a 1.2GHz CPU
4) At least 512MB of RAM
5) Everything should be self contained (no externally attached components)

Selection

Very quickly the choice started to focus on various NAS appliances, but most of them had virtually non-existent community support for running custom Linux based firmware. The one exception to this is QNAP NAS devices which have rather good support from the Debian community; and where there is a procedure to get one Linux distribution to run, getting another to run is usually very straightforward. After a quick look through the specifications, I settled on the QNAP TS-421, which seems to be the highest spec ARM based model:

CPU: 2GHz ARMv5 Marvell Kirkwood (same as in the DreamPlug but 66% higher clock speed)
RAM: 1GB (twice as much as DreamPlug)
SATA: 4x 3.5″ SATA disk trays, based on the excellent Marvell 88SX7042 PCIe SATA controller
eSATA: 2x
Ethernet: 2x Gigabit (same as DreamPlug)
USB: 2x 2.0, 2x 3.0

Disks

At the time when I ordered the QNAP TS-421, it was listed as supporting 4TB drives – the largest air filled drives that were available at the time. I ordered 4x 4TB HGST drives because they are known to be more reliable than other brands. In the 10 days since then Toshiba announced 5TB drives, but these are not yet commercially available. I briefly considered the 6TB Helium filled Hitachi drives, but these are based on a new technology that has not been around for long enough for long term reliability trends to emerge – and besides, they were prohibitively expensive (£87/TB vs £29/TB for the 4TB model), and to top it all off, they are not available to buy.

Overview

Once the machine arrived, it was immediately obvious that the build quality is superb. One thing, however, bothered me immediately – it uses an external power brick, which seems like a hugely inconvenient oversight on an otherwise extremely well designed machine.

In order to make playing with alternative Linux installations possible, I needed to get serial console access. To do this you will need a 3.3V TTL serial cable, same as what is used on the Raspberry Pi. These are cheaply available from many sources. One thing I discovered the hard way after some trial and error is that you need to cross over the RX and TX lines between the cable and the QNAP motherboard, i.e. RX on the cable needs to connect to TX on the motherboard, and vice versa. There is also no need to connect the VCC line (red) – leave it disconnected. My final goal was to get RedSleeve Linux running on this machine, the process for which is documented on the RedSleeve wiki so I will not go into it here.

Modifying

One thing that becomes very obvious upon opening the QNAP TS-421 is that there is ample space inside it for a PSU, which made the design decision to use an external power brick all the more ill considered. So much so that I felt I had to do something about it. It turns out the standard power brick it ships with fits just fine inside the case. Here is what it looks like fitted.

It is very securely attached using double sided foam tape. Make sure you make some kind of a gasket to fit between the PSU and the back of the case – this is in order to prevent upsetting the carefully designed airflow through the case. I used some 3mm thick expanded polyurethane which works very well for this purpose. The cable tie is there just for extra security and to tidy up the coiled up DC cable that goes back out of the case and into the motherboard’s power input port. This necessitated punching two 1 inch holes in the back of the case – one for the input power cable and one for the 12V DC output cable. I used a Q.Max 1 inch sheet metal hole punch to do this. There is an iris type grommet for the DC cable to prevent any potential damage arising from it rubbing on the metal casing.

The finished modification looks reasonably tidy and is a vast improvement on a trailing power brick.

One other thing worth mentioning is that internalizing the PSU makes no measurable difference to internal temperatures with the case closed. In fact, if anything the PSU itself runs cooler than it does on the outside due to the cooling fan inside the case. The airflow inside the case is incredibly well designed, hence the reason why it is vital you use a gasket to seal the gap between the power input port on the PSU and the back of the case. To give you the idea of just how well the airflow is designed, with the case off, the HGST drives run at about 50-55C idle and 60-65C under load. With the case on they run at about 30C idle and 35C under full load (e.g. ZFS scrub or SMART self tests).

Virtualized Gaming: Nvidia Cards, Part 3: How to Modify 2xx – 4xx series GeForce into a Quadro

There has been a large amount of interest in the previous two articles in this series and many calls for a modifying guide. In this article I will explain the details of how to modify your Fermi based GeForce card into a corresponding equivalent Quadro card. Specifically, the following:

GEFORCE MODEL     GPU      QUADRO MODEL
GeForce GTS450    GF106    Quadro 2000
GeForce GTX470    GF100    Quadro 5000
GeForce GTX480    GF100    Quadro 6000

The Tesla (2xx/3xx) and Fermi (4xx) series of GPUs can be modified by modifying the BIOS. Earlier cards can also be modified, but the modification is slightly different to what is described in this article. There is no hardware modification required on any of these cards. The modification is performed by modifying what is known as the “straps” that configure the GPU at initialization time. The nouveau project (free open source nvidia driver implementation for Xorg) has reverse engineered and documented some of the straps, including the device ID locations. We can use this to change the device ID the card reports. This causes the driver to enable a different set of features that it wouldn’t normally expose on a gaming grade card, even though the hardware is perfectly capable of it (you are only supposed to have those features if you paid 4-8x more for what is essentially the same (and sometimes even inferior) card by buying a Quadro).

The main benefit of doing this modification is enabling the card to work in a virtual machine (e.g. Xen). If the driver recognizes a GeForce card, it will refuse to initialize the card from a guest domain. Change the card’s device ID into a corresponding Quadro, and it will work just fine. On the GF100 models, it will even enable the bidirectional asynchronous DMA engine which it wouldn’t normally expose on a GeForce card even though it is there (on GF100 based GeForce cards only a unidirectional DMA engine is exposed). This can potentially significantly improve the bandwidth between the main memory and GPU memory (although you probably won’t notice any difference in gaming – it has been proven time and again that the bandwidth between the host machine and the GPU is not a bottleneck for gaming workloads).

Another thing that this modification will enable is TCC mode. This is particularly of interest to users of Windows Vista and later because it avoids some of the graphics driver overheads by putting the card in a mode only used for number-crunching. Note: Although most Quadros have TCC mode available, you may want to look into modifying the card into a corresponding Tesla model if you are planning to use it purely for number crunching. You can use the same method described below, just find a Tesla based on the same GPU with an equal or lower number of enabled shader processors, find its device ID in the list linked at the bottom of the article, and change the device IDs using the strap.

Before you even begin contemplating this, make sure you know what you are doing, and bear in mind that the instructions here come with no warranty. If you are not confident you know what you are doing, buy a pre-modified card from someone instead or get somebody who does know what they are doing to do it for you.

To do this, you will require the following:

  • NVFlash for Windows and/or NVFlash for DOS
    Note: You may need to use the DOS version – for some reason the Windows version didn’t work on some of my Fermi cards. If you use the DOS version, make sure you have a USB stick or other media set up to boot into DOS.
  • Hex editor. There are many available. I prefer to use various Linux utilities, but if you want to use Windows, HxD is a pretty good hex editor for that OS. It is free, but please consider making a small donation to the author if you use it regularly.
  • Spare Graphics card, in case you get it wrong. If you are new to this, your boot graphics card (the spare one, not the one you are planning to modify) should preferably not be an Nvidia one (to avoid potential embarrassment of flashing the wrong card). Skip this part at your peril.

On Fermi BIOS-es the strap area is 16 bytes long and it starts at file offset 0x58. Here is an example based on my PNY GTX480 card:
0000050: e972 2a00 de10 5f07 ff3f fc7f 0040 0000 .r*..._..?...@..
0000060: ffff f17f 0000 0280 7338 a5c7 e92d 44e9 ........s8...-D.

The very important thing to note here is that the byte order is little-endian. That means that in order to decode this easily, you should re-write the 16 strap bytes (file offsets 0x58 to 0x67) as four 32-bit words, each split here into two 16-bit halves:
7FFC 3FFF 0000 4000 7FF1 FFFF 8002 0000
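If you would rather not do the byte swapping by hand, here is a minimal Python sketch of the same decoding. It assumes the 16 bytes have been copied out of the dump above; the AND-1 and OR-1 labels for the second pair are just my naming for illustration.

```python
import struct

# The 16 strap bytes at file offsets 0x58-0x67, copied from the dump above.
raw = bytes.fromhex("ff3ffc7f" "00400000" "fffff17f" "00000280")

# "<4I" means four little-endian unsigned 32-bit integers.
and0, or0, and1, or1 = struct.unpack("<4I", raw)

# AND-1 / OR-1 are illustrative labels for the second strap pair.
for name, value in (("AND-0", and0), ("OR-0", or0), ("AND-1", and1), ("OR-1", or1)):
    print(f"{name}: 0x{value:08X}")
```

To decode a BIOS dump of your own, you would read 16 bytes from offset 0x58 of the saved ROM file instead of hard-coding them.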

This represents two sets of straps, each containing an AND mask and an OR mask. The hardware level straps are AND-ed with the AND mask, and then OR-ed with the OR mask.

The bits that control the device ID are 10-13 (ID bits 0-3) and 28 (ID bit 4). We can ignore the last 8 bytes of the strap since all the bits controlling the device ID are in the first 8 bytes.

This makes the layout of the strap bits we need to change a little more obvious:

Fxx4xxxx xxxxxxxx xx3210xx xxxxxxxx
   ^                ^^^^
   |                ||||-pci dev id[0]
   |                |||--pci dev id[1]
   |                ||---pci dev id[2]
   |                |----pci dev id[3]
   |---------------------pci dev id[4]
F - cannot be set, always fixed to 0
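The layout above can be expressed as a couple of small helper functions. This is only a sketch under the assumptions stated in this article (AND first, then OR, with the device ID bits at strap bits 10-13 and 28); the function names are made up for illustration.

```python
def apply_soft_strap(hw_strap, and_mask, or_mask):
    """Combine a hardware strap with the soft strap: AND first, then OR."""
    return ((hw_strap & and_mask) | or_mask) & 0xFFFFFFFF


def device_id_bits(strap):
    """Pick the five device ID bits out of a 32-bit strap word."""
    low4 = (strap >> 10) & 0xF   # strap bits 10-13 -> ID bits 0-3
    bit4 = (strap >> 28) & 0x1   # strap bit 28     -> ID bit 4
    return (bit4 << 4) | low4


# Stock GTX480 masks from the BIOS fragment above.
AND0 = 0x7FFC3FFF
OR0 = 0x00004000

# The AND mask keeps all five ID bits and the OR mask forces none of them,
# so the stock soft strap passes the hardware device ID through unchanged.
print(hex(device_id_bits(AND0)))  # 0x1f
print(hex(device_id_bits(OR0)))   # 0x0
```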

The device ID of the GTX480 is 0x06C0. In binary, that is:
0000 0110 1100 0000
We want to modify it into a Quadro 6000, which has the device ID 0x06D8. In binary that is:
0000 0110 1101 1000

The device ID differs only in the low 5 bits, which is good because we only have the low 5 bits available in the soft strap.

So we need to modify as follows
From:   0000 0110 1100 0000
To:     0000 0110 1101 1000
Change: xxxx xxxx xxx1 1xxx

We only need to change two of the strap bits from 0 to 1. We can do this by only adjusting the OR part of the strap.

It is easier to see what is going on if we represent this as follows:

ID Bit:   4                  32 10
Strap: -xxA xxxx xxxx xxxx xxAx xxxx xxxx xxxx
Old Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  00        00        40        00
       0000 0000 0000 0000 0100 0000 0000 0000
New Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  10        00        60        00
       0001 0000 0000 0000 0110 0000 0000 0000

Note that in the edit mask above, bit 31 is marked as “-”. Bit 31 is always 0 in both AND and OR strap masks.
Bits we must keep the same are marked with “x”. Bits we need to amend are marked with “A”.
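The binary to hex conversion is easy to get wrong by hand, so here is a hedged Python sketch that derives the new masks from the old and new device IDs. The function name is hypothetical, and the branch for turning an ID bit from 1 to 0 (clearing it in the AND mask) is my extrapolation from the AND/OR behaviour described above; the GTX480 case only needs the 0 to 1 branch.

```python
# Device ID bit -> strap bit, as per the layout described in this article.
ID_BIT_TO_STRAP_BIT = {0: 10, 1: 11, 2: 12, 3: 13, 4: 28}


def retarget_strap(and_mask, or_mask, old_id, new_id):
    """Return (and_mask, or_mask) adjusted so old_id is reported as new_id."""
    if (old_id ^ new_id) & ~0x1F:
        raise ValueError("device IDs differ outside the low 5 bits")
    for id_bit, strap_bit in ID_BIT_TO_STRAP_BIT.items():
        old_bit = (old_id >> id_bit) & 1
        new_bit = (new_id >> id_bit) & 1
        if old_bit == new_bit:
            continue                      # leave untouched bits alone
        if new_bit:
            or_mask |= 1 << strap_bit     # force the bit to 1
        else:
            and_mask &= ~(1 << strap_bit) # assumption: force the bit to 0
            or_mask &= ~(1 << strap_bit)
    return and_mask & 0xFFFFFFFF, or_mask & 0xFFFFFFFF


# GTX480 (0x06C0) -> Quadro 6000 (0x06D8), starting from the stock masks.
new_and, new_or = retarget_strap(0x7FFC3FFF, 0x00004000, 0x06C0, 0x06D8)
print(f"AND-0: 0x{new_and:08X}  OR-0: 0x{new_or:08X}")
```

Treat the output as a cross-check of your own working rather than a replacement for it.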

So what we need to do is flash the edited strap to the card. We could do this directly in the BIOS, but this would require calculating the strap checksum, which is tedious. Instead we can use nvflash to take care of the strap rewrite for us, and it will take care of the checksum transparently.
The new strap is:
0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x80020000
The second pair is unchanged from what we read from the BIOS above. Make sure you have ONLY changed the device ID bits and that your binary to hex conversion is correct – otherwise you stand a very good chance of bricking the card.

We flash this onto the card using:
nvflash --index=X --straps 0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x00020000
Note:
1) The last OR strap is 0x00020000 even though the data in the BIOS reads as if it should be 0x80020000. You cannot set the high bit (the left-most one) to 1 in the OR strap (just like you cannot set it to 0 in the AND strap). Upon flashing, nvflash will turn the high bit to 1 for you and what will end up in the BIOS will be 0x80020000 even though you set it to 0x00020000. This is rather unintuitive and poorly documented.
2) You will need to check what the index of the card you plan to flash is using nvflash -a, and replace X with the appropriate value.
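To avoid typing the command wrong, the two notes above can be folded into a small Python sketch that only prints the command line and does not run anything. The function name is made up, the index value 0 is purely an example, and clearing the high bit of both OR masks (rather than just the last one) is my reading of note 1.

```python
def nvflash_strap_command(index, and0, or0, and1, or1):
    """Build (but do not run) the nvflash strap command shown above."""
    # Assumption based on note 1: the high bit of an OR mask has to be
    # passed as 0, and nvflash sets it again when writing the BIOS.
    words = [and0, or0 & 0x7FFFFFFF, and1, or1 & 0x7FFFFFFF]
    args = " ".join(f"0x{word:08X}" for word in words)
    return f"nvflash --index={index} --straps {args}"


# Strap words as decoded from the BIOS, with the new OR-0 value.
print(nvflash_strap_command(0, 0x7FFC3FFF, 0x10006000, 0x7FF1FFFF, 0x80020000))
```

Check the printed line against the strap values you worked out by hand before flashing anything.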

Here is an example (from my GTX480, directly corresponding to the pre-modification fragment above) of how the ROM differs after changing the strap:

0000050: e972 2a00 de10 5f07 ff3f fc7f 0060 0010 .r*..._..?...`..
0000060: ffff f17f 0000 0280 7338 a597 e92d 44e9 ........s8...-D.

Apart from the edited OR strap bytes, the only difference is at offset 0x6B (0xC7 became 0x97). This is the strap checksum that nvflash calculated for us.
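If you want to confirm exactly which bytes changed, a few lines of Python can compare the two fragments. This is just a sketch that parses the dump lines as printed in this article (an address, eight groups of hex digits, then the ASCII column); the helper names are made up.

```python
def parse_dump_line(line):
    """Split one dump line into (base offset, 16 data bytes)."""
    addr_text, rest = line.split(":", 1)
    hex_groups = rest.split()[:8]          # ignore the trailing ASCII column
    return int(addr_text, 16), bytes.fromhex("".join(hex_groups))


def load_dump(lines):
    """Map every file offset covered by the dump lines to its byte value."""
    image = {}
    for line in lines:
        base, data = parse_dump_line(line)
        for i, value in enumerate(data):
            image[base + i] = value
    return image


before = [
    "0000050: e972 2a00 de10 5f07 ff3f fc7f 0040 0000 .r*..._..?...@..",
    "0000060: ffff f17f 0000 0280 7338 a5c7 e92d 44e9 ........s8...-D.",
]
after = [
    "0000050: e972 2a00 de10 5f07 ff3f fc7f 0060 0010 .r*..._..?...`..",
    "0000060: ffff f17f 0000 0280 7338 a597 e92d 44e9 ........s8...-D.",
]

old, new = load_dump(before), load_dump(after)
changes = [(off, old[off], new[off]) for off in sorted(old) if old[off] != new[off]]
for off, was, now in changes:
    print(f"0x{off:02X}: {was:02X} -> {now:02X}")
```

With zero-based offsets as printed in the dump, this lists the two edited OR strap bytes at 0x5D and 0x5F, plus the checksum byte at 0x6B.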

Reboot and your card should now get detected as a Quadro 6000, and you should be able to pass it through to your virtual machine without problems. I have used this extensively to enable me to pass my GeForce 4xx series cards to my Xen VMs for gaming. I will cover the details of virtualization use with Xen in a separate article. Note that I have had reports of cards modified using this method also working virtualized using VMware vDGA, so if this is your preferred hypervisor, you are in luck. Quadro 5000 and 6000 are also listed as supported for VMware vSGA virtualization, so that should work, too – if you have tried vSGA with a modified GeForce card, please post a comment with the details.

The same modification method described here should work for modifying any Fermi card into the equivalent Quadro card. Simply follow the same process. You may find this list of Nvidia GPU device IDs useful to establish what device ID you want to modify the card to. The GPU should match between the GeForce card and the Quadro/Tesla/Grid you are modifying to – so check which Nvidia card uses which GPU.

Many thanks to the nouveau project for reverse engineering and documenting the initialization straps, and all the people who have contributed to the effort.

In the next article I will cover modifying Kepler GPU based cards. They are quite different and require a different approach. There are also a number of pitfalls that can leave you chasing your tail for days trying to figure out why everything checks out but the modification doesn’t work (i.e. the card doesn’t function in a VM).

IBM T221 3840×2400 204dpi Monitor – Part 7: Positive Update

For once it would appear that I have a positive update on the subject of Nvidia drivers. It would seem that patching the latest (319.23) driver is no longer required on Linux. Even better, there is a way to achieve a working T221 setup without RandR getting in the way by insisting the two halves are separate monitors. I covered the issues with Nvidia drivers in a previous article.

The build part now works as expected out of the box. Simply:

export IGNORE_XEN_PRESENCE=1
bash ./NVIDIA-Linux-x86_64-319.23.run

and everything should “just work”.

Best of all, there appears to be a workaround for the RandR information being visible even when Xinerama is being overridden. It turns out, Xinerama and RandR seem to be mutually exclusive. So even though the option disabling RandR explicitly seems to get silently ignored, enabling Xinerama fixes that problem. And since the Nvidia driver’s Xinerama info override still works, this solves the problem!

You may recall from a previous article the following in xorg.conf:

[...]
Section "ServerLayout"
	Identifier "Layout0"
	Screen 0 "Screen0" 0 0
	Option "Xinerama" "0"
EndSection
[...]
Section "Screen"
	Identifier "Screen0"
[...]
	Option "NoTwinViewXineramaInfo" "True"
	Option "TwinView" "1"
	Option "TwinViewOrientation" "RightOf"
	Option "metamodes" "DFP-0:1920x2400, DFP-3:1920x2400"
[...]
EndSection

It turns out the solution is to simply enable Xinerama:

Section "ServerLayout"
	Identifier "Layout0"
	Screen 0 "Screen0" 0 0
	Option "Xinerama" "1"
EndSection
:)

This implicitly disables RandR and Nvidia driver’s Xinerama info override takes care of the rest. Magic. 

Update:
If you are still having problems when using KDE, there is another trick you can use to force Xinerama and disable RandR. Amend the following line in kdmrc:

/etc/kde/kdm/kdmrc:
ServerArgsLocal=-extension RANDR +xinerama -nr -nolisten tcp