WQUXGA – IBM T221 3840×2400 204dpi Monitor – Part 6: Regressing Drivers and Xen

I recently built a new machine, primarily because I got fed up with having to stop what I’m working on and reboot from Linux into Windows whenever my friends and/or family invited me to join them in a Borderlands 2 session. Unfortunately, my old machine was just a little too old (Intel X38 based) to have the full, bug-free VT-d/IOMMU support required for VGA passthrough to work, so after 5 years I finally decided it was time to rectify this. More on that in another article, but the important point is that VGA passthrough requires a recent version of Xen. And that is where this part of the story really begins.

Some of you may have figured out that RHEL derivatives are my Linux distribution of choice (RedSleeve was a big hint). Unfortunately, Red Hat has dropped support for Xen Dom0 kernels in EL6, but thankfully other people have picked up the torch and provide a set of up-to-date, supported Xen Dom0 kernels and packages for EL6. So far so good. But it was never going to be that simple, at a time when drivers are simultaneously becoming dumber, more feature-sparse and more bloated. That is really what this story is about.

For a start, a few details about the system setup that I am using, and have been using for years.

  • I am a KDE, rather than Gnome, user. EL6 comes with KDE 4, which uses the X RandR extension rather than Xinerama to establish the geometry of the screen layout. This isn’t a problem in itself, but there is no way to override whatever RandR reports, so on a T221 you end up with a regular desktop on one half of the monitor and an empty desktop on the other half, which looks messy and unnatural.
  • This is nothing new – I have covered the issue before in part 1 (ATI, fakexinerama) and part 3 (nvidia, twinview) of this series of articles. There are, however, further complications now:
  1. EL6 has had a Xorg package update that bumped the ABI version from 10 to 11.
  2. Nvidia drivers have changed the way TwinView works after version 295.x (the TwinView option in xorg.conf is no longer recognized).
  3. Nvidia drivers 295.x do not support Xorg ABI v11.
  4. Nvidia kernel drivers 295.x do not build against kernels 3.8.x.

And therein lies the complication.

Nvidia 295.x drivers, when used with the TwinView and NoTwinViewXineramaInfo options, also seem to override the RandR geometry to show a single, large screen rather than two separate ones. This is exactly what we want when using the T221. Drivers after 295.x (304.x seems to be the next version) no longer recognize the TwinView configuration option, and while they still provide a Xinerama geometry override via the NoTwinViewXineramaInfo option, they no longer override the RandR information. This means that you end up with a desktop that looks as it would if you used two separate monitors (e.g. the status bar is only on the first screen, the wallpaper doesn’t stretch across, etc.), rather than a single, seamless desktop.
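For reference, the relevant part of a 295.x-era Device section looks something like the following. This is only an illustrative sketch – the exact MetaModes line depends on how the two halves of the T221 are wired to the card, and the full working configuration is the one given in part 3:

Section "Device"
        Identifier "nvidia0"
        Driver "nvidia"
        Option "TwinView" "true"
        Option "NoTwinViewXineramaInfo" "true"
        Option "MetaModes" "1920x2400+0+0, 1920x2400+1920+0"
EndSection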

As you can see, there is a rather large compound issue in play here. We cannot use the 295.x drivers as they stand, because:

  1. They don’t support Xorg ABI 11 – this can be solved by downgrading the xorg-x11-server-* and xorg-x11-drv-* packages to an older version (1.10 from EL 6.3). Easily enough done (see the sketch after this list) – just make sure you add xorg-x11-* to your exclude line in /etc/yum.conf after downgrading, to avoid accidentally updating them again in the future.
  2. They don’t build against 3.8.x kernels (which is what the Xen kernel I am using is based on – and this is quite apart from the long-standing semi-allergy of the Nvidia binary drivers to Xen). This is more of an issue, but with a bit of manual source editing I was able to solve it.
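Addressing the first point, the downgrade looks roughly like this – a sketch rather than exact commands, since the precise package versions depend on which EL 6.3 repository (e.g. the vault) you still have enabled:

# downgrade the X server and driver packages to the ABI 10 (1.10.x) builds from EL 6.3
yum downgrade xorg-x11-server-Xorg xorg-x11-server-common xorg-x11-drv-\*

and then add the pin to /etc/yum.conf so a later yum update does not pull them forward again:

exclude=xorg-x11-*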

Here is how to get the latest 295.x driver (295.75) to build against Xen kernel 3.8.6. You may need to do this as root.

Kernel source acquisition and preparation:

# fetch the Xen Dom0 kernel source RPM and unpack it into ~/rpmbuild
wget http://uk1.mirror.crc.id.au/repo/el6/SRPMS/kernel-xen-3.8.6-1.el6xen.src.rpm
rpm -ivh kernel-xen-3.8.6-1.el6xen.src.rpm
# apply the spec file's patches to produce the prepared source tree
cd ~/rpmbuild/SPECS
rpmbuild -bp kernel-xen.spec
# configure with the running kernel's config and build, so that Module.symvers
# and the generated headers are available for the Nvidia module build
cd ~/rpmbuild/BUILD/linux-3.8.6
cp /boot/config-3.8.6-1.el6xen.x86_64 .config
make prepare
make all

Now that you have the kernel sources ready, download the Nvidia 295.75 driver and the patch, apply the patch, and build and install the driver.

# download the driver and the patch
wget http://uk.download.nvidia.com/XFree86/Linux-x86_64/295.75/NVIDIA-Linux-x86_64-295.75.run
wget https://dl.dropboxusercontent.com/u/61491808/NVIDIA-Linux-x86_64-295.75.patch
# unpack the installer without running it, and apply the patch
bash ./NVIDIA-Linux-x86_64-295.75.run --extract-only
patch < NVIDIA-Linux-x86_64-295.75.patch
cd NVIDIA-Linux-x86_64-295.75
# stop the installer from refusing to build on a Xen kernel, and point it at the prepared source tree
export IGNORE_XEN_PRESENCE=y
export SYSSRC=~/rpmbuild/BUILD/linux-3.8.6
# 3.x kernels no longer ship include/linux/version.h, which the 295.x installer still expects
cp /usr/include/linux/version.h $SYSSRC/include/linux/
# silent (unattended) install
./nvidia-installer -s

And there you have it: Nvidia driver 295.75 that builds cleanly and works against 3.8.6 kernels. The same xorg.conf given in part 3 of this series will continue to work.
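A quick sanity check after rebooting into the Xen kernel (assuming the module built and loaded) is to ask the driver itself:

# should report kernel module version 295.75
cat /proc/driver/nvidia/version
# any build or load problems will show up here
dmesg | grep -i nvidia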

It is really quite disappointing that all this is necessary. What is more concerning is that the ability to use a monitor like the T221 is diminishing by the day. Without the ability to override what RandR returns, it may well be gone completely soon. It seems the only remaining option is to write a fakerandr library (similar to fakexinerama). Any volunteers?

It seems that Nvidia drivers are both losing features and becoming more bloated at the same time. 295.75 is 56MB. 304.88 is 65MB. That is 16% bloat for a driver that is regressively missing a feature, in this case an important one. Can there really be any doubt that the quality of software is deteriorating at an alarming rate?

Clevo M860TU / Sager NP8662 / mySN XMG5 GPU (GTX260M / FX 3700M) Replacement / Upgrade and Temperature Management Modifications

Recently, my wife’s Clevo M860TU laptop suffered a GPU failure. Over our last few Borderlands 2 sessions, it would randomly crash more and more frequently, until any sort of activity requiring 3D acceleration refused to work for more than a few seconds. The temperatures as measured by GPU-Z looked fine (all our computers get their heatsinks and fans cleaned regularly), so it looked very much like the GPU itself was starting to fail. A few days later, it failed completely, with the screen staying permanently blank.

The original GPU in it was an Nvidia GTX260M. These proved near impossible to come by in the MXM III-HE form factor. Every once in a while a suitable GTX280M would turn up on eBay, but the prices were quite ludicrous (and consequently they never seemed to sell, either). Interestingly, Nvidia Quadro FX 3700M MXM III-HE modules seem to be fairly abundant and reasonably priced – curious, considering that they cost several times more than the GTX280M when new. Their spec (128 shaders, 75W TDP) is identical to the GTX280M’s.

MXM-III HE Nvidia Quadro FX 3700M

The GTX260M has 112 shaders and a lower TDP of 65W, so the cooling was going to be put under increased strain (especially since I decided to upgrade it from a dual core to a quad core CPU at the same time – more on that later). Having fitted it all (it is a straight drop-in replacement, but make sure you use shims and fresh thermal pads for the RAM if required to ensure proper thermal contact with the heatsink plate), I ran some stress tests.

Within 10 minutes of the OCCT GPU test, it hit 97C and started throttling and producing errors. I don’t remember what temperatures the GTX260M was reaching before, but I am quite certain it was not this high. I had to find a way to reduce the heat output of the GPU. Given the cooling constraints in a laptop, even a well designed one like the Clevo M860TU, the only way to reduce the heat was by reducing either the clock speed or the voltage – or both. Since the dynamic power dissipated by a circuit is proportional to the product of the clock speed and the square of the voltage, reducing the voltage has a much bigger effect than reducing the clock speed. Of course, reducing the voltage necessitates a reduction in clock speed to maintain stability. The only way to do this on an Nvidia GPU is by modifying the video BIOS, and thankfully the tools for doing so are readily available.
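In rough terms (a simplification that ignores static leakage):

P_dynamic ∝ f × V²

so a 10% drop in voltage alone cuts the dynamic power by roughly 19% (0.9² ≈ 0.81), whereas a 10% drop in clock speed only buys about 10%.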

After some experimentation, it wasn’t difficult to find the optimal setting given the cooling constraints. The original settings were:

  • Core: 550MHz
  • Shaders: 1375MHz
  • Memory: 799MHz (1598MHz DDR)
  • Voltage: 1.03V (Extra)
  • Temperature: Throttles at 97C and gets unstable (OCCT GPU test)
  • FPS: ~17

The settings I found that provided 100% stability and reduced the temperatures down to a reasonable level are as follows:

  • Core: 475MHz
  • Shaders: 1250MHz
  • Memory: 799MHz (1598MHz DDR)
  • Voltage: 0.95V (Extra)
  • Temperature: 82C peak (OCCT GPU test)
  • FPS: ~16

The temperature drop is very significant, but the performance reduction is relatively minimal. It is worth noting that OCCT is specifically designed to produce maximum heat load. Playing Borderlands 2 and Crysis with all the settings set to maximum at 1920×1200 resulted in peak temperatures around 10C lower than the OCCT test.
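Plugging the two sets of numbers into the dynamic power relation above gives a feel for why this works (again, a rough estimate that ignores leakage and the memory, which was left untouched):

core:    (475 / 550) × (0.95 / 1.03)² ≈ 0.86 × 0.85 ≈ 0.73
shaders: (1250 / 1375) × (0.95 / 1.03)² ≈ 0.91 × 0.85 ≈ 0.77

In other words, the core and shader domains should be dissipating roughly a quarter less power, for a drop in frame rate of only around 6%.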

While I had the laptop open, I figured this would be a good time to upgrade the CPU as well. Not that I think the 2.67GHz P9600 Core2 was underpowered, but with 2.26GHz Q9100 quad core Core2s being quite cheap these days, it seemed like a good idea. And considering that when overclocking the M860TU from 1066 to 1333FSB I had to reduce the multiplier on the P9600 (not that there was often any need for this), the Q9100’s lower multiplier seemed like a promising overall upgrade. The downside, of course, is that the Q9100 is rated at a TDP of 45W compared to the P9600’s 25W. Given that the heatsink on the Clevo M860TU is shared between the CPU and the GPU, this no doubt didn’t help the temperatures observed under OCCT stress testing. Something could be done about this, too, though.

Enter RMClock – a fantastic utility for tweaking VIDs to achieve undervolting on x86 CPUs at above-minimum clock speeds. Intel Enhanced SpeedStep reduces both the clock speed and the voltage when applying power management. The voltage VID and clock multipliers are overridable (within the minimum and maximum hard-set in the CPU for both), which means that, in theory, with a very good CPU we could run the maximum multiplier at the minimum VID to reduce power consumption. In most cases, of course, this would result in instability. But, as it turns out, my Q9100 was stable under several hours of OCCT testing at the minimum VID (1.05V) at the top multiplier (nominal VID 1.275V). This resulted in a 10C drop in peak OCCT CPU load tests, and a 6C drop in peak OCCT GPU load tests (down to 76C from 82C peak).
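Since the clock speed stays the same, the same scaling relation suggests this undervolt alone should cut the CPU’s dynamic power by roughly a third:

(1.05 / 1.275)² ≈ 0.68

which is consistent with the temperature drops observed.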

Getting the Best out of the MacBook Pro Retina 15 Screen in VMware Fusion

I make no secret of the fact that I am neither a fan of Apple nor a fan of virtualization. But sometimes they make for the best available option. I have recently found myself in such a situation. My current employer, mercifully, allows employees a choice of something other than vanilla Windows machines to work on, and there was an option of getting a MacBook Pro. As you can probably guess from some of the previous articles here, I find the single most important productivity feature of a computer to be the screen resolution, an opinion I appear to share with Linus Torvalds. So I opted for the 15″ MacBook Pro Retina.

Unfortunately, the native Linux support on that machine still isn’t quite perfect. Since speed is not a concern in this particular case, I opted to run Linux using VMware Fusion on OSX. Unfortunately, VMware Fusion cannot handle the full 2880×1800 resolution of the display out of the box, and with lower resolutions running in full screen mode the quality is badly degraded by blurring and aliasing. The solution is to create a custom 2880×1800 mode in /etc/X11/xorg.conf that fits within the VMware virtual graphics driver’s capabilities. This took a bit of working out, since the mode had to fit within the horizontal and vertical refresh rate limits of the driver and the total pixel clock the driver allows. The following are the settings that work for me:

Section "Monitor"
        Identifier "MacBookPro"
        HorizSync 30.0 - 90.0
        VertRefresh 30.0 - 60.0
        ModeLine "2880x1800C" 358.21 2880 2912 4272 4304 1800 1839 1852 1891
EndSection

Section "Screen"
        Identifier "Default Screen"
        Monitor "MacBookPro"
        DefaultDepth 24
        SubSection "Display"
                Modes "2880x1800C"
        EndSubSection
EndSection
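As a sanity check, the timings in that modeline stay inside the declared limits:

horizontal rate: 358.21 MHz / 4304 total pixels ≈ 83.2 kHz  (within HorizSync 30–90 kHz)
refresh rate:    83.2 kHz / 1891 total lines    ≈ 44 Hz     (within VertRefresh 30–60 Hz)

A 44Hz refresh is perfectly usable for desktop work, and keeping it low is presumably what keeps the pixel clock within what the driver will accept.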

The result is being able to run a full screen 2880×1800 mode, and it looks absolutely superb.

RedSleeve Linux Public Alpha

Here is something that I have been working on of late.

RedSleeve Linux is a 3rd party ARM port of a Linux distribution of a Prominent North American Enterprise Linux Vendor (PNAELV). They object to being referred to by name in the context of clones and ports of their distribution, but if you are aware of CentOS and Scientific Linux, you can probably guess what RedSleeve is based on.

RedSleeve is different from CentOS and Scientific Linux in that it isn’t a mere clone of the upstream distribution it is based on – it is a port to a new platform, since the upstream distribution does not include a version for ARM.

RedSleeve was created because ARM is making inroads into mainstream computing, and although Fedora has supported ARM for a while, it is a bleeding edge distribution that puts the emphasis on keeping up with the latest developments rather than on long term support and stability. That was not an acceptable solution for the people behind this project, so we set out instead to port a distribution that puts more emphasis on long term stability and support.

Alleviating Memory Pressure on Toshiba AC100

After all the upgrades and tweaks to the AC100 (screen upgrade to 1280×720, cooling improvements and boosting the clock speed by over 40%), only one significant issue remains: it only has 512MB of RAM. Unfortunately, the memory controller initialization is done by the closed-source boot loader, so even if we were to solder in bigger chips (Tegra2 can handle up to 1GB of RAM), it is unlikely in the extreme that it would just work.

So, other than increasing the physical amount of memory, can we actually do anything to improve the situation? Well, as a matter of fact, there are a few things.

Clawing Back Some Memory

By default, the GPU gets allocated a hefty 64MB of RAM out of the 512MB that we have. This is quite a substantial fraction of our memory, and it would be nice to claw some of it back if we are not using it. I find Nvidia’s binary accelerated Tegra driver too buggy to use under normal circumstances, so I use the basic unaccelerated frame buffer driver instead. There are two frame buffer allocations on the AC100: the internal display and the HDMI port. The latter is only intended for use with TVs, which means we shouldn’t need a resolution of more than 1920×1080 on that port. The highest resolution display we can have on the internal port is 1280×720. That means that the maximum amount of memory used by those two frame buffers (at 32 bits per pixel) is 8100KB + 3600KB = 11700KB. To be on the safe side, let’s call that 16MB. That still leaves 48MB that we should be able to safely reclaim. We can do that by telling the kernel that there is extra memory at certain addresses, using the following boot parameters:

mem=448M@0M mem=48M@464M

Make sure the accelerated binary Tegra driver is disabled in your xorg.conf, reboot and you should now have 496MB of usable RAM instead of 448MB. It’s just over an extra 10%, which should make a noticeable difference given how tight the memory is to begin with.
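Reading the parameters above: the first 448MB stays where it was, the 16MB from 448M to 464M is left to the GPU frame buffers, and the 48MB from 464M to 512M is handed back to the kernel. A quick way to confirm the kernel picked it up after rebooting (MemTotal will be slightly lower than 496MB, since the kernel reserves some memory for itself):

grep MemTotal /proc/meminfo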

If you aren’t using the HDMI interface, my tests show that it is in fact possible to reduce the GPU memory to just 2MB with no ill effects when using the 1280×720 display panel, because the frame buffer seems to operate in 16-bit mode by default (1280×720 at 16 bits per pixel is 1800KB, which fits within 2MB):

mem=448M@0M mem=62M@450M

That leaves a total of 510MB for applications.

Memory Compression

In recent kernels, there are two modules that are very useful when we have plenty of CPU resources but very little memory – exactly the case on the AC100. They are zcache and zram. On 3.0 kernels, instead of zram we can use frontswap, which is similar but has the advantage that it is aware of and cooperates with zcache. Since at the time of writing 3.0 isn’t quite as polished and stable on the AC100 as 2.6.38, let’s focus on zram instead.

Assuming you have compiled zcache support into your kernel, all you need to do to enable it is add the kernel boot parameter “zcache”. From there on, your caches should be compressed, thus increasing the amount they can store.

zram provides a virtual block device backed by RAM, but the contents are compressed, so it should always end up using less than the amount of memory it presents as a block device (unless all of the data is incompressible, which is very unlikely). To err on the side of caution, we shouldn’t set the combined size across all the zram devices to more than half of the total memory. To ensure optimal performance, we should also set the number of zram devices to the number of CPU cores in the system, to make sure all CPUs end up being used (each zram device is handled by a single thread).

To set the number of zram devices to 2 (Tegra2 has 2 CPU cores), we need to create the file /etc/modprobe.d/zram.conf containing the following line:

options zram num_devices=2

Then once we load the zram module (modprobe zram), we should see device nodes called /dev/zram*. We can configure the devices:

echo <memory_size_in_bytes> > /sys/block/<zram_device>/disksize

The amount of memory assigned to each zram device should be such that their total combined size doesn’t exceed half of the total physical memory in the system.

Then we can create swap headers on those zram devices using mkswap (e.g. mkswap /dev/zram0) and enable swapping to them (swapon -p100 /dev/zram0).
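Putting it all together, a minimal sketch for the AC100 (two cores and roughly 512MB of RAM, so 128MB per device to stay at half the physical memory in total) looks like this:

modprobe zram                          # num_devices=2 comes from /etc/modprobe.d/zram.conf
echo $((128*1024*1024)) > /sys/block/zram0/disksize
echo $((128*1024*1024)) > /sys/block/zram1/disksize
mkswap /dev/zram0 && swapon -p100 /dev/zram0
mkswap /dev/zram1 && swapon -p100 /dev/zram1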

We should now have some compressed RAM for swapping to instead of swapping to a slow SD card.

Tweaks

It turns out that some of the default settings on Linux distributions aren’t as sensible as they could be. By default the amount of stack space each thread is allocated is 8MB. This is unnecessarily large and results in more memory consumption than is necessary. Instead we can set the soft limit to 256KB using “ulimit -s 256”. Ideally we should make this happen automatically at startup by creating a file /etc/security/limits.d/90-stack.conf containing the following:

* soft stack 256

Some users have reported that this can increase the amount of available memory after booting by a rather substantial amount. Since this is a soft limit, programs that require more stack space can still get it by asking for it.
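Note that the limit only applies to sessions started after the change, so check it from a fresh login (this should print 256):

ulimit -s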

Choice of Software

One of the most commonly used types of software nowadays is the web browser, and unfortunately most web browsers have become unreasonably bloated in recent years. This is a problem when the amount of memory is as limited as it is on most ARM machines. Firefox, and to a somewhat lesser extent Chrome, require a substantial amount of memory. However, there is another reasonably full-featured alternative that works on ARM – Midori. Midori is based on the WebKit rendering engine, the same one used by Chrome and Safari, but its memory footprint is approximately half that of the other browsers. Unfortunately, its JavaScript support isn’t quite as good as Firefox’s or Chrome’s yet, but it is good enough for most things, and if memory pressure is a serious issue, you might want to give it a try.