Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks

Having spent a considerable amount of time, effort and ultimately money trying to find decently performing SD, CF and USB flash modules, I want to make life easier for other people with the same requirements by publishing my findings, especially since I have been unable to find a reasonably comprehensive data source with similar information.

Unfortunately, virtually all SD/microSD (referred to as uSD from now on), CF and USB flash modules have truly atrocious performance when used as normal disks (e.g. when running the OS from them on a small, low-power or embedded device), regardless of their advertised performance. The problem is specifically their appalling random-write performance, so that is the figure you should pay particular attention to in the tables below.

As you will see, the sequential read and write performance of flash modules is generally quite good, as is random-read performance. On their own, however, these are largely irrelevant to the overall performance you will observe when running the operating system from the card if the random-write performance is below a certain level. And yes, your system will write several MB to the disk just by booting up, before you even log in, so don't assume that it's all about reads and that writes are irrelevant.

For comparison, a typical cheap 5400rpm laptop disk can achieve around 90 IOPS on both random reads and random writes with a typical (4KB) block size. This is an important figure to bear in mind purely to be able to see just how appalling the random-write performance of most removable flash media is.
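
To put that figure in perspective: 90 IOPS at 4KB per operation works out to roughly 360KB/s of scattered writes, while a card managing only 1 IOPS delivers about 4KB/s. At that rate, the several MB of writes a typical boot performs can take many minutes rather than a fraction of a second.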

All media was primed with two passes of:

 dd if=/dev/urandom of=/dev/$device bs=1M oflag=direct

in order to simulate long-term use and ensure that the performance figures reasonably accurately reflect what you might expect after the device has been in use for some time.
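
For reference, the full priming procedure therefore amounts to something like this ($device stands for the raw device node, e.g. sdb; everything on it is destroyed):

 # two full overwrites with random data, bypassing the page cache
 for pass in 1 2; do
     dd if=/dev/urandom of=/dev/$device bs=1M oflag=direct
 done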

There are two sets of results:

1) Linear read/write test performed using:

dd if=/dev/$device of=/dev/null    iflag=direct
dd if=/dev/zero    of=/dev/$device oflag=direct

The linear read-write test script I use can be downloaded here.
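
The script is essentially a wrapper around the two commands above; a minimal sketch (not the exact script, and note that the write pass destroys all data on the device):

 #!/bin/bash
 # linear read/write test sketch -- the write pass DESTROYS all data
 # usage: ./linear-test.sh sdb   (device name is an example)
 device=$1
 # sequential read of the whole device, bypassing the page cache
 dd if=/dev/$device of=/dev/null iflag=direct
 # sequential write of zeros over the whole device, bypassing the page cache
 dd if=/dev/zero of=/dev/$device oflag=direct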

2) Random read/write test performed using:

iozone -i 0 -i 2 -I -r 4K -s 512m -o -O +r +D -f /path/to/file
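
For reference, the key flags break down as follows (+r and +D are left exactly as they appear in the command above):

 # -i 0     sequential write/rewrite test
 # -i 2     random read/write test
 # -I       use O_DIRECT, bypassing the page cache
 # -r 4K    4KB record size, matching the file system block size
 # -s 512m  512MB test file
 # -o       O_SYNC writes, forced out to the medium before returning
 # -O       report results in operations per second (IOPS)
 iozone -i 0 -i 2 -I -r 4K -s 512m -o -O +r +D -f /path/to/file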

In all cases, the test size was 512MB. Partitions are aligned to 2MB boundaries. The file system is ext4 with a 4KB block size (-b 4096) and a 16-block (64KB) stripe width (-E stride=1,stripe-width=16), no journal (-O ^has_journal), mounted without access time logging (-o noatime). The partition used for the tests starts at half of the card’s capacity, e.g. on a 16GB card the test partition spans the space from 8GB up to the end. This is done in order to nullify the effect of some cards having faster flash at the front of the card.
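
Put together, the formatting and mounting described above come down to something like this (a sketch; the partition name is an example):

 # ext4, 4KB blocks, 16-block stripe width, no journal
 mkfs.ext4 -b 4096 -E stride=1,stripe-width=16 -O ^has_journal /dev/sdb2
 # mount without access time updates
 mount -o noatime /dev/sdb2 /mnt/test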

The data here covers only the first modules I have tested, and it will be extensively updated as and when I test additional ones. Unfortunately, a single module can take over 24 hours to complete testing if its performance is poor (e.g. 1 IOPS), and most of them really are that bad, even those made by reputable manufacturers.

The dd linear test is probably more meaningful if you intend to use the flash card in a device that only ever performs large, sequential writes (e.g. a digital camera). For everything else, however, the dd figures are meaningless and you should instead pay attention to the iozone results, particularly the random-write (r-w) figure. Good random-write performance also usually indicates a better flash controller, which means better wear leveling and better longevity of the card, so all other things being equal, the card with the faster random-write performance is the one to get.

Due to WordPress being a little too rigid in its templates to allow for wide tables, you can see the SD / CF / USB benchmark data here. The table will be updated frequently, so check back often.

13 thoughts on “Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks”

    • JFFS2 is ancient, the whole block device must be scanned at mount time (slow), and it is intended for use on raw NAND rather than on a normal block device that (in theory) does its own wear leveling.

      LogFS is also designed for raw NAND.

      For raw NAND devices, UBIFS is probably the best choice at the moment, but either way, it is not relevant to normal flash media such as SD, CF or USB media.

      NilFS2 could be used, and it does greatly improve random-write performance by virtue of all of its writes being sequential, but it is not without its problems. It is impossible to tell how much free space you really have on the device, and its garbage collection method can actually cause significantly increased flash wear (unless the underlying flash media does no wear leveling of its own at all, which is unlikely).

  1. I dread to think what that card will cost when (and if) it actually becomes available. Let me know if you find somebody actually selling these.

  2. Some googling reveals excellent results on SanDisk Class 4 cards. People are supposedly getting on the order of ~250 random-write IOPS! Interestingly, “faster” Class 10 models are actually slower at random writes.

      • I think the results there are misleading because all the tests were done with relatively small test sizes. A lot of cards, particularly SanDisk and Pretec, seem to have the ability to cheat the benchmarks (including iozone) with smaller test sizes. What “smaller” means in this case varies, sometimes < 16MB, sometimes as much as 128MB. This is one of the reasons why I am running my own tests with 512MB test sizes, and if I find reason to suspect that a card is still managing to cheat its way to inflated figures I will increase the test size.

        I suspect that some cards might also be specifically optimized for the Crystal Disk Mark test pattern since that is what most people seem to use.

        Unfortunately, it also shows the SanDisk Class 10 Extreme SD card achieving about 20 IOPS, which undermines the credibility of the results; I have one of those and its performance is more like 3 IOPS. It is possible that they didn't overwrite the cards a couple of times with random data first, to ensure that the figures reflect the cards' performance after longer-term use.

        Bottom line – if the data I felt I could trust already existed elsewhere I wouldn't be bothering with this bit of research of my own.

  3. The test appears to be only 50MB in size, which allows the card to cheat. I’d be interested to see what his results are with a 512MB test size. I suspect they’ll be in the low single figures.

    And regarding the cost – yes, it is getting silly. Proper SATA SSDs are actually cheaper per GB nowadays than the (supposedly) more advanced SD cards.

  4. Can we expect some raid0 microSD tests as well (despite your prior statement about the microSDs random-write IOPS)?

    • Yes, if I can find some uSD cards that achieve a level of performance that wouldn’t be painful to use.

      There are also issues surrounding 2-disk RAID0 optimization WRT making sure that block size, chunk size and block group size all align optimally. You can read through the article on disk and file system optimization here to get the gist of what I’m talking about.

      For example, to ensure that you don’t write more data than you have to on flash, you want chunk size = block size. But block group size is only adjustable in increments of 8 blocks, and since you only have 2 disks, the only way this will align is if chunk size = 8 blocks (see the sketch below). The downside is that a 1-block write will still have to write 8 blocks, so performance and longevity with writes smaller than 8 blocks will suffer; on the plus side, you can make sure that the superblocks are spread across both disks rather than just one. The 8-block chunks can be mitigated to some extent by using 1KB file system blocks instead of 4KB ones, but that means more metadata, which means more bookkeeping overhead.

      There is unfortunately no ideal solution to the problem other than adding a 3rd disk, which in this case isn’t possible.
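
      To illustrate, the alignment described above might translate into something like this (a sketch with example device names, not a tested recipe):

       # 2-device RAID0, chunk = 8 x 4KB blocks = 32KB
       mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=32 /dev/sda /dev/sdb
       # ext4 aligned to the array: stride = 8 blocks per chunk,
       # stripe-width = 2 devices x 8 blocks = 16 blocks
       mkfs.ext4 -b 4096 -E stride=8,stripe-width=16 /dev/md0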

  5. Gordan,

    Here are the results of the SanDisk Extreme Pro (8GiB, rated 95MB/s) on my AC100. I used Ubuntu 12.04 installed on the internal drive to test this. I do not know why the sequential speed is so slow on the AC100, because when using the same test on a different PC it seems limited by the USB card reader (~20MB/s).

    Sequential results (all figures in MB/s):
    Block:   4K    8K    16K   32K   64K   128K  256K  512K  1M
    Write:   1.7   3.3   5.6   7.0   8.1   8.7   8.9   9.2   9.6
    Read:    3.3   4.8   6.0   7.0   7.6   8.0   8.1   8.3   8.5

    Random results, however, are very good (iozone, in operations per second):
    KB       reclen   write   rewrite   random read   random write
    524288   4        162     606       1172          133

    • Thanks, I added the results to the tables. Seems like there is a new winner among the SD cards.

      The read speed does indeed seem slow, especially compared to the writes. No idea why that might be the case. I haven’t observed that artifact before. The only thing that comes to mind that is different is that this is a UHS-I card (all the cards I tested are lower spec).

      Remember to re-format the card with -E discard option before you install the OS onto it. 🙂
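
      Something like this, assuming the same options as used for the benchmarks (partition name is an example):

       # discard (TRIM) the blocks at format time, then create the fs
       mkfs.ext4 -b 4096 -E discard,stride=1,stripe-width=16 -O ^has_journal /dev/mmcblk0p1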

  6. I have been reading your articles; that cooling solution along with the clock speed would probably amaze Toshiba itself 🙂

    I have to ask you, though, about the screen upgrade: does it make watching movies enjoyable? I feel that the regular screen has far too low quality (not just resolution) to do anything on it other than web browsing.

    PS: I wish you were the one making the AC200, if it’s ever made, not “he who should not be named”.

    • Thank you. 🙂
      The cooling solution is nothing special, it’s just a matter of doing what you can given the minimal space that is available.
      I have to say I was quite shocked by the overclockability of the Tegra2 – my experience with other Nvidia chips showed them to come pre-overclocked past stable limits from the factory.
      Screen-wise, I don’t think there is anything wrong with the quality of the standard display panel. I don’t really use my AC100 for watching videos, but what matters to me is screen resolution. I find 1280×720 just about usable; 1024×600 isn’t enough for anything.
