How to set up ZFS zpools using disk partitions and make them visible to FreeNAS

Update 22/01/2015: see also this stackoverflow post for an alternative scheme.

I’d outgrown my off-the-shelf NAS, ending up with files overflowing onto various travel disks and flash drives. My existing NAS wasn’t really something I could upgrade, so I resolved not to buy another proprietary device but rather to roll my own out of ordinary components, which might give me a bit more opportunity for expansion down the line.

Looking around for NAS software I settled on FreeNAS, which apart from seeming to be generally well-regarded, also gave me an excuse to dip my toe into FreeBSD land and to use ZFS, about which I’d heard good things.

Using your old disks

Like most of us whose storage needs have grown over the years I’ve accumulated a small stack of disks of various sizes. The general advice for RAID is to buy a bunch of disks of the same size, but I hate binning hardware if it’s still completely fine.

In total I had the following drives:

  • 2 TB x 2
  • 1 TB x 1

Since 3 and 4 TB disks are available and seem to be roughly the same $/GB as any other disk size, my plan was to buy one of each and arrange them and my old disks in a ZFS RAIDZ array as follows:

<--------------- RAID Z --------------->
+----------+  +----------+  +----------+
|   1 TB   |  |          |  |          |
+----------+  |   2 TB   |  |          |
|          |  |          |  |          |
|          |  +----------+  |   4 TB   |
|   3 TB   |  |          |  |          |
|          |  |   2 TB   |  |          |
|          |  |          |  |          |
+----------+  +----------+  +----------+

Unfortunately, while ZFS does do all manner of clever things, there’s one thing it doesn’t do: disk spanning or striping within a RAID set. In other words, you can’t stack your smaller drives and then treat them as a 4TB device within a RAID set as envisioned above.

ZFS does support striping across multiple disks (of different sizes if you like), but this gives you no redundancy. If you want to mirror or use RAIDZ, the disks effectively need to be the same size: ZFS only uses as much of each member as the smallest one provides.

BTW, another thing that ZFS apparently can’t do is expand an existing RAID array by adding a new disk. (You can replace a disk with a larger one, but not add an additional disk. Don’t ask me why; I’d have thought it’s just a case of adding the disk and then redistributing the data, even if that takes a month of Sundays. But they’re clever people, so it must be harder than it seems.) The upshot is, there’s no point aiming at 4 TB chunks with the idea that later on you can just buy another 4 TB disk and slot it into the array.

While you can’t mix disk sizes very easily, ZFS can work with partitions as well as whole disks. So I resolved to add 2 x 3TB drives and partition them to achieve two zpools in the following arrangement:

<---------------------- RAID Z --------------------->
+----------+  +----------+  +----------+ +----------+
|          |  |          |  |          | |          |
|   2 TB   |  |   2 TB   |  |   3 TB   | |   3 TB   |
|          |  |          |  |          | |          |
+----------+  +----------+  |..........| |..........| +----------+
                            |          | |          | |   1 TB   |
                            +----------+ +----------+ +----------+
                            <-------------- RAID Z -------------->
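To see why the partitioning is worth the trouble, here’s a rough capacity comparison as a shell sketch. It assumes RAIDZ1 usable space is (number of members - 1) x smallest member, ignores ZFS metadata overhead, and treats each partition as exactly the size of the drive it matches. raidz1_usable is a hypothetical helper, not a ZFS command.

```shell
#!/bin/sh
# Rough RAIDZ1 capacity: (number of members - 1) x size of the smallest member.
# Ignores ZFS metadata and padding overhead, so real usable space is lower.
raidz1_usable() {
    min=$1
    n=0
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then min=$size; fi
        n=$((n + 1))
    done
    echo $(( (n - 1) * min ))
}

TB1=1000204886016   # 1 TB drive (diskinfo mediasize in bytes)
TB2=2000398934016   # 2 TB drive (diskinfo mediasize in bytes)
TB3=3000592982016   # 3 TB drive (standard 5860533168-sector size)

# One RAIDZ across all five whole disks: each member counts as only 1 TB.
echo "single RAIDZ: $(raidz1_usable $TB1 $TB2 $TB2 $TB3 $TB3) bytes"

# Two pools: 4 x 2 TB (two whole disks + two partitions) and 3 x 1 TB.
big=$(raidz1_usable $TB2 $TB2 $TB2 $TB2)
small=$(raidz1_usable $TB1 $TB1 $TB1)
echo "two pools: $((big + small)) bytes"
```

Run with any POSIX sh, it prints 4000819544064 bytes for the single RAIDZ and 8001606574080 bytes for the two pools: roughly 4 TB versus 8 TB of usable space from the same five drives.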

Unfortunately, FreeNAS doesn’t provide any disk partitioning capability in its GUI, and doesn’t see partitions when setting up ZFS volumes (it only lets you select whole disks). So, you’ll need to head to the command line.

Working out how big your disks are

When creating a ZFS RAIDZ virtual device (vdev), you’ll want to make sure all the underlying physical devices (whole disks or partitions) are exactly the same size. In my case, this meant that I needed to know how big my 1 TB and 2 TB drives were so I could partition my 3 TB drives correctly.

Firstly, to identify the BSD device names of your drives use the camcontrol command:

[root@freenas] ~# camcontrol devlist
<OCZ-AGILITY3 2.22>                at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD204UI 1AQ10001>         at scbus2 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD204UI 1AQ10001>         at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus4 target 0 lun 0 (ada4,pass4)
<ST31000520AS CC32>                at scbus5 target 0 lun 0 (ada5,pass5)
<Kingston DataTraveler SE9 PMAP>   at scbus7 target 0 lun 0 (pass6,da0)

The manufacturer names should be enough for you to identify what’s what.

To find out exactly how big a disk is, use the diskinfo command:

[root@freenas] ~# diskinfo -v ada2
    512             # sectorsize
    2000398934016   # mediasize in bytes (1.8T)
    3907029168      # mediasize in sectors
    4096            # stripesize
    0               # stripeoffset
    3876021         # Cylinders according to firmware.
    16              # Heads according to firmware.
    63              # Sectors according to firmware.
    S2H7J1CZB02790  # Disk ident.

The crucial number is the media size in bytes (2000398934016 for this drive). This tells you exactly how big the drive really is.
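If you’d rather not copy these sizes by hand, a small filter can pull the byte count out of diskinfo -v output. This is a sketch: mediasize_bytes is a hypothetical helper, and it relies on the “# mediasize in bytes” comment appearing as in the output above.

```shell
#!/bin/sh
# Print the "mediasize in bytes" figure from `diskinfo -v` output on stdin.
mediasize_bytes() {
    awk '/# mediasize in bytes/ { print $1 }'
}

# Demo on an excerpt of the diskinfo -v ada2 output above.
printf '%s\n' \
  '    512             # sectorsize' \
  '    2000398934016   # mediasize in bytes (1.8T)' \
  '    3907029168      # mediasize in sectors' |
  mediasize_bytes

# On FreeNAS itself, list every data disk (names from camcontrol devlist):
# for d in ada1 ada2 ada3 ada4 ada5; do
#     echo "$d $(diskinfo -v "$d" | mediasize_bytes)"
# done
```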

Partitioning your drives

The gpart command is what you need to partition your disks. Although what I’m describing here is partitioning an empty drive, you can use it to re-partition non-destructively — but be very careful or make sure your data is backed up.

First, you need to set up the partition table on your drive (ada1 in this case):

[root@freenas] ~# gpart create -s gpt ada1

Then, you want to create a partition of a specific size (in this case, the size of my 2 TB drives):

[root@freenas] ~# gpart add -t freebsd-zfs -s 2000398934016b ada1

(Note the ‘b’ after the number, to indicate the unit is bytes.)

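You can add further partitions by repeating the command (in this case, the size of my 1 TB drive):

[root@freenas] ~# gpart add -t freebsd-zfs -s 1000204886016b ada1

Be aware that a full-size 2 TB partition plus a full-size 1 TB partition won’t quite fit on a 3 TB drive: together they need 3000603820032 bytes, but the usable space on ada1 is 5860533101 sectors × 512 bytes = 3000592947712 bytes (see the first line of the gpart show output below), about 10.9 MB less. So gpart can’t create the second partition at exactly this size; instead, omit -s so that it uses all the remaining free space. The partition will be slightly smaller than the 1 TB drive, which is fine, since ZFS sizes every member of a RAIDZ vdev to match the smallest.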
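Since gpart works on the real disk, it can be worth checking the arithmetic first. The sketch below (plan_partitions is a hypothetical helper) only prints the gpart add commands, and stops if the running total of partition sizes exceeds the disk’s usable space, taken here from the first line of the gpart show output (5860533101 sectors of 512 bytes). It ignores alignment padding, so a pass is necessary rather than sufficient.

```shell
#!/bin/sh
# Dry run: print the gpart commands for a list of partition sizes in bytes,
# stopping with an error if the running total exceeds the disk's usable space.
# Alignment padding (e.g. from gpart add -a 4k) is not accounted for.
plan_partitions() {
    disk=$1
    usable=$2
    shift 2
    used=0
    for size in "$@"; do
        used=$((used + size))
        if [ "$used" -gt "$usable" ]; then
            echo "error: ${size}b does not fit on $disk (over by $((used - usable)) bytes)" >&2
            return 1
        fi
        echo "gpart add -t freebsd-zfs -s ${size}b $disk"
    done
}

# Usable space on ada1 per "gpart show ada1": 5860533101 sectors of 512 bytes.
usable=$((5860533101 * 512))
plan_partitions ada1 "$usable" 2000398934016 1000204886016 ||
    echo "sizes need adjusting"
```

With the two sizes used in this post, it prints the first gpart add command and then reports that the 1 TB partition is over by 10872320 bytes.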

That’s it, your partitions are created. To inspect them, use gpart show:

[root@freenas] ~# gpart show ada1
=>        34  5860533101  ada1  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40  1950351360     1  freebsd-zfs  (930G)
  1950351400  3907029088     2  freebsd-zfs  (1.8T)
  5857380488     3152647        - free -  (1.5G)

If you make a mistake and need to start again, you can remove the partitions and the partition table:

[root@freenas] ~# gpart delete -i 1 ada1
[root@freenas] ~# gpart delete -i 2 ada1
[root@freenas] ~# gpart destroy ada1

Creating a FreeNAS ZFS volume

Once you’ve got your partitions set up, you can create a ZFS pool (volume). There’s one fly in the ointment: by default, FreeBSD mounts ZFS pools at the root of the filesystem, but FreeNAS mounts them under /mnt. So, you need to tell ZFS where the mount point is using the -m option to zpool create:

[root@freenas] ~# zpool create -m /mnt/data data raidz ada2 ada3 ada1p1 ada4p1

This is actually doing two things in one command: creating a RAIDZ virtual device (made up of ada2, ada3, ada1p1 and ada4p1), and then creating a zpool containing just that vdev. If you know a bit about ZFS you’ll know that a zpool can actually contain multiple vdevs, so you might be tempted to create several vdevs using partitions and then put them all in a single pool. But be aware that ZFS stripes data across vdevs, and expects them to be independently resilient; if you put two separate vdevs using different partitions of the same disk into a zpool and the disk fails, you’ll lose all your data.

FreeNAS won’t be able to see this new pool by default; it maintains its own state information, rather than probing the OS. To make the new pool visible in FreeNAS, you first need to “export” it (exporting/importing is how you move ZFS volumes between systems):

[root@freenas] ~# zpool export data
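The same create-and-export sequence applies to the second pool in the diagram, built from the 1 TB partitions and the 1 TB drive. This is a hedged sketch: the pool name data2 and mount point /mnt/data2 are hypothetical, and it assumes the 1 TB partitions are ada1p2 and ada4p2 (created second, as in the gpart add commands earlier) and that ada5 is the 1 TB drive from the camcontrol listing. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: build the second pool from the 1 TB members and export it for FreeNAS.
# Hypothetical: pool "data2", mount point /mnt/data2, and 1 TB partitions
# ada1p2 / ada4p2. Prints the commands only; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

run zpool create -m /mnt/data2 data2 raidz ada1p2 ada4p2 ada5 || exit 1
run zpool export data2   # exported so FreeNAS can auto-import it
```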

Finally, use the GUI to auto-import the volume:

  • Go to Storage > Volumes > View Volumes
  • Click the Auto Import Volume button
  • Your volume’s not encrypted, so select No when asked
  • Your volume should appear in the drop-down; click OK

And that’s it – your volume appears in the list of volumes, and you’re set to go.

Checking for AMD Turbo Core support on CentOS 6.4

Recently, we were looking for a server for a CPU-intensive, single-threaded website. We needed lots of cores, for simultaneous load (traffic spikes), but also fast CPUs if possible.

We had to choose between a faster machine with 16 cores, or a slower one with 32 cores. The latter machine was marginally slower per core than our existing server; however, we learnt that the Opteron CPUs it used had AMD “Turbo Core” support.

Turbo Core is a technology whereby the CPU, if it detects high load on some cores, will shut down unused cores and boost the ones in use, provided the temperature of the CPU remains within tolerance. This requires OS support, and the Linux kernel has had support since early 2010. Phoronix has some Linux benchmarks if you’re interested.

That tipped it for us, because with Turbo Core enabled, the 32-way machine would be faster under light load than our existing server, while also providing breadth for spikes.

However, once we’d commissioned the machine, our tests didn’t show Turbo Core being used. (We tested by monitoring CPU status files in /proc/ while simulating high web load using wget.) After some head scratching, poking around in dmesg, checking the BIOS settings, etc., we discovered that CentOS 6.4, released in early 2013, uses a kernel from late 2009 (2.6.32), which suggested it wouldn’t support Turbo Core (added in 2.6.35).

However, that didn’t necessarily mean Turbo Core was unsupported: Red Hat backports features into its 2.6.32 kernel, which has been through many revisions (358 in CentOS 6.4, according to Wikipedia).

The way to check definitively is firstly to look in /var/log/dmesg for this boot message:

powernow-k8: Core Performance Boosting: on
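The dmesg check is easy to script. A minimal sketch, assuming the message appears exactly as above (grep ignores any timestamp prefix on the line); turbo_core_on is a hypothetical helper name, and it defaults to /var/log/dmesg as on CentOS 6.

```shell
#!/bin/sh
# Succeed if the kernel log reports AMD Core Performance Boost (Turbo Core) on.
turbo_core_on() {
    grep -q 'powernow-k8: Core Performance Boosting: on' "${1:-/var/log/dmesg}"
}

# Demo against a throwaway file containing the boot message above.
log=$(mktemp)
echo 'powernow-k8: Core Performance Boosting: on' > "$log"
if turbo_core_on "$log"; then echo "Turbo Core: on"; else echo "Turbo Core: not reported"; fi
rm -f "$log"
```

On the server itself, call turbo_core_on with no argument to check /var/log/dmesg.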

Secondly, install the cpufrequtils package and run cpufreq-aperf, which measures each core’s actual average frequency (from the APERF/MPERF counters) rather than the nominal one. When you then put the system under load, you’ll see whether CPU frequency goes above the rated speed of the CPU, i.e. into turbo mode. In our case, with CPUs rated at 2.1 GHz, we saw speeds in the region of 2.8 GHz:

CPU   Average freq(KHz)   Time in C0      Time in Cx              C0 percentage 

021   2835000             01 sec 000 ms   8784163844 sec 623 ms   100
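To spot boosted cores without reading every row, the output can be filtered against the rated speed. A sketch, assuming cpufreq-aperf rows are laid out as above (CPU id first, average frequency in kHz second); boosted_cores is a hypothetical helper, and 2100000 kHz is the 2.1 GHz rating of our CPUs.

```shell
#!/bin/sh
# Print cores whose average frequency (kHz, as reported by cpufreq-aperf)
# exceeds the CPU's rated speed, i.e. cores running in Turbo Core.
boosted_cores() {
    rated_khz=$1
    # Skip the header: data rows start with a numeric CPU id.
    awk -v rated="$rated_khz" '$1 ~ /^[0-9]+$/ && $2 + 0 > rated + 0 { print $1, $2 }'
}

# Demo on the header and sample row shown above.
printf '%s\n' \
  'CPU   Average freq(KHz)   Time in C0      Time in Cx              C0 percentage' \
  '021   2835000             01 sec 000 ms   8784163844 sec 623 ms   100' |
  boosted_cores 2100000
```

On the sample row it prints 021 2835000; a core at or below 2100000 kHz produces no output.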