FreeNAS: creating ZFS zpools using partitions on mixed-sized disks

How to set up ZFS zpools using disk partitions and make them visible to FreeNAS

Update 22/01/2015: see also this stackoverflow post for an alternative scheme.

I’d outgrown my off-the-shelf NAS, ending up with files overflowing onto various travel disks and flash drives. My existing NAS wasn’t really something I could upgrade, so I resolved not to buy another proprietary device but rather to roll my own out of ordinary components, which might give me a bit more opportunity for expansion down the line.

Looking around for NAS software I settled on FreeNAS, which apart from seeming to be generally well-regarded, also gave me an excuse to dip my toe into FreeBSD land and to use ZFS, about which I’d heard good things.

Using your old disks

Like most of us whose storage needs have grown over the years I’ve accumulated a small stack of disks of various sizes. The general advice for RAID is to buy a bunch of disks of the same size, but I hate binning hardware if it’s still completely fine.

In total I had the following drives:

  • 2 TB x 2
  • 1 TB x 1

Since 3 and 4 TB disks are available and seem to be roughly the same $/GB as any other disk size, my plan was to buy one of each and arrange them and my old disks in a ZFS RAIDZ array as follows:

<--------------- RAID Z --------------->
+----------+  +----------+  +----------+
|   1 TB   |  |          |  |          |
+----------+  |   2 TB   |  |          |
|          |  |          |  |          |
|          |  +----------+  |   4 TB   |
|   3 TB   |  |          |  |          |
|          |  |   2 TB   |  |          |
|          |  |          |  |          |
+----------+  +----------+  +----------+

Unfortunately, while ZFS does do all manner of clever things, there’s one thing it doesn’t do: disk spanning or striping within a RAID set. In other words, you can’t stack your smaller drives and then treat them as a 4TB device within a RAID set as envisioned above.

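ZFS does support striping across multiple disks (of different sizes if you like), but this gives you no redundancy. If you want to mirror or use RAIDZ, the disks effectively need to be the same size: ZFS treats every member as if it were only as large as the smallest one, so any extra space on the bigger disks is wasted.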

BTW, another thing that ZFS apparently can’t do is expand an existing RAID array by adding a new disk. (You can replace a disk with a larger one, but not add an additional disk. Don’t ask me why; I’d have thought it’s just a case of adding the disk and then redistributing the data, even if that takes a month of Sundays. But they’re clever people, so it must be harder than it seems.) The upshot is, there’s no point aiming at 4 TB chunks with the idea that later on you can just buy another 4 TB disk and slot it into the array.

While you can’t mix disk sizes very easily, ZFS can work with partitions as well as whole disks. So I resolved to add 2 x 3TB drives and partition them to achieve two zpools in the following arrangement:

<---------------------- RAID Z --------------------->
+----------+  +----------+  +----------+ +----------+
|          |  |          |  |          | |          |
|   2 TB   |  |   2 TB   |  |   3 TB   | |   3 TB   |
|          |  |          |  |          | |          |
+----------+  +----------+  |..........| |..........| +----------+
                            |          | |          | |   1 TB   |
                            +----------+ +----------+ +----------+
                            <-------------- RAID Z -------------->
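As a rough check on what this layout buys you: a single-parity RAIDZ vdev gives you roughly the capacity of all members but one, each counted at the size of the smallest member. The sketch below is plain sh arithmetic on nominal TB figures; raidz1_usable is a name made up for illustration, and it ignores ZFS metadata and padding overhead.

```shell
#!/bin/sh
# Approximate usable capacity of a single-parity RAIDZ vdev:
# (number of members - 1) x size of the smallest member.
# Hypothetical helper; ignores metadata and padding overhead.
raidz1_usable() {
    members=$1
    smallest=$2
    echo $(( (members - 1) * smallest ))
}

# Upper array: 2 TB + 2 TB + two 2 TB partitions on the 3 TB disks.
upper=$(raidz1_usable 4 2)
# Lower array: two 1 TB partitions on the 3 TB disks + the 1 TB disk.
lower=$(raidz1_usable 3 1)

echo "upper pool: ${upper} TB, lower pool: ${lower} TB"
```

That works out to about 6 TB and 2 TB in two separate pools, before overhead.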

Unfortunately, FreeNAS doesn’t provide any disk partitioning capability in its GUI, and doesn’t see partitions when setting up ZFS volumes (it only lets you select whole disks). So, you’ll need to head to the command line.

Working out how big your disks are

When creating a ZFS RAIDZ virtual device (vdev), you’ll want to make sure all the underlying physical devices (whole disks or partitions) are exactly the same size. In my case, this meant that I needed to know how big my 1 TB and 2 TB drives were so I could partition my 3 TB drives correctly.

Firstly, to identify the BSD device names of your drives use the camcontrol command:

[root@freenas] ~# camcontrol devlist
<OCZ-AGILITY3 2.22>                at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD204UI 1AQ10001>         at scbus2 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD204UI 1AQ10001>         at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus4 target 0 lun 0 (ada4,pass4)
<ST31000520AS CC32>                at scbus5 target 0 lun 0 (ada5,pass5)
<Kingston DataTraveler SE9 PMAP>   at scbus7 target 0 lun 0 (pass6,da0)

The manufacturer names should be enough for you to identify what’s what.
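If you have a lot of drives, it can help to boil that listing down to a simple device-to-description table. This is a minimal sketch in sh and awk, assuming the output format shown above; list_disks and sample_devlist are names invented for the example, and the sample text is simply the listing from this post. On a real system you would pipe camcontrol devlist into list_disks instead. If your login shell is csh, save it to a file and run it with sh.

```shell
#!/bin/sh
# Print "device<TAB>description" for each line of `camcontrol devlist`
# output read from stdin. Hypothetical helper for illustration.
list_disks() {
    awk '
    /^</ {
        desc = $0
        sub(/^</, "", desc)        # drop the leading "<"
        sub(/>.*$/, "", desc)      # drop everything from ">" onwards
        names = $0
        sub(/^.*\(/, "", names)    # keep only what is inside the final (...)
        sub(/\).*$/, "", names)
        n = split(names, parts, ",")
        for (i = 1; i <= n; i++)
            if (parts[i] !~ /^pass/)    # skip the passthrough device name
                printf "%s\t%s\n", parts[i], desc
    }'
}

# Sample input copied from the listing above.
sample_devlist() {
    cat <<'EOF'
<OCZ-AGILITY3 2.22>                at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD204UI 1AQ10001>         at scbus2 target 0 lun 0 (ada2,pass2)
<Kingston DataTraveler SE9 PMAP>   at scbus7 target 0 lun 0 (pass6,da0)
EOF
}

sample_devlist | list_disks
```

Note that the device name can appear either before or after the passN name inside the brackets, which is why the sketch filters on the name rather than on position.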

To find out exactly how big a disk is, use the diskinfo command:

[root@freenas] ~# diskinfo -v ada2
    512             # sectorsize
    2000398934016   # mediasize in bytes (1.8T)
    3907029168      # mediasize in sectors
    4096            # stripesize
    0               # stripeoffset
    3876021         # Cylinders according to firmware.
    16              # Heads according to firmware.
    63              # Sectors according to firmware.
    S2H7J1CZB02790  # Disk ident.

The crucial number is the media size in bytes (2000398934016 in the output above). This tells you how big the drive really is.
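If you want to pull that number out in a script rather than by eye, something like the following should do. It is a sketch that assumes the diskinfo -v layout shown above (value first, then a "# label" comment); diskinfo_field and sample_diskinfo are hypothetical names, and the sample text is copied from this post. On a real system you would run diskinfo -v ada2 | diskinfo_field "mediasize in bytes".

```shell
#!/bin/sh
# Print the value on the `diskinfo -v` line whose comment matches the label.
# Reads diskinfo output from stdin. Hypothetical helper for illustration.
diskinfo_field() {
    awk -v label="$1" 'index($0, "# " label) { print $1; exit }'
}

# Sample input copied from the output above.
sample_diskinfo() {
    cat <<'EOF'
    512             # sectorsize
    2000398934016   # mediasize in bytes (1.8T)
    3907029168      # mediasize in sectors
    4096            # stripesize
EOF
}

bytes=$(sample_diskinfo | diskinfo_field "mediasize in bytes")
sector=$(sample_diskinfo | diskinfo_field "sectorsize")

# Bytes divided by sector size should agree with "mediasize in sectors".
echo "$bytes bytes = $(( bytes / sector )) sectors"
```

Dividing the byte count by the sector size gives the sector count, which is handy if your version of gpart prefers sizes in sectors.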

Partitioning your drives

The gpart command is what you need to partition your disks. Although what I’m describing here is partitioning an empty drive, you can use it to re-partition non-destructively — but be very careful or make sure your data is backed up.

First, you need to set up the partition table on your drive (ada1 in this case):

[root@freenas] ~# gpart create -s gpt ada1

Then, you want to create a partition of a specific size (in this case, the size of my 2 TB drives):

[root@freenas] ~# gpart add -t freebsd-zfs -s 2000398934016b ada1

(Note the ‘b’ after the number, to indicate the unit is bytes.)

You can add further partitions by repeating the command (in this case, the size of my 1 TB drive):

[root@freenas] ~# gpart add -t freebsd-zfs -s 1000204886016b ada1

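That’s it, your partitions are created. To inspect them, use gpart show. (The sample output below lists the 1 TB-sized partition first, and its sector counts are slightly smaller than the byte sizes used above, so don’t expect it to match those commands exactly.)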

[root@freenas] ~# gpart show ada1
=>        34  5860533101  ada1  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40  1950351360     1  freebsd-zfs  (930G)
  1950351400  3907029088     2  freebsd-zfs  (1.8T)
  5857380488     3152647        - free -  (1.5G)
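Before running gpart add, it is worth checking that the partitions you are asking for actually fit on the disk. The sketch below converts the two byte sizes used above into 512-byte sectors and compares their sum with the usable sector count from the first line of the gpart show output (5860533101). The function names are made up for illustration, and alignment padding is ignored.

```shell
#!/bin/sh
# Convert a size in bytes to a whole number of sectors.
bytes_to_sectors() {
    echo $(( $1 / $2 ))
}

# Print how many sectors the requested partitions exceed the available
# space by, or 0 if they fit. Usage: sectors_short AVAILABLE SIZE...
# Hypothetical helper; ignores alignment padding between partitions.
sectors_short() {
    avail=$1
    shift
    total=0
    for size in "$@"; do
        total=$(( total + size ))
    done
    if [ "$total" -gt "$avail" ]; then
        echo $(( total - avail ))
    else
        echo 0
    fi
}

two_tb=$(bytes_to_sectors 2000398934016 512)   # size of the 2 TB drives
one_tb=$(bytes_to_sectors 1000204886016 512)   # size of the 1 TB drive
usable=5860533101                              # from `gpart show ada1`

short=$(sectors_short "$usable" "$two_tb" "$one_tb")
echo "requested: $(( two_tb + one_tb )) sectors, usable: $usable, short by: $short"
```

With these particular numbers the request comes out 21235 sectors too large, so full-size copies of both drives cannot fit on one 3 TB disk. That may be why the partitions in the sample output are slightly smaller, though that is an inference rather than something the output states. Since RAIDZ works to the size of its smallest member, a slightly undersized partition just trims the array a little.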

If you make a mistake and need to start again, you can remove the partitions and the partition table:

[root@freenas] ~# gpart delete -i 1 ada1
[root@freenas] ~# gpart delete -i 2 ada1
[root@freenas] ~# gpart destroy ada1

Creating a FreeNAS ZFS volume

Once you’ve got your partitions set up, you can create a ZFS pool (volume). There’s one fly in the ointment: by default, FreeBSD mounts ZFS pools at the root of the filesystem, but FreeNAS mounts them under /mnt. So, you need to tell ZFS where the mount point is using the -m flag to zpool:

[root@freenas] ~# zpool create -m /mnt/data data raidz ada2 ada3 ada1p1 ada4p1

This is actually doing two things in one command: creating a RAIDZ virtual device (made up of ada2, ada3, ada1p1 and ada4p1), and then creating a zpool containing just that vdev. If you know a bit about ZFS you’ll know that a zpool can actually contain multiple vdevs, so you might be tempted to create several vdevs using partitions and then put them all in a single pool. But be aware that ZFS stripes data across vdevs, and expects them to be independently resilient: losing any one vdev loses the whole pool. If two vdevs in a pool use different partitions of the same disk and that disk fails, both vdevs are degraded at once, so a single further failure in either of them will cost you all your data.
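To make that risk concrete, here is a small sketch that reports which physical disks two vdev member lists have in common. It assumes GPT partition names of the form diskpN (ada1p2 lives on ada1); parent_disk and shared_disks are invented names, and the second member list is my assumption about how the lower array in the diagram would be built, not a command taken from above.

```shell
#!/bin/sh
# Strip a trailing GPT partition suffix: ada1p2 -> ada1, ada2 -> ada2.
# Assumes partition names of the form <disk>p<number>.
parent_disk() {
    echo "$1" | sed 's/p[0-9][0-9]*$//'
}

# Print each physical disk that appears in both member lists.
# Each argument is a space-separated list of vdev members.
shared_disks() {
    for a in $1; do
        for b in $2; do
            if [ "$(parent_disk "$a")" = "$(parent_disk "$b")" ]; then
                parent_disk "$a"
            fi
        done
    done | sort -u
}

vdev1="ada2 ada3 ada1p1 ada4p1"   # the RAIDZ vdev created above
vdev2="ada1p2 ada4p2 ada5"        # assumed layout of the second array

shared_disks "$vdev1" "$vdev2"
```

Any disk this prints is a single point of failure for both vdevs, which is the reason for keeping the two arrays in separate pools.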

FreeNAS won’t be able to see this new pool by default; it maintains its own state information, rather than probing the OS. To make the new pool visible in FreeNAS, you first need to “export” it (exporting/importing is how you move ZFS volumes between systems):

[root@freenas] ~# zpool export data

Finally, use the GUI to auto-import the volume:

  • Go to Storage > Volumes > View Volumes
  • Click the Auto Import Volume button
  • Your volume’s not encrypted, so select No when asked
  • Your volume should appear in the drop-down; click OK

And that’s it – your volume appears in the list of volumes, and you’re set to go.

  1. Philip Robar said:

    Thanks for writing this. It is exactly what I needed to kludge together a temporary setup until I can buy some new drives.

  2. Thanks for this post! In my case I had to specify the partition size in sectors and not in bytes.

  3. The -s size should have the suffix b (bytes), so the command should be
    gpart add -t freebsd-zfs -s 2000398934016b ada1

  4. Right you are, thanks. I’ve updated the commands in the post.

    As it happens I reconfigured my setup this weekend (replacing the 1TB disk with a 5TB one) and had to do some partitioning so I was able to confirm.

  5. Marco said:

    Hi, I like your thoughts to recombine an array with “old” disks. Over the years I replaced mine one by one, usually with a much higher capacity than the existing ones (5X500G, +2x 1T, +3x 1.5TB, +2TB, +3TB, +4TB, +6TB). So my available capacity grows with my need. The oldest ones got replaced in each step forward. Working still on the same pool as it grew.

    I like your “first” idea, but it is a no-go; many users and experts would advise against it. Then I came up with exactly your idea, but I fear that the 3TB disks in your setup (I would have a similar setup) would need to head-seek a lot (high latency) once the striping of RAID-Z kicks in. The 1TB and the 2x2TB disks are OK, but the two 3TB disks are in both RAID-Z domains. As ZFS will distribute all data within the single pool, those two disks especially are under heavy stress (heads jumping back and forth, >1/2 of full stroke).
    Can you confirm that behaviour? I was about to set up a linear array as you intended with your first thought (I already did this in the past 😦 ). It was a bit more complicated to maintain, but it worked. I never had a good feeling about it, but in theory there is nothing wrong with it. Why can’t ZFS just build a RAID-Z on top of a “striped / enlarged” sub-array on its own? It’s a pity for home users on a budget, I guess. I have distributed backups but won’t rely on those. You don’t want to know… (cloud, obfuscating, encryption, fast incremental re-syncing, …)

    Can you please share your findings of the real/instantiated setup of yours?

    • To be honest, I use my NAS for storing large, infrequently accessed data (movies and music, and backups) so performance has not been something I’ve looked into.

      The only time two arrays are really in use (and therefore the heads of the disks whose partitions are in both arrays would be jumping) is if I’m copying from one array to the other, or if a backup to one array is in progress while I’m watching a movie from the other array.

      But any disk hosting a filesystem whose files are distributed across the disk would presumably head seek a lot on random access, so I don’t think this setup is particularly anomalous in terms of head seek.

      In any event I haven’t noticed any performance issues.

  6. Marco said:

    Thanks for your reply. I think that helps me a lot. I will try your setup. The only thing I’m not yet confident about is what the striping will do where the two RAID-Z partitions cross over, as ZFS might try to distribute all the data I later put into that pool (evenly? just filling up partition 1 and then partition 2 would be awesome for my backup solution).

    * 1st setup of yours: you would have 8TB of storage space, that provides a zpool in a whole to all zfs filesystems on top of it
    * 2nd setup: you have eff. 6+2TB, but if I understand it correctly in two separate zpools now. Managing all available space on that raid setup falls partly in your hands.

    * my setup: what I would like (would work, but might put heavy stress on the disks / that head-seeking stuff), as it is nested on the zpool layer, what is super for the block mapping, cause the one and only zpool (again 8TB) will provide for each zfs on top

    disk  a-2.5  b-2.5  c-2.5  d-1.5  e-1.5
    P1    1.5    1.5    1.5    1.5    1.5    -> raidz 6TB
    P2    1      1      1                    -> raidz 2TB -> both striped -> 8TB in one pool

    tank –> 8TB
    raidz1 – 6TB
    raidz1 – 2TB

    I think ZFS will distribute all data, but P1 and P2 on disks a, b and c are now far apart by the nature of partitioning 😦 (seeking >1/2 stroke). Or am I totally wrong here? I will look into that. If I went with device-mapper underneath (a “linear” setup, not raid0, for disks d+e), the scheme would be more like this (simplified, as I actually have 6+4+3+1.5+1.5):

    disk  a-2.5  b-2.5  c-2.5  d-(1.5+1.5)-e
    P1    2.5    2.5    2.5    2.5            -> raidz 7.5TB

    (de) disk – would only lead to a dedicated access to disk d or e (raid0 could limit the actions on those two drives, thus reduce to 1/2 of head distance to go, but now simultaneous and better bandwidth)
    The rest of the available capacity (again in partitions a,b,c has some more to offer) is a non raid stripe, that goes into a separate zpool (like your setup, then) – but, all data going in there has an external 1:1 copy/backup, that I could easily re-sync; the (tank/raidz) is a remote copy of all important stuff, to make it a spatially backup (fire,theft – you know)

    Thanks again. It’s cool to talk to an expert who knows his stuff. ZFS is cool, and the only FS that has never given me any problems with faulty hardware or corrupt data.
    Any last thoughts on your side? Thumbs up for your working setup 🙂

    • Marco: The type of setup mentioned in this article made sense back in the day when drives were expensive and much smaller. Here are two much simpler solutions: 1) Sell the little drives. Buy either a 4 & 6 TB drive, or mirror the 3 & 4 and buy a 6TB. Mirror them appropriately and you’ll have 9 or 10 TB of storage. 2) Mirror the 6 & 4, 3 & 2, 3 x 1.5, 2 x 1, and 2 (2 x 500) for 9.5 TB. Either way you come out with more storage, better performance and a much simpler expansion path.

      If you go the latter path of using all your existing drives, I’d keep each mirror as a separate pool since pools can’t be shrunk.
