Monday, January 30, 2006
Thursday, January 26, 2006
Tim Foster's Weblog
ZFS, Is that it ?
The title of this post sounds a bit negative at first, but bear with me - I'm just getting going, and all will become clear.
One of the first things I was asked to do, working here in the ZFS test group, was to see if we could do things that would show ZFS playing well with others.
In ZFS, the general administrative model is that you first create one or more storage pools out of whatever storage devices you have available, choosing the level of performance or redundancy you'd like. Once you have a storage pool, you can carve it up into file systems, using quotas or reservations and setting properties as you see fit. If you really want, you can also create Volumes from the storage pool - thus providing traditional UNIX block or character devices from the total amount of space available in that pool. (And of course, if you want to add more storage to that pool at any time, you can - everything created from that pool will then appear larger: another of ZFS's Party Tricks, imho.)
Talking about storage pools though, who's to say I have to create a pool using just a bunch of disks - what would happen if I tried to create a storage pool out of a Veritas Volume, for example ?
I'd never played with Veritas before - and was a little worried (I'd heard the horror stories).
One of ZFS's reasons for being is to show the world that it doesn't have to be this way - as a two-week newbie to the world of ZFS, I was starting to take its ease of use for granted. This was to turn out to be a bit of a wake-up call.
At this point, Google has probably already told you this, but I thought I'd point out Ben's excellent "Veritas Krash Kourse". Thanks Ben, this saved me a lot of time ! By consulting that article, here's what I had to do to set up a simple volume containing three disks :
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1f,4000/scsi@3/sd@0,0
1. c1t1d0
/pci@1f,4000/scsi@3,1/sd@1,0
2. c1t2d0
/pci@1f,4000/scsi@3,1/sd@2,0
3. c1t3d0
/pci@1f,4000/scsi@3,1/sd@3,0
4. c1t4d0
/pci@1f,4000/scsi@3,1/sd@4,0
5. c1t5d0
/pci@1f,4000/scsi@3,1/sd@5,0
6. c1t6d0
/pci@1f,4000/scsi@3,1/sd@6,0
Specify disk (enter its number): ^C
# vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
22 Change/Display the default disk layouts
23 Mark a disk as allocator-reserved for a disk group
24 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: 1
Add or initialize disks
Menu: VolumeManager/Disk/AddDisks
Use this operation to add one or more disks to a disk group. You can
add the selected disks to an existing disk group or to a new disk group
that will be created as a part of the operation. The selected disks may
also be added to a disk group as spares. Or they may be added as
nohotuses to be excluded from hot-relocation use. The selected
disks may also be initialized without adding them to a disk group
leaving the disks available for use as replacement disks.
More than one disk or pattern may be entered at the prompt. Here are
some disk selection examples:
all: all disks
c3 c4t2: all disks on both controller 3 and controller 4, target 2
c3t4d2: a single disk (in the c#t#d# naming scheme)
xyz_0 : a single disk (in the enclosure based naming scheme)
xyz_ : all disks on the enclosure whose name is xyz
Select disk devices to add: [,all,list,q,?] c1t4d0 c1t5d0 c1t6d0
Here are the disks selected. Output format: [Device_Name]
c1t4d0 c1t5d0 c1t6d0
Continue operation? [y,n,q,?] (default: y)
You can choose to add these disks to an existing disk group, a
new disk group, or you can leave these disks available for use
by future add or replacement operations. To create a new disk
group, select a disk group name that does not yet exist. To
leave the disks available for future use, specify a disk group
name of "none".
Which disk group [,none,list,q,?] (default: default)
There is no active disk group named default.
Create a new group named default? [y,n,q,?] (default: y)
Create the disk group as a CDS disk group? [y,n,q,?] (default: y) n
Use default disk names for these disks? [y,n,q,?] (default: y)
Add disks as spare disks for default? [y,n,q,?] (default: n)
Exclude disks from hot-relocation use? [y,n,q,?] (default: n)
A new disk group will be created named default and the selected disks
will be added to the disk group with default disk names.
c1t4d0 c1t5d0 c1t6d0
Continue with operation? [y,n,q,?] (default: y)
The following disk devices appear to have been initialized already.
The disks are currently available as replacement disks.
Output format: [Device_Name]
c1t4d0 c1t5d0 c1t6d0
Use these devices? [Y,N,S(elect),q,?] (default: Y)
The following disks you selected for use appear to already have
been initialized for the Volume Manager. If you are certain the
disks already have been initialized for the Volume Manager, then
you do not need to reinitialize these disk devices.
Output format: [Device_Name]
c1t4d0 c1t5d0 c1t6d0
Reinitialize these devices? [Y,N,S(elect),q,?] (default: Y)
Do you want to use the default layout for all disks being initialized?
[y,n,q,?] (default: y)
Initializing device c1t4d0.
Initializing device c1t5d0.
Initializing device c1t6d0.
VxVM NOTICE V-5-2-120
Creating a new disk group named default containing the disk
device c1t4d0 with the name default01.
VxVM NOTICE V-5-2-88
Adding disk device c1t5d0 to disk group default with disk
name default02.
VxVM NOTICE V-5-2-88
Adding disk device c1t6d0 to disk group default with disk
name default03.
Add or initialize other disks? [y,n,q,?] (default: n)
Select an operation to perform: q
# vxdg free
GROUP DISK DEVICE TAG OFFSET LENGTH FLAGS
default default01 c1t4d0 c1t4d0 0 35359984 -
default default02 c1t5d0 c1t5d0 0 35359984 -
default default03 c1t6d0 c1t6d0 0 35360016 -
# vxassist make defaultvol 106079984
Holy Crap! - I'm now old and grey, my fingers are tired and my brain hurts. I'm not even sure whether the Volume I've created allows me to grow, replace or swap disks, whether I can move these disks across architectures or whether I can add disks to this Volume, I don't know what level of redundancy I've got, or what happens if a disk fails. And I haven't even started to worry about filesystems yet - Yikes!
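For what it's worth, the size handed to vxassist above is not a magic number: it matches the sum of the LENGTH column printed by vxdg free. Here is a minimal sketch of that arithmetic, with the transcript output pasted in as data. The conversion to bytes assumes the lengths are 512-byte sectors, which is an assumption on my part rather than something the output states.

```shell
# Output of `vxdg free` from the transcript above, pasted in as plain data.
vxdg_free='GROUP DISK DEVICE TAG OFFSET LENGTH FLAGS
default default01 c1t4d0 c1t4d0 0 35359984 -
default default02 c1t5d0 c1t5d0 0 35359984 -
default default03 c1t6d0 c1t6d0 0 35360016 -'

# Sum the LENGTH column (field 6), skipping the header line.
total_sectors=$(printf '%s\n' "$vxdg_free" |
    awk 'NR > 1 { sum += $6 } END { printf "%d\n", sum }')
echo "total length: $total_sectors"

# ASSUMPTION: lengths are in 512-byte sectors.
total_bytes=$(( total_sectors * 512 ))
echo "in bytes (assuming 512-byte sectors): $total_bytes"
```

The first number printed is the one to pass to vxassist make if you want a volume spanning all the free space in the disk group.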
Here's what I needed to do on ZFS to create a similar thing (a block/character interface consisting of three disks):
# zpool create default c1t4d0 c1t5d0 c1t6d0
# zfs create -V [Size]g default/defaultvol
Two commands. Much better.
After that first zpool command above, I had a directory on my system called /default which had as much space available as the three disks I added together. I could immediately start storing files in that directory - if you don't want to create a file system in that pool, no problem - you don't have to. If you feel like it, you can then use the zfs command to create some file systems and set quotas or reservations on those child file systems. Oh, there's also no need to go messing about in /etc/vfstab or /etc/dfs/dfstab either: ZFS will manage all of the mounting and NFS sharing, and will allow you to set an alternative mountpoint if you like. (If you really feel the need, you can revert to using vfstab - ZFS is flexible.)
The second command, zfs create -V, creates a volume of [Size] GB (growable and shrinkable up to the total amount of space in the pool as often as you like) that appears as /dev/zvol/dsk/default/defaultvol and /dev/zvol/rdsk/default/defaultvol.
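To make the last two paragraphs a bit more concrete, here is a rough sketch of what carving up the pool might look like. The file system name default/home, the sizes and the /export/home mountpoint are all made up for illustration. The script only prints the commands unless you set DRY_RUN=0, since running them for real needs root and an existing pool called default.

```shell
# Dry-run helper: prints each command instead of running it.
# Set DRY_RUN=0 to execute for real (needs root and a pool named "default").
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

carve_up_pool() {
    # A child file system (hypothetical name); ZFS mounts it automatically.
    run zfs create default/home

    # Cap it at 10 GB, but guarantee it at least 2 GB of the pool.
    run zfs set quota=10g default/home
    run zfs set reservation=2g default/home

    # Alternative mountpoint and NFS sharing, without editing vfstab or dfstab.
    run zfs set mountpoint=/export/home default/home
    run zfs set sharenfs=on default/home

    # Resize the volume. Careful: shrinking a volume that holds data can
    # destroy that data.
    run zfs set volsize=20g default/defaultvol
}

carve_up_pool
```

Everything here is a property on a dataset, which is the design choice that keeps the command count so low: one zfs set per setting, rather than a separate tool or config file for each.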
But I digress - the thing I was interested in seeing, having (eventually) created that Veritas Volume, was whether I could use it to create a ZFS Storage Pool along with a few other disks. Sure I could:
# zpool create testpool /dev/vx/dsk/default/defaultvol c1t1d0 c1t2d0 c1t3d0
# zfs create testpool/timfs
# cd /testpool/timfs
# df -h .
Filesystem size used avail capacity Mounted on
testpool/timfs 100G 8K 100G 1% /testpool/timfs
# for i in 1 2 3 4
> do
> mkfile 10G testfile-${i}
> echo $i created
> done
1 created
2 created
3 created
4 created
# ls -al
total 83890042
drwxr-xr-x 2 root sys 6 Oct 27 10:25 .
drwx--x--x 3 root root 512 Oct 27 09:14 ..
-rw------T 1 root root 10737418240 Oct 27 09:46 testfile-1
-rw------T 1 root root 10737418240 Oct 27 09:58 testfile-2
-rw------T 1 root root 10737418240 Oct 27 10:10 testfile-3
-rw------T 1 root root 10737418240 Oct 27 10:23 testfile-4
# df -h .
Filesystem size used avail capacity Mounted on
testpool/timfs 100G 40G 60G 41% /testpool/timfs
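As a quick sanity check on the numbers in that transcript, the file sizes reported by ls and the used column from df line up with what mkfile was asked for. A small sketch of the arithmetic, using nothing but shell integer maths:

```shell
# mkfile 10G asks for ten binary gigabytes; work that out in bytes.
file_bytes=$(( 10 * 1024 * 1024 * 1024 ))
echo "one file:   $file_bytes bytes"

# Four such files, expressed again in binary gigabytes.
used_gb=$(( 4 * file_bytes / 1024 / 1024 / 1024 ))
echo "four files: ${used_gb}G"
```

The per-file figure is the size ls shows for each testfile, and the four-file total is the 40G that df reports as used.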
Of course, the question remains, since I've already shown that I don't need Veritas anymore, why would I bother keeping my Veritas license... ?
Now of course, I'm greatly paraphrasing here: people more versed in the ways of Veritas will tell me there's a lot more to their volume manager and file system than that, and that if I really want to, I can condense all of that output above into some shorter, simpler commands - but really, I think this says it all:
# cd /opt/VRTS/bin
# /usr/bin/ls | wc -l
121
# cd /usr/sbin
# /usr/bin/ls zpool zfs | wc -l
2
From a new user's point of view, ZFS really is incredibly simple to use, and from a more advanced user's point of view, it still allows for as much fine-grained administration as you're likely to need.
See? ZFS: that's it.
(2005-11-16 09:13:42.0)
Comments:
Hey Tim,
congrats to you and the whole ZFS-Team.
Keep up the great work.
I was speechless when I read that ZFS made it into Build 27.
What an exciting week.
Wow. ;)
Patrick
Posted by Patrick Bachmann (87.122.243.219) on November 17, 2005 at 12:00 AM GMT #
Thanks Patrick! It certainly has been pretty exciting alright - all this and I'm only a few weeks into the new job (lucky me!). It's really the rest of the guys on the team who deserve the praise though; I was just lucky enough to be able to join in a bit towards the very end. Roll on future enhancements to ZFS :-)
Posted by Tim Foster on November 17, 2005 at 09:32 PM GMT
Website: http://blogs.sun.com/timf #
Wow, VXVM seems to have gotten even more complex. I don't remember having to do that much work to create volumes. I still have to reach for the VXVM manuals when I want to replace a failed disk; more often than not, if the filesystem isn't important, I'll just trash it and build a new volume...
This complexity was one of the main things that put me off VXVM. With SVM (Disksuite in olden times) you at least had only a handful of commands with only a few options. Disksuite also let you manipulate just the bits of the disk you wanted, so for a 73GB disk of which only 10GB was the root filesystem, you could mirror just the 10GB and use the rest of the disk as you liked.
Hopefully I'll get around to looking at ZFS sometime soon.
By the way, I should have some Veritas manuals in the office somewhere if you want to find out what the other 118 commands do :-)
Posted by Albert White on November 18, 2005 at 12:38 AM GMT
Website: http://blogs.sun.com/roller/page/albertw #
Wednesday, January 25, 2006
ZFS and Solaris 10 1/06 - c0t0d0s0.org: "I've got several hits from Google with queries like 'zfs solaris 1/06'. ZFS is not part of Solaris 10 1/06. You have to use Opensolaris for experimenting with ZFS. My personal guess it will be integrated in the next (unprobable) or the following (more probable) upgrade. But it's a filesystem. Before we can release it with the production version it has to be really, really stable."