Stuff Nobody Cares About

Thursday Feb 01, 2007

Solaris MPxIO and the stone-age SAN

Here are some things I've learned from my trials getting MPxIO working with our SAN.  A lot of this stuff wasn't obvious to me, being new to both MPxIO and SANs.

  • In order for Solaris MPxIO to work with "a random SAN", your SAN must be a symmetric
    SAN, often called an "active/active" SAN.
  • If your SAN is asymmetric, or "active/passive", then your SAN must be explicitly supported. There's no guesswork involved.  On the Solaris 10 11/06 release or later, there should be an "mpathadm" command available.  The command:

mpathadm show mpath-support libmpscsi_vhci.so

will show a list of SANs supported.  If your SAN isn't in there, and your SAN vendor isn't quick with a suggestion (that is, if they say something along the lines of "Let us check into it") then it's very likely you're out of luck.

  • A compatible SAN will support the T10 "Target Port Group Support" (TPGS) standard, sometimes called ALUA (Asymmetric Logical Unit Access).

After a ridiculous amount of work, we've determined that the IBM FAStT500 SAN is not compatible with Solaris MPxIO, at least on the T2000 platform.  It's very easy to fool yourself into thinking you're making progress.
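
For what it's worth, here's a rough sketch of how you might script the mpathadm check mentioned above instead of eyeballing the list. The function names are made up for illustration, and the parsing assumes the output has a "Supported Devices:" section containing "Vendor:" and "Product:" lines; check that against what your release actually prints.

```shell
#!/bin/sh
# List "vendor / product" pairs from `mpathadm show mpath-support` output
# read on stdin.
# ASSUMPTION: devices appear after a "Supported Devices:" line, each as a
# "Vendor:" line followed by a "Product:" line. Adjust if yours differs.
list_supported_devices() {
    awk '
        /Supported Devices:/     { in_devices = 1; next }
        in_devices && /Vendor:/  { sub(/^.*Vendor:[ \t]*/, "");  vendor = $0 }
        in_devices && /Product:/ { sub(/^.*Product:[ \t]*/, ""); print vendor " / " $0 }
    '
}

# Succeed if the given vendor or product string (case-insensitive,
# matched literally) shows up in the supported list.
is_supported() {
    list_supported_devices | grep -i -F -- "$1" >/dev/null
}

# Usage on a live system (hypothetical product string):
#   mpathadm show mpath-support libmpscsi_vhci.so | is_supported "T300" \
#       && echo "listed" || echo "not listed"
```

If your array's product ID doesn't turn up, that's the point at which to start pestering the vendor rather than editing config files.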

Many people online will advise you to add an entry to /kernel/drv/scsi_vhci.conf, tricking Solaris into activating MPxIO on your devices.  It will seem to work... Solaris will remove the devices associated with the multiple paths to a LUN, and merge them together into the logical devices you want.  Sometimes, those devices will actually work. But it's a random thing. What isn't obvious to the newcomer is that simply by putting that line in the scsi_vhci.conf, you're declaring that your SAN is symmetric, even if it isn't. (Some people - who aren't me, apparently - will pick up on the symmetric-option bit in the conf file and realize what the file is actually telling MPxIO...)
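
If you've inherited a box and aren't sure whether somebody already pulled this trick, a quick check along these lines might help. It's only a sketch: the function name is invented, the VENDOR and PRODUCT strings in the comment are placeholders rather than real IDs, and the sample entry is the general shape of what gets passed around online, not something copied from a working config.

```shell
#!/bin/sh
# Report whether a scsi_vhci.conf-style file contains an uncommented
# "symmetric-option" entry, i.e. a line declaring the listed array to be
# symmetric (active/active).
#
# The kind of entry people suggest looks roughly like this
# (VENDOR and PRODUCT are placeholders):
#   device-type-scsi-options-list = "VENDOR  PRODUCT", "symmetric-option";
#   symmetric-option = 0x1000000;
has_symmetric_override() {
    conf=${1:-/kernel/drv/scsi_vhci.conf}
    # Drop comment lines (leading '#'), then look for the option name.
    grep -v '^[[:space:]]*#' "$conf" | grep 'symmetric-option' >/dev/null
}

# Usage:
#   has_symmetric_override && echo "MPxIO has been told this array is symmetric"
```

Finding the entry doesn't mean it's wrong; it's only wrong if the array behind it is really active/passive.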

luxadm display <dev> will show the improper thing... namely, that Solaris thinks both paths are PRIMARY and ONLINE. That's not what you want with an asymmetric SAN, obviously. One of the paths should be SECONDARY and STANDBY.
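
Here's a small sketch for spotting that symptom without reading the whole luxadm dump. It assumes each path is reported with a "Class" line followed by a "State" line, which is how it looked for us; the function name is just for illustration, so verify the field names against your own output.

```shell
#!/bin/sh
# Summarize path class/state pairs from `luxadm display <dev>` output on
# stdin, and exit non-zero when there are multiple paths and every one of
# them is PRIMARY and ONLINE.
# ASSUMPTION: each path has a "Class <value>" line followed by a
# "State <value>" line; case of the values is ignored.
check_paths() {
    awk '
        $1 == "Class" { class = toupper($2) }
        $1 == "State" {
            total++
            if (class == "PRIMARY" && toupper($2) == "ONLINE") active++
        }
        END {
            printf "%d path(s), %d PRIMARY/ONLINE\n", total, active
            # On an active/passive array, all paths being PRIMARY/ONLINE
            # suggests MPxIO believes the array is symmetric.
            if (total > 1 && active == total) exit 1
        }
    '
}

# Usage on a live system (device path is up to you):
#   luxadm display <dev> | check_paths || echo "all paths look active; suspicious"
```

On a genuinely symmetric array the all-active result is exactly what you want, so treat the non-zero exit as a prompt to think, not as an error.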

If you try to use the logical device and it works, chances are you're just getting lucky: you've got a single controller, or your fabric is running through a single switch. Diversify your fabric across switches and across controllers, and check your preferred paths - which is presumably what you'd want to do for a high-availability setup - and you might start finding that your LUNs respond to inquiry commands, but any attempt to label or otherwise talk to the disks starts generating SCSI errors.  (On an IBM SAN, you'll get 0x94 SCSI error codes... which stands for "Not preferred path"...)

Anyway, that was our experience.  Hopefully this will save someone some pain in the future.  As always, if in doubt, harass the SAN vendor and Sun.  And, as always, "We're not sure" is probably just another way of saying, "Hah!  Fat chance!"

Saturday Jan 27, 2007

Auto Assault Ho

About to install Auto Assault.  This should be interesting, in the same way as an auto accident is interesting... which, if you think about it, is an entirely appropriate simile.