In my last post “IBM DS3400 Redundant Controllers and Bad Batteries, eNet Cable Fail” I realized that I had badly configured my SAN from the start. So I bit the bullet and started the process of changing the LUN layout to use 11 of the 12 disks, with the 12th disk as a hot spare. Performance on a SAN LUN is directly proportional to the number of spindles in the RAID set, and my old setup used 3-disk LUNs instead of virtual LUNs on top of one larger physical LUN.
Now that I know how to configure this, I wanted to make use of the higher performance. To do this, I had to
- first offload all VMs from the SAN to some other storage by way of SVMotion
- backup any RDMs I have
- update the LUN layout of the IBM DS3400 to an 11-disk RAID-5 with virtual LUNs presented to ESX
- SVMotion the VMs to the new LUNs
The tasks seem simple enough, but first I had to find some available storage. Given that I just got this lovely brand new Iomega IX2 with a TB of available space, I decided to give it a shot. It supports CIFS, NFS, and iSCSI.
First I tried NFS. That did not work. So then I tried iSCSI. After it took over an hour to build a 500GB LUN, I tried to present it to ESX. ESX did not recognize the device. Thinking it might be the iSCSI implementation, I tried to see the device from Linux using the following command:
iscsiadm -m discovery -t sendtargets -p ipAddress
That produced no results. So I rebooted the device. This allowed the above command to work but ESX would still not recognize the device.
UPDATE: The problem was NOT the iSCSI implementation but the fact that the IX2 was not actually on the iSCSI network; once this was found, all worked as expected. Unfortunately, it was too late to use the IX2 as a temporary LUN for rearranging my SAN. NFS also works once the IP address on the IX2 was corrected; it definitely speaks NFSv3 over TCP.
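For reference, the full open-iscsi sequence on a Linux host looks roughly like this; the IP address and target IQN below are placeholders for your own environment:

```shell
# Discover the targets the array advertises (IP is a placeholder)
iscsiadm -m discovery -t sendtargets -p 192.168.1.50

# Log in to one of the discovered targets (IQN is a placeholder)
iscsiadm -m node -T iqn.2004-05.storage.example:ix2-lun0 -p 192.168.1.50 --login

# Confirm the session; the new block device then shows up in dmesg
iscsiadm -m session
```

Once the IX2 was on the correct network, the discovery step returned the target immediately.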
So on to plan B, which was my recently increased local disk space, originally added to hold all my VMs just in case my SAN had issues. Storage found; on to step 2.
Backing up the only RDM I have was done per normal means; I generally back this up to another server that currently has 6TB of disk within it: 2TB planned as a mirrored OS volume, 2TB for iSCSI, and 2TB for general storage. This server speaks CIFS, NFS, and iSCSI as well, but iSCSI and NFS were not quite working at the time I chose to proceed. Even so, I had to carve up the space per my design a bit early, but that was trivial using the Linux tool system-config-lvm.
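system-config-lvm is just a GUI over the standard LVM2 tools, so the carving can be sketched on the command line as well. The device names, volume-group name, and logical-volume names below are stand-ins, not my actual layout:

```shell
# Initialize the disks as LVM physical volumes (device names are placeholders)
pvcreate /dev/sdb /dev/sdc /dev/sdd

# Group them into a single volume group
vgcreate backup_vg /dev/sdb /dev/sdc /dev/sdd

# Carve out the planned spaces: iSCSI and general storage
lvcreate -L 2T -n iscsi_lv   backup_vg
lvcreate -L 2T -n general_lv backup_vg

# Put a filesystem on the general-storage volume
mkfs.ext3 /dev/backup_vg/general_lv
```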
SVMotion continued but took a few days to complete. For powered-off VMs I chose to use the cold migrate method over any other, as I just wanted to use one tool (vCenter).
I am always amazed that vCenter can SVMotion itself. I was even able to SVMotion my file server with the attached RDM pretty easily.
Destroying the existing volumes on the SAN and restructuring the device was trivial. I am above all looking for performance as well as redundancy, so I created an 11-disk RAID-5 physical LUN with a single hot spare. On top of this physical LUN, I created 3 logical volumes that I then presented as distinct LUNs to the hosts. Each of these logical LUNs was 512GB in size (leaving plenty of space on the SAN for future growth). In addition, each of the logical LUNs was presented and zoned through a different controller; this way I can make use of both controllers in this active-active SAN. I labeled them ESXPRIMARY, ESXSECONDARY, and FSRDM so I am not confused going forward. Everyone needs a naming convention.
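As a quick sanity check on the layout: RAID-5 gives (n − 1) disks of usable capacity, since one disk's worth of space goes to parity. The per-disk size below is hypothetical; the point is the arithmetic:

```shell
disks_in_raid=11    # 11 disks in the RAID-5 set; the 12th is the hot spare
disk_size_gb=300    # hypothetical per-disk capacity

# RAID-5 usable space is (n - 1) * disk size; one disk's worth is parity
usable_gb=$(( (disks_in_raid - 1) * disk_size_gb ))
echo "usable: ${usable_gb} GB"
```

With 512GB per logical LUN, three such LUNs fit well inside the usable space, leaving room for growth.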
SVMotion and cold migration of the VMs to the newly created LUNs presented no issues except for two VMs.
The first was my file server. Since this machine was powered off during all of this due to the deletion of the RDM, it was cold migrated; the old RDM was deleted and the new one put into place. The file server is a Linux system, so adding the RDM just took system-config-lvm and the rsync command to restore the files.
The second server that had issues was my vCenter server. I had forgotten that I had given it a single small RDM as an experiment with NPIV. This experiment was an attempt to get the IBM DS software to see the controllers through the use of NPIV. It failed, but I never deleted the RDM/NPIV configuration. Since the underlying RDM no longer existed, the SVMotion failed repeatedly.
Fixing vCenter Server
To fix the initial problem of vCenter not migrating, I attempted a simple reboot, but Windows 2008 R2 just hung on boot. Thinking there was something a little more serious, I investigated the vmx files and the associated vmware.log and noticed the stale RDM configuration. Once I attached the vSphere Client directly to the host and deleted that configuration, the VM would boot. A subsequent SVMotion worked but left vCenter in an odd state.
The odd state was that the vCenter VM now appeared on two different datastores: the new LUN and the temporary home of the VM. This was very confusing, as the temporary home showed no files! So I attempted to VMotion the VM, as that sometimes fixes these discrepancies, but that failed because the temporary space was local storage. I did get another clue, though: the error said the migration was not possible due to a linked clone. Since linked clones are part of how SVMotion works, this made sense; vCenter simply had some incorrect information.
The solution was to power off the vCenter VM, connect to the host using the vSphere Client directly, unregister the vCenter VM, and then re-register the VM (which I actually performed from the command line). Once that was done, I rebooted vCenter and it no longer showed up on both LUNs.
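From the host itself, the unregister/re-register can be done with the vim-cmd utility (named vmware-vim-cmd on classic ESX). The VM ID and vmx path below are placeholders for your own values:

```shell
# List registered VMs to find the vCenter VM's ID and vmx path
vim-cmd vmsvc/getallvms

# Unregister the stale entry (the ID 42 is a placeholder)
vim-cmd vmsvc/unregister 42

# Re-register the VM from its vmx on the new datastore (path is a placeholder)
vim-cmd solo/registervm /vmfs/volumes/ESXPRIMARY/vcenter/vcenter.vmx
```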
Simple and straightforward at the start, time consuming to perform, but when problems occur, be prepared to delve into log files to troubleshoot and to use single-host mechanisms to fix things. I would like to thank all those that gave me ideas on Twitter: @ChrisDearden, @JasonBoche, @JimPeluso, etc.