vSphere Upgrade Saga: VSAN Try 2

In my previous vSphere Upgrade Saga post, VSAN Upgrade Woes, I discussed upgrade problems in a relatively unsupported configuration. I finally figured out why I had such a problem. It was not the unsupported nature of the configuration, but the disk space used within VSAN. In effect, my VSAN was heavily overcommitted, and as such, there was no room to move things around to allow updates. I needed a new implementation of VSAN, one that is supported.

The goal was to take my three HPE BL460c Gen8 Blades and make them all part of a single VSAN cluster. To do this, however, I needed to seriously rethink my disk partitioning layout. I needed a flash tier and a capacity tier within each node. At the time, the blades were configured with two 300GB 2.5″ 15K SAS drives. On two of the blades, my existing HP StoreVirtual VSA VMs lived on those drives and on the other drives within the D2220SB storage blades.

Not wanting to purchase a third storage blade, I had to come up with another solution. The solution has been around for a number of blade generations, but I had not yet adopted it. With Gen9 blades, there are three possible configurations, while Gen8 blades have two, and earlier blades have pretty much only one option.

The possible layouts:

Blade Type    USB      MicroSD   Solid State M.2   Cache Tier   Data Tier
Gen9          X        X         X [2]             SSD          SSD/HDD
Gen8          X [1]    X         -                 SSD          SSD/HDD
< Gen8 [3]    X [1]    -         -                 SSD          SSD/HDD

1 – USB 2.0; USB 3.0 has better power control/availability.
2 – Single and dual port configuration. Dual port is recommended.
3 – The storage controller in these blades needs to be upgraded to one that understands SSD, such as the 420i or the 220i. The default controllers will not work.

In essence, since I have Gen8 blades, the only true option is to use microSD with SSD for cache and either SSD or HDD for the data tier. I chose to use a 32GB HPE microSD, a 960GB Samsung PM863 2.5″ SATA SSD for the cache tier, and a 1.92TB Samsung PM863 2.5″ SATA SSD for the data tier, thereby delivering 5.4TB of disk space in an all-flash configuration.
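As a rough check on the sizing, the raw capacity-tier arithmetic can be sketched as follows. This assumes one 1.92TB capacity SSD in each of the three blades and that the cache-tier SSDs do not count toward datastore capacity. The function names are purely illustrative. The figure a datastore actually reports will differ from the raw sum because of unit conversion and on-disk overhead.

```python
# Rough sizing sketch. All names here are illustrative and not part of any
# VMware or HPE tool; the drive sizes come from the parts list above.
NODES = 3                 # three blades in the cluster
CAPACITY_SSD_GB = 1920    # one 1.92TB capacity-tier SSD per blade
CACHE_SSD_GB = 960        # one 960GB cache-tier SSD per blade


def raw_capacity_gb(nodes: int, capacity_ssd_gb: int) -> int:
    """Sum of the capacity-tier devices only (cache devices are excluded)."""
    return nodes * capacity_ssd_gb


def gb_to_tib(gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to binary tebibytes (2**40 bytes)."""
    return gb * 10**9 / 2**40


if __name__ == "__main__":
    raw_gb = raw_capacity_gb(NODES, CAPACITY_SSD_GB)
    print(f"raw capacity tier: {raw_gb / 1000:.2f} TB ({gb_to_tib(raw_gb):.2f} TiB)")
    print(f"cache size relative to capacity per node: {CACHE_SSD_GB / CAPACITY_SSD_GB:.0%}")
```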

Putting It Together

Enabling VSAN is rather simple. It is putting the hardware together that is important. This is a complete reinstall, but I have running VMs, and it is important to keep them running. I took the following steps per blade, starting with the blade with no attached D2220SB:

  1. Evacuate the VMs from the blade
  2. Shut down the blade and remove from vCenter
  3. Remove the blade from the chassis
  4. Remove the disks from the front of the blade
  5. Add a 960GB SSD to slot 1 and a 1.92TB SSD to slot 2
  6. Plug in the 32GB microSD
  7. Close up the blade and put back in the chassis (a boot is automatic)
  8. Open up the iLO remote console once the blade power has been established
  9. Mount the HPE April 2016 custom ISO of ESXi 6.0 U2 via the iLO
  10. Ensure that the microSD can be seen (if not, then you will have to remove the blade and reseat the microSD)
  11. Install ESXi
  12. On reboot, mount the most recent Service Pack for ProLiant
  13. Upgrade all the firmware
  14. On reboot, ensure power settings and chipset features are correct for your environment
  15. Enter Disk Management and create two volumes on the system-board controller (the Smart Array P220i, or controller 0): one for the 960GB SSD and one for the 1.92TB SSD. You will first have to remove any existing volumes by clearing the Smart Array configuration. I was very careful to do this only on the Smart Array P220i Controller and not on the Smart Array P420i within the D2220SBs.
  16. Boot into ESXi and add to existing vCenter

At this point, I had a blade booting ESXi off the microSD card with no local disks available for datastores, though some FC volumes were automatically mounted by ESXi. I used one of the FC volumes as the scratch/temporary file location required by ESXi, to save the microSD from excessive writes. Further steps include:

  1. Create a directory named .locker-hostname on the FC volume of choice, using the browse datastore functionality.
  2. Set the scratch/temporary directory to point to that location. This is the ScratchConfig.ConfiguredScratchLocation advanced option. You will need the volume ID of the location, which is a hexadecimal number (not the naa. representation, but the GUID-style representation), which can be found in the storage configuration. I set this to be /vmfs/volumes/GUID/.locker-hostname.
  3. Reconfigure to pick up any iSCSI volumes. In one case, this required resetting the host's iSCSI IQN to its original value. Otherwise, I would have had to reconfigure my iSCSI servers to use the new IQN. It was far faster and easier to adjust the host, as nothing was running on it yet.
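The scratch setting in step 2 can be sketched in a few lines of Python. This is a minimal sketch, not a definitive procedure: the helper names, the example volume ID, and the hostname are all made up for illustration, and the 8-8-4-12 hexadecimal pattern is my assumption about what a VMFS volume ID looks like. The vim-cmd invocation follows the form VMware documents for setting a persistent scratch location. The sketch only builds the command and does not run it.

```python
import re

# Assumption: VMFS volume IDs are four dash-separated hex groups (8-8-4-12).
VMFS_VOLUME_ID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12}$")


def scratch_path(volume_id: str, hostname: str) -> str:
    """Build /vmfs/volumes/<volume ID>/.locker-<hostname>.

    Rejects anything that is not a GUID-style volume ID, such as an naa. name.
    """
    if not VMFS_VOLUME_ID.match(volume_id.lower()):
        raise ValueError(f"not a GUID-style volume ID: {volume_id!r}")
    return f"/vmfs/volumes/{volume_id}/.locker-{hostname}"


def scratch_command(volume_id: str, hostname: str) -> list[str]:
    """Return the argument list to run in the ESXi shell (not executed here)."""
    return [
        "vim-cmd",
        "hostsvc/advopt/update",
        "ScratchConfig.ConfiguredScratchLocation",
        "string",
        scratch_path(volume_id, hostname),
    ]


if __name__ == "__main__":
    # Hypothetical volume ID and hostname, for illustration only.
    print(" ".join(scratch_command("4c4fbfec-f4069088-afff-0019b9f1ff14", "esx01")))
```

The directory has to exist before the option is set, and the new scratch location only takes effect after the host is rebooted.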

I then did the same for the second blade, exactly as defined in all sixteen steps. However, that led to a major problem, as the disks within the blade held a powered-off StoreVirtual VSA component. I had not moved that VM to another volume using SVMotion before the volumes were removed. (Yes, I forgot where it was located.) So, in addition to the sixteen steps and the follow-up steps above, I had to recover my StoreVirtual VSA VM by reinstalling it and hooking it up to the original to recover my RAIN-based StoreVirtual. A resync of the data would have taken thirty-four hours.

At this point, I had two hosts ready to configure an all-flash VSAN, and no VMs had gone down.

Before proceeding to the third blade, I had to recover the StoreVirtual VSA. Since I had plenty of idle storage on another iSCSI server, the choice was to wait thirty-four hours for the resync or to use SVMotion and recreate the StoreVirtual RAIN. I chose the latter. Storage vMotion sped this up to only two hours. Evacuating all the VMs from the StoreVirtual took an hour. I then destroyed the volume and recreated it using the StoreVirtual VSAs. Once that was done, all virtual machines were moved back to the StoreVirtual VSA.

Alas, it was a simple mistake, and one not to be repeated on the third blade. There, the VSA VM was first moved to a volume resting on the D2220SB itself (but not part of the StoreVirtual volume, as that would have been a chicken-and-egg problem).
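The check I skipped on the second blade is easy to express. Below is a minimal sketch of a pre-wipe check that lists every VM, powered on or off, still living on a datastore about to be destroyed. The inventory is hypothetical sample data typed in by hand; in practice it would come from whatever inventory export you trust, and the VM and datastore names here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    datastore: str
    powered_on: bool


def vms_blocking_wipe(inventory, datastores_to_wipe):
    """Return every VM on a doomed datastore, regardless of power state.

    Powered-off VMs are deliberately included: they are the easy ones to forget.
    """
    doomed = set(datastores_to_wipe)
    return [vm for vm in inventory if vm.datastore in doomed]


if __name__ == "__main__":
    # Hypothetical inventory for one blade; names are made up.
    inventory = [
        VM("view-desktop-01", "fc-volume-1", True),
        VM("storevirtual-vsa-2", "blade2-local", False),
        VM("test-vm", "iscsi-volume-3", False),
    ]
    blockers = vms_blocking_wipe(inventory, ["blade2-local"])
    for vm in blockers:
        state = "on" if vm.powered_on else "off"
        print(f"move first: {vm.name} (powered {state}) on {vm.datastore}")
    if not blockers:
        print("safe to clear the local Smart Array volumes")
```

The design choice that matters is ignoring power state: evacuating a host by migrating its running VMs says nothing about powered-off VMs stored on its local disks.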

After the third blade was reconfigured, enabling VSAN was anticlimactic. A few clicks of the buttons, and voilà, VSAN was configured.

Lastly, I was able to move my VMware Horizon View desktops to a 5.4TB VSAN datastore. Plenty of space for upgrades and other VMs.

Edward Haletky

Edward L. Haletky, aka Texiwill, is an author, analyst, developer, technologist, and business owner. Edward owns AstroArch Consulting, Inc., providing virtualization, security, network consulting and development and TVP Strategy where he is also an Analyst. Edward is the Moderator and Host of the Virtualization Security Podcast as well as a guru and moderator for the VMware Communities Forums, providing answers to security and configuration questions. Edward is working on new books on Virtualization.
