vSphere Upgrade Saga: Blade Swap

BL460c Gen10

My latest vSphere Upgrade Saga is about upgrading server hardware. I don’t mean simply replacing components or adding more memory—I mean full server replacements. This was not my first full server replacement, but it was by far the easiest. Perhaps it is now more a science than an experiment.

History

After all these years, my vSphere Upgrade Saga has become less of a saga than a well-planned operation. For any major upgrade, planning is crucial. The why of the upgrade is as important as the actual deed. I would like to note that I have upgraded server chassis from 2U servers to blades, complete storage systems, and various versions of VMware’s products. In all those upgrades—from version 2.0 through 6.7U3—I have not lost any data. I have not even gone down for any extended period of time. There has been no accidental downtime, either. This level of resiliency is impressive!

Granted, there have been issues with upgrades. There have been issues with my systems. But no data loss! Of course, before every upgrade, you should answer my favorite question prior to proceeding:

Are you satisfied with your backup?

Once you answer that question affirmatively, you are ready to proceed with your plan. Such a plan should include use of snapshots for virtual machine upgrades, as well as use of secondary or tertiary storage as needed. My second-favorite question before proceeding is:

Do you have your upgrade plan in writing?

With a plan and a reason to upgrade, you are ready to proceed! Now, it is time to discuss the why of the plan. The why is almost as important as the how in many cases.

Why Upgrade?

Recently I was fortunate enough to get three nearly-current systems for my virtual environment. They are two generations newer than the blades they replace, and they are on the VMware HCL, whereas the original systems were not. That alone is worth the upgrade.

A server blade swap within a blade chassis is nearly a full hardware upgrade: I needed to swap not just the blade but the CPUs, memory, storage controllers, and network controllers. This is why I was fortunate to get replacement hardware; that combination is not the cheapest to find for server blades. To keep my power consumption and costs down, I went with 85W CPUs and purchased from a reputable source that allows server exchanges as part of a purchase. This let me dispose of my older gear and get some dollars back as well (not as much as the new hardware cost, but that is to be expected).

The Plan

With new hardware, a satisfactory backup, and a plan, I was ready to proceed. What was my plan? It was, simply:

  1. Verify each new blade and flash the firmware to the latest levels.
  2. Relabel all Virtual Connect networks to represent their utility and upstream external connection.
  3. Move all virtual switches to distributed virtual switches. Yes, I still used some of the original virtual switches.
  4. Provide redundant Virtual Connect networking. This did not go as planned due to the way the BladeSystem c3000 horizontal stacking works. (See https://windowsrunbook.blogspot.com/2015/07/smart-link-on-c3000-hp-enclosure-shared.html for a great description of the why and what would be required for such switch-level redundancy.)
  5. Remove the storage from the older blade and place it in the new blade in the same locations: SSD and microSD.
  6. Remove the FC HBA from the older blade and place the HBA in the new blade in the same mezzanine location.
  7. Put the new blade into the same slot as the old blade within the c3000 Enclosure.
  8. Power on each blade.
  9. Ensure that ESXi starts and VSAN is still seen (see the scripted check after this list).
  10. Reset the server profile to ensure all appropriate Virtual Connect networks are in use (ensure new and older networks have proper names, remove unused networks, etc.).
  11. Upgrade ESXi to the latest version with the HPE Gen9 Plus version of HPE Tools.
  12. Ensure all monitoring systems show green across all tools.
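
Since much of this plan is verification, I have found it handy to script the Step 9 check rather than eyeball it. Below is a minimal sketch using the pyVmomi SDK that reports each host's connection state and whether the vSAN datastore is visible; the vCenter address, credentials, and the datastore name vsanDatastore are placeholders, not values from my environment.

    # Post-swap check (Step 9): confirm each host reconnected and still sees the vSAN datastore.
    # Minimal sketch only -- vCenter address, credentials, and datastore name are placeholders.
    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    VSAN_DATASTORE = "vsanDatastore"  # assumed name; use your own vSAN datastore name

    ctx = ssl._create_unverified_context()  # lab shortcut; validate certificates in production
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            ds_names = [ds.name for ds in host.datastore]
            print(f"{host.name}: {host.runtime.connectionState}, "
                  f"vSAN datastore visible: {VSAN_DATASTORE in ds_names}")
    finally:
        Disconnect(si)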

Following the Plan

Since I had some open slots in my c3000, I was able to perform Step 1 with no issues. The only concern was getting access to the Gen10 Service Pack Package (SPP), which requires a valid support agreement. That took a bit longer due to some internal-to-HPE issue with email. Eventually, I was able to get access and download the necessary files. Each blade was verified against what I purchased, firmware was flashed to the latest, and a burn-in test of sorts was run. I say “of sorts,” as there was no real tool to do a burn-in test, but HPE provides some great blade diagnostic tools that I used.

Step 2 was straightforward, as the Virtual Connect tooling is simple to use and changing network label names does not require a server profile reset. A server profile reset requires a reboot, as does adding or deleting networks.

Step 3 was pretty simple, as VMware vSphere has handy tools for migrating networks. The only network I could not migrate beforehand was my storage network; I did that one as I replaced the blades, just to be safe. Since I use iSCSI and NFS exclusively these days, I preferred to keep storage available as I replaced each blade in turn. The only gotcha was related to snapshots: I had some snapshots hanging around that still showed my older virtual switches in use, even though those switches no longer existed. Eliminate the snapshots, and you eliminate the inconsistent reporting.
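
If you want to confirm that nothing was left behind on the standard virtual switches, a quick script can list whatever still lives on them; anything it reports still needs to move to the distributed virtual switch. A minimal pyVmomi sketch along those lines (vCenter address and credentials are placeholders) could look like this:

    # List any standard vSwitches and port groups left on each host after the DVS migration.
    # Minimal sketch only -- vCenter address and credentials are placeholders.
    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab shortcut; validate certificates in production
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            net = host.config.network
            if not net.vswitch:
                print(f"{host.name}: no standard vSwitches left")
                continue
            for vss in net.vswitch:
                pgs = [pg.spec.name for pg in net.portgroup
                       if pg.spec.vswitchName == vss.name]
                print(f"{host.name}: {vss.name} still carries port groups {pgs}")
    finally:
        Disconnect(si)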

The blade swap itself, Steps 4 through 10, went without a hitch. This is where I expected trouble and where I found out that the network redundancy I was expecting would not work. This is also where I discovered that you cannot provide redundancy through the virtual switch for iSCSI networks: each iSCSI VMkernel port group can be bound to only one network at a time. Making it redundant requires multiple VMkernel ports, something I had not appreciated before, as I did not fully understand the redundancy requirements of my horizontally stacked blade switches. Now I do.
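
The fix is standard vSphere iSCSI port binding: one VMkernel port per uplink, each bound to the iSCSI adapter. The sketch below is one way I might verify those bindings with pyVmomi; it assumes a software or dependent iSCSI adapter is in use and that the IscsiManager QueryBoundVnics call (vSphere 5.0 and later) is available, and the connection details are again placeholders.

    # Verify iSCSI port binding: the iSCSI adapter should have at least two bound
    # VMkernel adapters for redundancy. Minimal sketch only; connection details are placeholders.
    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab shortcut; validate certificates in production
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            # Software and dependent hardware iSCSI adapters both appear as InternetScsiHba.
            iscsi_hbas = [hba for hba in host.config.storageDevice.hostBusAdapter
                          if isinstance(hba, vim.host.InternetScsiHba)]
            for hba in iscsi_hbas:
                bound = host.configManager.iscsiManager.QueryBoundVnics(
                    iScsiHbaName=hba.device)
                vmks = [b.vnicDevice for b in bound]
                status = "OK" if len(vmks) >= 2 else "NOT redundant"
                print(f"{host.name} {hba.device}: bound VMkernel ports {vmks} ({status})")
    finally:
        Disconnect(si)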

Step 11 did present a small issue. I had so many older HPE updates from vibsdepot.hpe.com that they got in the way and caused collisions. The simplest fix was to reset the Update Manager database, which I did with the help of VMware KB 2147284. Once the database was reset, I was able to upload the HPE-Gen9plus Dec19 depot zip file and reference the HPE-Gen9plus Dec19 ISO as the install media in the VUM repository. This removed all the older HPE updates and the collisions they caused within VUM. However, this was unrelated to the blade swap; it was purely an issue with HPE VIB collisions.

Step 12 just happened. As I finished upgrading each blade, VSAN came back, and all my errors cleared on their own.
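
For the next upgrade, Step 12 could itself be scripted: walk the hosts, report their overall status color, and list any triggered alarms. A small pyVmomi sketch of that idea, again with placeholder connection details:

    # Step 12 check: report each host's overall status color and any triggered alarms.
    # Minimal sketch only -- vCenter address and credentials are placeholders.
    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab shortcut; validate certificates in production
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            alarms = [a.alarm.info.name for a in (host.triggeredAlarmState or [])]
            print(f"{host.name}: overall status {host.overallStatus}, "
                  f"triggered alarms: {alarms or 'none'}")
    finally:
        Disconnect(si)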

As you can see, it was a fairly painless blade swap, but without proper planning it could have been a disaster. As years of blade, storage, and other upgrades have taught me:

Always have a plan to upgrade, and include a recovery plan in case the upgrade fails!

This means I am satisfied with my backup.

Conclusion

My server blade swap worked! Now my systems are up to date to the latest supported by HPE and VMware, which is a clear win. While I scheduled a weekend for the entire effort, it took less than one day, which is another win. My level of planning allows me to spend much more time with my family. Another win!
