vSphere Upgrade Saga: Pre-Upgrade Crash

My last 6.0 patch upgrade had an interesting phenomenon. Staging three of the patches worked without a hitch. Before I could install the next patch, however, I had to cold-reboot the nodes. A soft reboot produced a red message on the console complaining about a module that would not load: a module I did not recognize and now cannot remember. When this happened, I did a cold reboot, and everything appeared to work.

Could this have been due to a peculiarity with vSphere 6.0 U1? Or did it have something to do with the iSCSI issues I had before? Or with the starting and stopping of VSAN in conjunction with the previous iSCSI issues? Logs seem to point toward iSCSI.

In actuality, none of these was the cause. There is a known problem in which the bootbanks appear to become corrupted. This happened on all my nodes during the U1 upgrade. The following appeared during the reboot:

vmware_f.v00: file not found

Now I was worried, as the solution the KB articles suggested was a reinstall. Others recommended running fsck on the bootbanks from within ESXi. All of them, however, stated that there was corruption within the bootbanks. There are two bootbanks per host, each on its own partition: the primary and the alternative. If one is corrupt, the other should take its place, so a simple reboot can get the host running again.

That still means, however, that you need to work through the problem in the alternative bootbank. I fixed the immediate problem by just rebooting the node after a complete shutdown of the hardware. From power-off, a power-on caused ESXi to boot the proper bootbank, and away it ran.

A VUM remediation I had run previously seemed to have fixed the alternative bootbank as well. It apparently had not, because when I went to upgrade to 6.0 U2, the same problem happened again. This time, the message said I was out of space and could not remediate the node. Once more, there were three possible fixes:

Reboot the node and try again (this failed)
Run a file system integrity check (fsck) on the bootbanks
Reinstall

For the first node, I reinstalled, as I did not know about the fsck method. However, once I read KB 2033564, I ran dosfsck on each of the remaining nodes, and they no longer exhibited the problem.

dosfsck does a file system integrity check on the bootbank and fixes any issues that may be there. This approach saves time over a reinstall.
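For anyone who wants to script the check across several nodes, here is a minimal sketch of how the dosfsck commands could be generated. It is not the procedure from the KB article. It assumes the default ESXi 6.x layout, in which the two bootbanks are partitions 5 and 6 of the boot device, and it assumes the ESXi build of dosfsck accepts the standard dosfstools flags -n (check only), -a (repair automatically), and -v (verbose). The device name and the helper names build_fsck_commands and run_fsck are hypothetical. Confirm the exact invocation against the KB before running anything for real.

```python
import subprocess

# Assumption: on a default ESXi 6.x install the primary and alternative
# bootbanks are partitions 5 and 6 of the boot device.
BOOTBANK_PARTITIONS = (5, 6)


def build_fsck_commands(boot_device, repair=False):
    """Return one dosfsck argument list per bootbank partition."""
    # -n checks without writing, -a repairs automatically, -v is verbose.
    # These are standard dosfstools flags, assumed to work on ESXi.
    mode = "-a" if repair else "-n"
    return [
        ["dosfsck", mode, "-v", "{}:{}".format(boot_device, part)]
        for part in BOOTBANK_PARTITIONS
    ]


def run_fsck(boot_device, repair=False, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    exit_codes = []
    for cmd in build_fsck_commands(boot_device, repair):
        print(" ".join(cmd))
        if not dry_run:
            exit_codes.append(subprocess.call(cmd))
    return exit_codes


if __name__ == "__main__":
    # Hypothetical device name; substitute the host's real boot device.
    run_fsck("/dev/disks/mpx.vmhba32:C0:T0:L0", repair=False, dry_run=True)
```

The dry run only prints the two commands, so they can be reviewed first. Setting repair=True and dry_run=False would actually modify the bootbank file systems, so that is best done with the host in maintenance mode.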

Once the file system integrity check was completed, remediation on all nodes happened as expected. We were able to safely upgrade all nodes to U2 using VMware Update Manager. Now all that’s left is to upgrade VMware Tools, virtual hardware, and all the VMware virtual appliances: the standard set of post-upgrade tasks.
