Since I adopted vSphere, I have been meaning to move to distributed virtual networking, but other things got in the way, such as my upgrade to a blade infrastructure and general maintenance. Well, I finally gave it a try. I have four basic networks, each in its own trust zone. Three of the four migrated quickly and easily, but the last one proved more difficult because it contained the service console (SC) of the vSphere ESX hosts as well as the administrative tools used to manage the vSphere environment.
My first attempt at migrating this all-important network failed horribly: I lost connectivity to everything. Here is what I did on that attempt:
- Used Manage Hosts to add each host to the new dvSwitch I had created and to the necessary portgroups.
- Assigned the SC to one of the dvSwitch portgroups.
- Assigned NFS/iSCSI to one of the dvSwitch portgroups.
- Assigned the VMs to other portgroups.
The task started but got only as far as assigning the SC before the systems became inaccessible. Apparently this was not the appropriate method. I had assumed that the dvSwitch Manage Hosts code would push the necessary commands down to each ESX host and make this happen without anything extra.
I was sadly mistaken. In effect, I lost every path I had to manage the systems, and once the SC could no longer reach my isolation address, VMware HA powered off all my VMs. What a mess. To fix it, I had to go back into the service console and use commands such as ‘esxcfg-vswif’ and ‘esxcfg-vswitch’ to move the service console back to the appropriate networks. What I found out, and why this happened, was that the dvSwitch portgroups were created and the SC was migrated, but the assignment of VMs to portgroups never happened. One of the VMs to be migrated was the firewall for the administrative network, so while half of the configuration was correct, the admin side was NOT.
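The recovery boils down to rebuilding service console networking on a standard vSwitch from the host's console. Below is a minimal Python sketch that only generates that command sequence; it runs nothing. The vSwitch, portgroup, vmnic, vswif name, and addresses are hypothetical placeholders, and it assumes the chosen vmnic has already been released from the dvSwitch and that the vswif name is not already in use (neither step is shown). Check the help output of ‘esxcfg-vswitch’ and ‘esxcfg-vswif’ on your own host before running anything.

```python
# Sketch only: builds the esxcfg command lines, in order, for putting the
# service console back on a standard vSwitch on a classic ESX host.
# All names and addresses passed in are hypothetical placeholders.
from typing import List


def restore_sc_commands(vswitch: str, portgroup: str, uplink: str,
                        vswif: str, ip: str, netmask: str) -> List[str]:
    """Return the esxcfg command lines that rebuild SC networking."""
    return [
        f"esxcfg-vswitch -a {vswitch}",                # create the standard vSwitch
        f'esxcfg-vswitch -A "{portgroup}" {vswitch}',  # add the SC portgroup to it
        f"esxcfg-vswitch -L {uplink} {vswitch}",       # link a free physical NIC as uplink
        # create the SC interface on that portgroup with a static address
        f'esxcfg-vswif -a {vswif} -p "{portgroup}" -i {ip} -n {netmask}',
    ]


if __name__ == "__main__":
    for line in restore_sc_commands("vSwitch0", "Service Console", "vmnic0",
                                    "vswif0", "192.168.1.10", "255.255.255.0"):
        print(line)
```

The order matters: the portgroup and uplink have to exist before the SC interface can be attached and reach the network.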
Since I am now using HP BladeSystems, I added a second uplink from my Flex-10 interconnects to my main network switch. I then went into one ESX host and added it as a ‘CONSOLE’ network for administrative purposes. Because it was a BladeSystem, I first had to power off the host, which forced a vMotion of all the VMs off the blade. Once that was done and the new network was available, I assigned it to the service console, leaving the original vmnic free for the dvSwitch.
Part of this process was to migrate all the administrative VMs back to this host with the new network. While that was happening, I created the dvSwitch and all the required portgroups, then used the dvSwitch's Manage Hosts feature to add the host back to the dvSwitch, assigning the now-available vmnic to it along with the appropriate VMs. Once that was finished, the next stage was pretty simple.
That stage was to add all the other hosts to the dvSwitch and finally move the Service Console port back to the appropriate dvSwitch, which went off flawlessly. With everything in order, I did another vMotion of the VMs off the host with the CONSOLE network so that I could remove that network.
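Why did the second approach work when the first did not? Here is a toy Python model, my own simplification and not how vCenter works internally. An endpoint (the SC or a VM) counts as reachable only while the switch it sits on still has at least one physical uplink, and moves grouped into one step are applied together, roughly like a single Manage Hosts task. The switch, NIC, and VM names are hypothetical, and details such as VLANs and the vMotion of admin VMs are ignored.

```python
# Toy model: replay a migration plan and report the first step after which
# a required endpoint (SC or admin firewall VM) sits on a switch with no uplink.
# Names are hypothetical; this is an illustration, not a VMware API.
from typing import Dict, List, Optional, Tuple

Move = Tuple[str, str, Optional[str]]  # (kind, name, target switch or None to detach)


def first_break(initial: Dict[str, Dict[str, Optional[str]]],
                plan: List[List[Move]], required: List[str]) -> Optional[int]:
    """Return the 1-based step that breaks management, or None if none does."""
    uplinks = dict(initial["uplinks"])      # vmnic -> switch (copied, not mutated)
    endpoints = dict(initial["endpoints"])  # SC or VM -> switch
    for step_no, step in enumerate(plan, start=1):
        for kind, name, target in step:     # all moves in one step apply together
            table = uplinks if kind == "uplink" else endpoints
            table[name] = target
        live = {sw for sw in uplinks.values() if sw is not None}
        if any(endpoints[e] not in live for e in required):
            return step_no
    return None  # management stayed reachable throughout


INITIAL = {"uplinks": {"vmnic0": "vSwitch0"},
           "endpoints": {"SC": "vSwitch0", "admin-fw": "vSwitch0"}}
REQUIRED = ["SC", "admin-fw"]

# First attempt: uplink and SC moved, but the firewall VM assignment never happened.
FAILED_PLAN = [[("uplink", "vmnic0", "dvSwitch"), ("endpoint", "SC", "dvSwitch")]]

# Second attempt: a temporary CONSOLE network on a new uplink carries the SC.
WORKING_PLAN = [
    [("uplink", "vmnic1", "vSwitch1")],       # new Flex-10 uplink for CONSOLE
    [("endpoint", "SC", "vSwitch1")],         # SC moves to CONSOLE
    [("uplink", "vmnic0", "dvSwitch"),        # Manage Hosts: original vmnic...
     ("endpoint", "admin-fw", "dvSwitch")],   # ...and the admin VMs together
    [("endpoint", "SC", "dvSwitch")],         # SC back onto the dvSwitch
    [("uplink", "vmnic1", None)],             # remove the CONSOLE network
]

if __name__ == "__main__":
    print("failed plan breaks at step:", first_break(INITIAL, FAILED_PLAN, REQUIRED))
    print("working plan breaks at step:", first_break(INITIAL, WORKING_PLAN, REQUIRED))
```

In this model the first plan strands the firewall VM on a vSwitch with no uplink at step 1, while the second plan always leaves the SC and the firewall with a live uplink, which matches what happened in practice.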
With this completed, I am now using dvSwitches for everything except a few vShield security VMs, and I will have to reinstall the vShield components to get everything working properly.
However, I noticed all the vMotions were seriously slow. That is another story.
UPDATE: vShield caused all sorts of problems because the VFILE SCSI filter was not behaving properly (yes, this is part of vShield Endpoint). vMotions were taking forever and stalling out, so I wanted to remove it, but DRS came into play and sent VMs all over my environment. The long and short of it was that the host holding vCenter was reset, which caused HA to fail and required a manual reboot of all the nodes. Apparently, when a VM powers on, its dvSwitch port number needs to be selected (it is in the configuration file), but because the VMs could not communicate with vCenter, they could NOT boot.
The solution was to temporarily put the service console and administrative tools (such as vCenter) onto a regular vSwitch, then, once vCenter had booted, bring all the VMs back up and running properly.
UPDATE II: It happened again. So this time I created an administrative portgroup on the standard VMware vSwitch I had left in place and moved vCenter, the service consoles, Active Directory for management, and a few other critical bits onto it. I then set the boot order so that the items on this vSwitch start FIRST within the cluster. Once vCenter is available, everything else should be fine.
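The boot-order rule above can be expressed as a simple sort key: VMs on the standard vSwitch come first because, in my experience, they do not wait on vCenter for a dvSwitch port, and everything on the dvSwitch follows. This Python sketch only computes an ordering and calls no VMware API; the VM names, the `switch` tag, and the `priority` field are hypothetical, and so is the assumption that vCenter depends on Active Directory and should start after it. Actually applying the order is still done in the vSphere Client.

```python
# Sketch: compute a startup order that puts standard-vSwitch VMs (vCenter,
# AD, other critical bits) ahead of dvSwitch VMs. Names and fields are
# hypothetical; nothing here talks to vCenter or ESX.
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    switch: str        # "standard" or "distributed"
    priority: int = 0  # lower boots earlier within its group (hypothetical field)


def startup_order(vms: List[VM]) -> List[str]:
    """Standard-vSwitch VMs first, then dvSwitch VMs; ties broken by priority, name."""
    ordered = sorted(vms, key=lambda v: (v.switch != "standard", v.priority, v.name))
    return [vm.name for vm in ordered]


VMS = [
    VM("web01", "distributed"),
    VM("vcenter", "standard", priority=1),  # assumed to depend on AD, so after ad01
    VM("app01", "distributed"),
    VM("ad01", "standard", priority=0),
]

if __name__ == "__main__":
    print(startup_order(VMS))  # ['ad01', 'vcenter', 'app01', 'web01']
```

Keeping the rule as a sort key makes the intent explicit: switch type decides the group first, and only within a group does individual priority matter.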