vSphere Upgrade – Moving to dvNetworking Take 2? Update 2….

Since I adopted vSphere, I have been meaning to move to distributed virtual networking, but other things got in the way, such as my upgrade to a blade infrastructure as well as general maintenance. Well, I finally gave it a try. I have four basic networks, each in its own trust zone. Three of the four migrated quickly and easily, but the last one proved a bit difficult, as it contained the service console of the vSphere ESX hosts as well as the administrative tools used to manage the vSphere environment.

My first attempt at migrating this all-important network failed horribly. I lost connectivity to everything. Here is what I did on that attempt:

  • Used Manage Hosts to add each host to the new dvSwitch I created and the necessary portgroups.
  • Assigned the SC to one of the dvSwitch portgroups
  • Assigned NFS/iSCSI to one of the dvSwitch portgroups
  • Assigned the VMs to other portgroups

The task started but got as far as assigning the SC before the systems became inaccessible. Apparently this was not the appropriate method. My thought was that the dvSwitch Manage Hosts code would download to each ESX host the necessary commands to make this happen without needing anything extra.
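For reference, what I expected the wizard to do on each host is roughly the host-side CLI equivalent below (a sketch only; the dvSwitch name, vmnic, and dvPort ID are placeholders, not my actual values):

    # Show the current vSwitch/dvSwitch layout on this host
    esxcfg-vswitch -l
    # Attach a physical uplink to the dvSwitch at a specific dvPort ID
    esxcfg-vswitch -P vmnic1 -V 256 dvSwitch0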

I was sadly mistaken. I in effect lost connectivity to everything needed to manage the systems, and once the SC lost connectivity to my isolation address, VMware HA powered off all my VMs. What a mess. To fix it, I had to go back into the service console and run commands such as ‘esxcfg-vswif’ and ‘esxcfg-vswitch’ to migrate the service console back to the appropriate networks. What I found out, and why this happened, was that the dvSwitch portgroups were created and the SC was migrated, but the assignment of VMs to portgroups never happened. One of those VMs was the firewall for the administrative network. So while half the migration was correct, the admin side was NOT.
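For anyone in the same boat, the recovery from the local console looks roughly like this (a sketch only; the vswif name, portgroup, IP, and netmask are examples for illustration):

    # See which vswif interfaces exist and where they currently point
    esxcfg-vswif -l
    # Remove the service console interface that was moved to the dvSwitch
    esxcfg-vswif -d vswif0
    # Make sure the standard vSwitch still has a Service Console portgroup and an uplink
    esxcfg-vswitch -A "Service Console" vSwitch0
    esxcfg-vswitch -L vmnic0 vSwitch0
    # Re-create the service console interface on the standard vSwitch portgroup
    esxcfg-vswif -a vswif0 -p "Service Console" -i 192.168.1.10 -n 255.255.255.0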

Take 2

Since I am now using HP BladeSystems, I added a second uplink from my Flex-10 interconnects to my main network switch. I then went into one ESX host and added it as a ‘CONSOLE’ network for administrative purposes. But since it was a BladeSystem, I had to first power off the host, which forced a vMotion of all the VMs off the blade. Once that was done and the new network was available, I assigned it to the service console, leaving the original vmnic to be used for the dvSwitch.
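From the local console, that change looked something like the following (a sketch; the vSwitch, vmnic, and addressing are placeholders, assuming the new Flex-10 uplink shows up as vmnic2):

    # Create a dedicated standard vSwitch for the CONSOLE network on the new uplink
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic2 vSwitch1
    esxcfg-vswitch -A "CONSOLE" vSwitch1
    # Move the service console interface onto the CONSOLE portgroup
    esxcfg-vswif -d vswif0
    esxcfg-vswif -a vswif0 -p "CONSOLE" -i 192.168.1.10 -n 255.255.255.0
    # Free the original vmnic so it can later be handed to the dvSwitch
    esxcfg-vswitch -U vmnic0 vSwitch0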

Part of this process was to migrate all the administrative VMs back to this host with the new network. While that was happening, I created the dvSwitch and all the required portgroups, then used the Manage Hosts aspect of the dvSwitch to add the host back to the dvSwitch and assign the now-available vmnic to it, as well as the appropriate VMs. Once that was finished, the next stage was pretty simple.

That stage was to add all the other hosts to the dvSwitch and then finally move the Service Console port back to the appropriate dvSwitch, which went off flawlessly. Now all is in order; one more vMotion of the VMs off the host with the CONSOLE network and I can remove that network.

With this completed, I am now fully using dvSwitches for everything but a few security VMs from vShield. I will have to reinstall the vShield components to get everything working appropriately.

However, I noticed all the vMotions were seriously slow. That is another story.

UPDATE: I had an issue with vShield causing all sorts of problems due to the VFILE SCSI filter not behaving properly (yes, this is part of vShield Endpoint). vMotions were taking forever and stalling out, so I wanted to remove it, but DRS came into play and sent VMs all over my environment. The long and short of it was that the host holding vCenter was reset, which caused HA to fail and required a manual reboot of all the nodes. Apparently, when a VM powers on, its dvSwitch port number needs to be selected (it is in the configuration file), but since the host could not communicate with vCenter, the VMs could NOT boot.

The solution was to temporarily put the service console and administrative tools (such as vCenter) onto a regular vSwitch. Then, once vCenter was booted, properly bring all the VMs back up and running.
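If you get caught in this chicken-and-egg state, the fallback amounts to a few commands on the host holding vCenter (a sketch; the names and addressing are examples, not my actual configuration):

    # Re-create a standard vSwitch with a service console and an admin VM portgroup
    esxcfg-vswitch -a vSwitch0
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -A "Service Console" vSwitch0
    esxcfg-vswitch -A "Admin VMs" vSwitch0
    esxcfg-vswif -a vswif0 -p "Service Console" -i 192.168.1.10 -n 255.255.255.0

Then point the vCenter VM’s vNIC at the ‘Admin VMs’ portgroup (by connecting the vSphere Client directly to the host) and power it on; once vCenter is up, the dvSwitch ports resolve and the rest of the VMs can be brought back.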

UPDATE II: It happened again. So this time I created an administrative portgroup on the VMware vSwitch I left behind and transferred to that vSwitch: vCenter, the Service Consoles, Active Directory for management, and a few other critical bits. Then I set the boot order so that the items on this vSwitch come up FIRST within the cluster. Once vCenter is available, all else should be fine.

2 Comments

  1. Ed,
    This sounds similar to a problem experienced on my last project. vDS was abandoned because a permanent solution could not be determined. I believe the problem first occurred during a datacenter-wide power outage which caused all hosts to be brought down. vCenter was installed in a VM. When the hosts were brought back online, all VMs were no longer members of their originally configured port groups (in this configuration our Senior Architect had already mandated that the SC be on a local vSwitch, avoiding some of the headaches you originally experienced). I tried to remedy this in our lab by reintroducing the problem and trying different approaches: taking vCenter out of DRS, keeping it on a single host, and setting the boot order on that host was one that met with some success. However, I could not keep the boot order from being disabled; I suspected it was due to the host being in a cluster with HA enabled. Another attempt was to set only vCenter to the highest priority in HA. In the end, I figured that installing vCenter on bare metal would probably have the highest likelihood of success, allowing it to boot prior to any of the hosts, but I did not have time to try this before moving on to my next project. I have always enjoyed the benefits of putting vCenter in a VM, but in this scenario I believe that may be leaving the hosts/VMs vulnerable to this problem. Thoughts?

    1. The solution I came up with is to put vCenter within a VM on a single traditional vSwitch. I set up HA to boot this VM first on failure. Once that happens, the vDSes are available for use by the other VMs. This one traditional vSwitch also has the service console portgroup and a few other ‘must haves’ for administration. Seems to work quite well.
