IBM DS3400 Redundant Controllers and Bad Batteries, eNet Cable Fail

Recently my IBM DS3400 SAN raised an alert that the controller batteries needed to be replaced. After ordering the batteries and receiving them, it was time to perform the exchange. The steps are quite straightforward but still require a bit of forethought. I run IBM System Storage Manager 10 from within a VM running Windows 2008 R2; it is actually my VMware vCenter Server. In order for me to exchange the batteries, IBM System Storage Manager 10 must be able to talk to the controllers either over the network or over the fibre connection. Since it runs in a VM, all I can do is control the SAN over the network at this time.

The first controller went without a hitch. I first identified the controller(s), which went fairly easily, and then evacuated all the LUNs from that controller to the other one. This was done simply by changing the preferred path for each LUN: for Controller A, I set all the LUNs using Controller A to have a preferred path through Controller B. Then I was able to place Controller A offline.

Determining which controller was Controller A was simply a case of looking for the controller that was ‘dark’, i.e. showing no Ethernet or FC traffic. Once this was determined, pulling the controller and replacing the battery was pretty straightforward. With the new battery in place, the controller was reinserted and Controller A was brought back online. No issues! No downtime whatsoever!

The same process then follows for Controller B:

  1. Modify all LUNs so that they have a preferred path of Controller A
  2. Place Controller B offline
  3. Extract the controller
  4. Exchange the batteries
  5. Reinsert Controller B
  6. Bring Controller B online
  7. Reset the battery age counter to 0 days
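The software side of the steps above can be sketched as a dry run. Everything here is an assumption for illustration: the management IP and LUN names are made up, and the `SMcli`/DS Storage Manager script syntax (`set logicalDrive`, `set controller`, `reset storageSubsystem batteryInstallDate`) is my best recollection of the SANtricity-style CLI — verify each command against your firmware's CLI reference before running anything for real.

```python
# Dry-run planner: builds the SMcli invocations for the battery-swap
# failover sequence and prints them instead of executing them.
SAN_IP = "192.168.70.123"  # hypothetical management address of the array

def plan(script_cmd):
    """Return the SMcli invocation we would run (printed, never executed)."""
    return f'SMcli {SAN_IP} -c "{script_cmd}"'

steps = []
# Step 1: move every LUN's preferred path to Controller A (example LUN names)
for lun in ["VMFS_LUN0", "VMFS_LUN1", "VMFS_LUN2"]:
    steps.append(plan(f'set logicalDrive ["{lun}"] owner=a;'))
steps.append(plan("set controller [b] availability=offline;"))   # step 2
# Steps 3-5 are physical: extract controller, swap battery, reinsert
steps.append(plan("set controller [b] availability=online;"))    # step 6
steps.append(plan("reset storageSubsystem batteryInstallDate;")) # step 7

for s in steps:
    print(s)
```

Printing the plan first and pasting each command in by hand keeps a human in the loop between the LUN evacuation and the moment the controller actually goes offline.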

Unfortunately I hit a problem at the second step of this process. Controller B did not pick up the DHCP address assigned to it; in fact, it referenced a completely separate subnet. I thought there was a controller issue and went through many convolutions and attempts to get the proper IP and subnet. I went so far as to move the VM hosting IBM System Storage DS Manager into Controller B’s subnet, and this still did not work. The cables all showed link lights but no activity. As a last-ditch effort I swapped out the cable and, voilà, it worked. The DHCP address was picked up and I was finally able to proceed with the process, once more replacing the battery with zero downtime.

Even though all this worked, I discovered something interesting: I still need direct FC access to upgrade the firmware and to sync the onboard clocks. This bothers me somewhat, as at the moment there is no way to see the FC device from within the VM.

I also discovered how to make SAN volumes with Virtual LUNs, which will help me later when I redo the SAN to increase the spindles per SAN volume, and therefore per LUN.

Key takeaways:

  • Redundant Controllers help with simple updates
  • Always check your network cables when there is an issue
Edward Haletky

Edward L. Haletky, aka Texiwill, is an author, analyst, developer, technologist, and business owner. Edward owns AstroArch Consulting, Inc., providing virtualization, security, network consulting and development and TVP Strategy where he is also an Analyst. Edward is the Moderator and Host of the Virtualization Security Podcast as well as a guru and moderator for the VMware Communities Forums, providing answers to security and configuration questions. Edward is working on new books on Virtualization.

6 thoughts on “IBM DS3400 Redundant Controllers and Bad Batteries, eNet Cable Fail”

  1. Hi

    Just wanted to know what exactly the use of the battery on the controller is, and what problems we may face if we fail to replace faulty batteries.

    1. Hello,

      This was a combination of failures, but if a battery fails and you lose power, you could lose some acceleration of the array controller, and that would not be great. The battery is for battery-backed memory used to speed up reads, etc.

      Best regards,
      Edward L. Haletky

    2. The battery is a “keep alive” function and continues to keep the data in the cache alive during a power glitch. If the battery fails, the cache turns itself off and goes into direct read/write mode. This is to prevent corruption of the data in the cache, which would then corrupt the entire array — something that is extremely devastating. When the battery is replaced and the cache is re-activated, the system returns to normal, with a gain of about 30 to 50% in throughput. RAID batteries are very important.
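The behavior described in the comment above can be sketched as a toy model — this is not IBM firmware, just an illustration of a write-back cache disabling itself and falling back to write-through when its battery fails:

```python
# Toy model of a battery-backed controller write cache. With a healthy
# battery, writes are acknowledged from cache (write-back, fast). When
# the battery fails, the cache destages its dirty blocks and every
# subsequent write goes straight to disk (write-through, slow but safe).
class WriteCache:
    def __init__(self):
        self.battery_ok = True
        self.dirty = {}   # blocks acknowledged but not yet on disk
        self.disk = {}    # blocks safely on disk

    def write(self, block, data):
        if self.battery_ok:
            self.dirty[block] = data   # write-back: ack now, flush later
        else:
            self.disk[block] = data    # write-through: disk before ack

    def on_battery_failure(self):
        self.battery_ok = False
        self.disk.update(self.dirty)   # destage everything still cached
        self.dirty.clear()

cache = WriteCache()
cache.write(1, "a")           # held in cache only
cache.on_battery_failure()    # cache turns itself off
cache.write(2, "b")           # goes directly to disk
assert cache.disk == {1: "a", 2: "b"} and not cache.dirty
```

The throughput gain the commenter mentions comes from the write-back branch: the host gets its acknowledgement without waiting on the disks.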

    1. Hello Adrian,

      Not sure as I never tried that but considered it. I imagine that for one round that would be fine, but otherwise it would be a more permanent failure.

      — Edward

    2. I am a field engineer with a large IT support company. Our practice on IBM DS systems is to reset the date and time expirations forward until the battery actually fails; then we replace it. If the error code returned in your diagnostics ends in an 8, the battery is dead; if the error code ends in a 9, the battery expiration can be reset. These are NiCad batteries, and the expiration can be reset. The main reason the batteries are not changed more often is that they are not environmentally friendly and special handling is required for disposal. Additionally, the process for replacement is not easy and data loss is a real danger. Keep in mind that messing around with these arrays is like trying to hand-write a bar code label with a Sharpie on the belly of a rattlesnake: don’t do it unless you are forced to, and then be extremely careful.
