IBM DS3400 Redundant Controllers and Bad Batteries, eNet Cable Fail

Recently my IBM DS3400 SAN gave an alert that the controller batteries had to be changed out. So after ordering and receiving some batteries, it was time to perform a battery exchange. The steps are quite straightforward but still require a bit of forethought. I run IBM System Storage Manager 10 from within a VM running Windows 2008 R2, which is actually my VMware vCenter Server. For me to exchange the batteries, the IBM System Storage Manager 10 must be able to talk to the controllers either over the network or over the fibre connection. Since this is a VM, all I can do is control the SAN over the network at this time.

The first controller went without a hitch. I first had to identify the controller(s), which went fairly easily, and then I evacuated all the LUNs from that controller to the other controller. This was simply done by changing the preferred path for each LUN. So for Controller A, I set all the LUNs using Controller A to have a preferred path through Controller B. Then I was able to place Controller A offline.

Determining which physical controller was Controller A was simply a case of looking for the one that was 'dark', i.e. showing no Ethernet or FC traffic. Once this was determined, pulling the controller and replacing the battery was pretty straightforward. With the new battery in place, the controller was reinserted and then brought back online. No issues! No downtime whatsoever!

The process for Controller B follows the same steps:

1. Modify all LUNs so that they have a preferred path of Controller A
2. Bring Controller B offline
3. Extract the controller
4. Exchange the batteries
5. Reinsert Controller B
6. Bring Controller B online
7. Reset the battery age counter to 0 days
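The checklist above is the same for either controller with the roles swapped. As a rough sketch only, here it is written as a small Python helper that prints the steps for whichever controller is being serviced. The function name is made up for illustration, and it does not talk to the SAN or to Storage Manager in any way; it just produces the checklist text.

```python
def battery_swap_plan(target: str) -> list[str]:
    """Return the ordered checklist for swapping the battery in one controller.

    Hypothetical helper: it only builds text, it performs no SAN operations.
    """
    controllers = {"A", "B"}
    if target not in controllers:
        raise ValueError(f"unknown controller: {target!r}")
    # The peer controller carries all LUNs while the target is offline.
    peer = (controllers - {target}).pop()
    return [
        f"Modify all LUNs so that they have a preferred path of Controller {peer}",
        f"Bring Controller {target} offline",
        f"Extract Controller {target}",
        "Exchange the batteries",
        f"Reinsert Controller {target}",
        f"Bring Controller {target} online",
        "Reset the battery age counter to 0 days",
    ]


if __name__ == "__main__":
    for number, step in enumerate(battery_swap_plan("B"), start=1):
        print(f"{number}. {step}")
```

Calling `battery_swap_plan("A")` gives the same list with Controller B as the one holding the LUNs.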

Unfortunately I had a problem at the second step of this process. Controller B did not pick up the DHCP address assigned to it; in fact, it reported an address on a completely separate subnet. I thought there was a controller issue and went through many contortions trying to get the proper IP and subnet. I went so far as to move the VM hosting the IBM System Storage DS Manager into the subnet Controller B was reporting, and this still did not work. The cables all showed link lights but no activity. As a last-ditch effort I switched out the cable and voilà, it worked. The DHCP address was picked up and I was finally able to proceed with the process, once more replacing the battery with zero downtime.
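A quick check like the one below can make it obvious when a controller is reporting an address outside the subnet the DHCP server hands out, which in my case was the first visible symptom of the bad cable. This is a minimal Python sketch using the standard `ipaddress` module; the function name and the addresses are hypothetical and not taken from my environment.

```python
import ipaddress


def on_expected_subnet(address: str, subnet: str) -> bool:
    """Return True if the reported address lies inside the expected subnet."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet)


if __name__ == "__main__":
    expected = "10.0.0.0/24"  # hypothetical management subnet served by DHCP
    # Hypothetical addresses reported by each controller.
    reported = {"A": "10.0.0.25", "B": "192.168.128.102"}
    for name, address in reported.items():
        status = "ok" if on_expected_subnet(address, expected) else "WRONG SUBNET"
        print(f"Controller {name}: {address} {status}")
```

An address on the wrong subnet does not say why DHCP failed, only that it did, so the cable is still worth swapping before suspecting the controller.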

Even though all this worked, I discovered something interesting. I still need direct FC access to upgrade the firmware as well as to sync the onboard clocks, which bothers me somewhat because at the moment there is no way to see the FC device from within the VM.

I also discovered how to make SAN volumes with Virtual LUNs which will help me later when I redo the SAN to increase the spindles per SAN volume and therefore per LUN.

Key Takeaways:

Redundant Controllers help with simple updates
Always check your network cables when there is an issue

Join the Conversation

6 Comments

Rahul says:

November 11, 2011 at 1:23 am

Hi

Just wanted to know what exactly the battery on the controller is used for, and what problems we may face if we fail to replace faulty batteries.

1. Edward Haletky says:
  
  November 30, 2011 at 3:44 pm
  
  Hello,
  
  This was a combination of failures, but if a battery fails and you lose power, you could lose some acceleration of the array controller, and that would not be great. The battery is for battery-backed memory to speed up reads, etc.
  
  Best regards,
  Edward L. Haletky
  
2. Fred says:
  
  January 23, 2015 at 7:02 am
  
  The battery is a “keep alive” function and continues to keep the data in the cache alive during a power glitch. If the battery fails, the cache turns itself off and the controller goes into direct read/write mode. This is to prevent corruption of the data in the cache, which would then corrupt the entire array, which is extremely devastating. When the battery is replaced and the cache is re-activated, the system returns to normal and there is a gain of about 30 to 50% in throughput. RAID batteries are very important.
  
adrian says:

December 13, 2011 at 1:30 am

What happens if I don't replace the DS3400's expired cache memory battery with a new one and just reset its age?

1. Edward Haletky says:
  
  December 21, 2011 at 8:20 am
  
  Hello Adrian,
  
  Not sure as I never tried that but considered it. I imagine that for one round that would be fine, but otherwise it would be a more permanent failure.
  
  — Edward
  
2. Fred says:
  
  January 23, 2015 at 7:15 am
  
  I am a field engineer with a large IT support company. Our practice on IBM DS systems is to reset the date and time expirations forward until the battery actually fails. Then we replace it. If the error code returned in your diagnostics ends in an 8, the battery is dead; if the error code ends in a 9, the battery expiration can be reset. These are NiCad batteries, and the expiration can be reset. The main reason the batteries are not changed more often is that they are not environmentally friendly and special handling is required for disposal. Additionally, the process for replacement is not easy and data loss is a real danger. Keep in mind that messing around with these arrays is like trying to hand-write a bar code label with a Sharpie on the belly of a rattlesnake. Don’t do it unless you are forced to, and then be extremely careful.
