How to force a login on a QLogic HBA

While running some storage tests in an ESX 3.5 environment recently, I ran into a problem where a dead storage path disappeared after a rescan. That was expected, but the storage path didn’t return after we fixed it in the SAN and ran another rescan. After a reboot of the ESX server, the paths were available again. Digging deeper, it turned out that whenever we changed the zoning on the SAN so that a new storage target became available to the ESX server, that target showed up only after a reboot of the ESX server.

After some research I stumbled upon this VMware KB article: “Configuring fibre switch so that ESX Server doesn’t require a reboot after a zone set change”. The article described exactly the problem I had encountered and explained that it was the result of RSCN (Registered State Change Notification) events being suppressed by the SAN switches. Investigation confirmed that RSCN events were indeed being suppressed by the SAN switches, as seen in the screenshot below.

To resolve this issue, the KB article gives the following solution:

To enable RSCN events, configure the Switch Operating Parameters so that the Suppress Zoning RSCN on Zone Set Activations is disabled.

 

Although the solution was simple (just uncheck the option), I didn’t want to change this setting right away, as I was unaware of the impact on the SAN fabric. It was also not clear to me why the option had been enabled in the SAN design in the first place.

When analyzing the differences between a regular rescan and a reboot, we came across the HBA login process. What if we could force an HBA login? The first thing we came up with was disconnecting the fiber from the ESX server, but that was impractical. Next we tried disabling and re-enabling the SAN switch port to which an ESX server HBA was connected. While monitoring the /var/log/vmkernel log, we noticed the following message when we enabled the SAN switch port: “Waiting for LIP to complete”. That’s an odd message, because it is normally seen only with the arbitrated loop topology (LIP stands for Loop Initialization Primitive), while we were using the point-to-point topology, but at least it indicated that the HBA was trying to log in. A rescan after this revealed the new storage target.
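To catch the login attempt while the switch port is being toggled, it helps to filter the vmkernel log for LIP messages. The following is a minimal sketch, assuming the log path mentioned above; the function name show_lip_messages is only an illustrative helper, not part of ESX.

```shell
# show_lip_messages: print the log lines that mention a LIP,
# for example "Waiting for LIP to complete".
# The helper name is hypothetical; the default path is the vmkernel log
# on the ESX 3.5 service console used in this article.
show_lip_messages() {
    logfile="${1:-/var/log/vmkernel}"
    # Case-sensitive match on "LIP" so that words like "clip" are not matched.
    grep "LIP" "$logfile"
}
```

Calling show_lip_messages without arguments searches the default log. To watch live while the port is re-enabled, the same filter can be applied to a stream: tail -f /var/log/vmkernel | grep "LIP".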

The reason we saw the “Waiting for LIP to complete” message was probably the default connection type setting (2 – loop preferred, then point-to-point) in the QLogic HBA BIOS. With this default, the QLogic HBA tries the arbitrated loop protocol first, hence the aforementioned message in the /var/log/vmkernel log.

I was still not satisfied with this solution, because I would need a SAN storage administrator and would have to investigate the SAN switch to find the port the ESX server’s HBA was connected to. This method was also very error-prone, because disabling the wrong port would disrupt another system. So I looked for another option: a way to force an HBA to log in to the SAN fabric from within the ESX server console. When discussing my problem on Twitter, I received some tips from Duncan Epping of Yellow-Bricks. He pointed me to a paragraph of the QLogic Fibre Channel Driver ReadMe.

9.7 How to Force a LIP
----------------------

The following NVRAM parameters must be set in order to perform the
LIP reset:

    o Enable Lip Reset
    o Enable Target Reset -- if the attached targets should be
                       also be reset

If both the above parameters are disabled, a Full Login LIP is
executed.

Execute the following command to initiate the LIP reset process:

    # echo "scsi-qlalip " > /proc/scsi/qla2300/<host_no>

Although we weren’t using the arbitrated loop topology, this was definitely something to try. When executing the aforementioned echo command, we saw the same behavior in the /var/log/vmkernel log as when we disabled and re-enabled the SAN switch port. After this, the only thing left to do was to rescan the HBA. This solution let me discover new SAN storage targets without an ESX server reboot.
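The host_no placeholder in the ReadMe command names a file under the driver’s /proc directory, so it can help to list the entries that exist before writing to one. A small sketch, assuming the qla2300 directory from the ReadMe (a commenter below reports qla2xxx on ESXi 4.1); list_qla_hosts is a hypothetical helper name.

```shell
# list_qla_hosts: print the adapter entries found under a QLogic /proc directory.
# The helper name is hypothetical; the default directory is the one from the ReadMe.
list_qla_hosts() {
    procdir="${1:-/proc/scsi/qla2300}"
    for entry in "$procdir"/*; do
        # Only regular files are adapter entries; skip anything else.
        if [ -f "$entry" ]; then
            basename "$entry"
        fi
    done
    return 0
}
```

Each name printed is a candidate for host_no. Reading an entry with cat before writing to it is a cheap way to confirm that it belongs to the intended HBA.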

The QLogic adapter BIOS settings in this environment were not set according to the Driver ReadMe above. We tested this without changing the adapter BIOS settings; the table below shows the settings at the time of testing, which are the QLogic defaults.

Setting                  Value
Enable LIP Reset         No
Enable LIP Full Login    Yes
Enable Target Reset      Yes

Summary

To resolve the problem, we need to disable RSCN suppression on the SAN switches, as described by the VMware KB article.
Until then we have a workaround using the following ESX server console commands:

echo "scsi-qlalip" > /proc/scsi/qla2300/<host_no>
esxcfg-rescan vmhba<host_no>

where <host_no> must be replaced with the vmhba number.
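The two commands can be wrapped with a few sanity checks so that a mistyped number does not write to the wrong place. This is only a sketch under stated assumptions: force_lip_and_rescan is a hypothetical helper, and QLA_PROC_DIR and RESCAN_CMD are hypothetical overrides added here for dry runs; the defaults are the path and command shown above. The /proc entry and the vmhba name are passed separately, because a commenter below found that they differed on ESXi 4.1 (file 3 for vmhba1).

```shell
# force_lip_and_rescan: trigger a LIP on one QLogic HBA, then rescan it.
# QLA_PROC_DIR and RESCAN_CMD are hypothetical overrides for this sketch;
# the defaults are the path and the command from the article.
force_lip_and_rescan() {
    host_no="$1"    # entry under the driver's /proc directory
    vmhba="$2"      # adapter name for the rescan, e.g. vmhba1

    if [ -z "$host_no" ] || [ -z "$vmhba" ]; then
        echo "usage: force_lip_and_rescan <host_no> <vmhba>" >&2
        return 2
    fi

    procfile="${QLA_PROC_DIR:-/proc/scsi/qla2300}/$host_no"
    # Refuse to write to anything that is not an existing, writable file.
    if [ ! -f "$procfile" ] || [ ! -w "$procfile" ]; then
        echo "no writable adapter entry at $procfile" >&2
        return 1
    fi

    # Ask the driver for a LIP, then rescan the adapter for new targets.
    echo "scsi-qlalip" > "$procfile" || return 1
    "${RESCAN_CMD:-esxcfg-rescan}" "$vmhba"
}
```

For example, force_lip_and_rescan 1 vmhba1 writes to /proc/scsi/qla2300/1 and then runs esxcfg-rescan vmhba1. Setting RESCAN_CMD=echo and pointing QLA_PROC_DIR at a scratch directory gives a harmless dry run.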

I don’t know whether this method is supported or whether it causes disruptions to your VMs, so test it thoroughly before using these commands in your environment. I also don’t know whether these commands still work if you force the connection topology to point-to-point only in the QLogic HBA BIOS.

7 Comments on “How to force a login on a QLogic HBA”

  1. #1 ESX; force HBA to relogin to SAN Fabric | CameronKennedy.info
    on May 3rd, 2011 at 3:17 pm

    [...] detailed information here: http://www.van-lieshout.com/2010/01/how-to-force-a-login-on-a-qlogic-hba/ VMware KB article: [...]

  2. #2 Christoffer Zettermark
    on Dec 2nd, 2011 at 12:58 pm

    Any tips on performing your workaround (echo “scsi-qlalip” …) on ESXi?

  3. #3 Arnim van Lieshout
    on Dec 8th, 2011 at 9:44 am

    Just make sure that RSCN events are enabled on your switch fabric. This way you won’t need the workaround.
    See also VMware KB Article 1002301

  4. #4 Michael
    on Jan 10th, 2012 at 4:50 pm

    This command does not seem to work with ESXi 4.1U1. The path is slightly different (/proc/scsi/qla2xxx/) and what I get back is:

    # echo "scsi-qlalip" > /proc/scsi/qla2xxx/ vmhba1
    -ash: cannot create /proc/scsi/qla2xxx/: Is a directory

    Any advice would be greatly appreciated.

  5. #5 Michael
    on Jan 10th, 2012 at 6:02 pm

    Had a word with VMware Support. This does work with ESXi; I just had the wrong value at the end of the command. It should be:

    echo "scsi-qlalip" > /proc/scsi/qla2xxx/3

    ‘3’ being the file representing the inoperable HBA. When we ran cat on the file, it reported that all target ports were offline, so we knew we were running the command against the correct HBA.

    FC Target-Port List:
    scsi-qla1-target-0=500507680130257f:0e0303:1000:;
    scsi-qla1-target-1=5005076801302466:0e0305:1000:;
    scsi-qla1-target-2=5005076801304a4a:0e030b:1000:;
    scsi-qla1-target-3=500507680130480a:0e030d:1000:;

    Also, ESXi 4.1U1 automatically initiated a full rescan – all of my datastores are back online for vmhba1 and I didn’t have to reboot!

    THANKS!!

  6. #6 Michael
    on Jan 10th, 2012 at 6:02 pm

    FC Target-Port List:
    scsi-qla1-target-0=500507680130257f:0e0303:1000:;
    scsi-qla1-target-1=5005076801302466:0e0305:1000:;
    scsi-qla1-target-2=5005076801304a4a:0e030b:1000:;
    scsi-qla1-target-3=500507680130480a:0e030d:1000:;

  7. #7 Arnim van Lieshout
    on Jan 30th, 2012 at 11:06 pm

    Thanks for sharing, Michael. I actually never tried this on ESXi, as we only had the classic ESX available at that time.
