While running some storage tests in an ESX 3.5 environment recently, I ran into a problem where a dead storage path disappeared after a rescan. That was expected, but the path didn’t return after we fixed it in the SAN and ran a rescan again. Only after a reboot of the ESX server were the paths available again. Digging deeper, it turned out that whenever we changed the SAN zoning so that a new storage target became available to the ESX server, that target only showed up after a reboot of the ESX server.
After some research I stumbled upon this VMware KB article: Configuring fibre switch so that ESX Server doesn’t require a reboot after a zone set change. It describes exactly the problem I encountered and explains that it is caused by RSCN (Registered State Change Notification) events being suppressed by the SAN switches. Our investigation confirmed that the SAN switches were indeed suppressing RSCN events, as seen in the screenshot below.
The KB article gives the following solution:
To enable RSCN events, configure the Switch Operating Parameters so that the Suppress Zoning RSCN on Zone Set Activations is disabled.
Although the solution was simple (just uncheck the option), I didn’t want to change it right away, because I didn’t know what impact it would have on the SAN fabric. It was also not clear to me why the option had been enabled in the SAN design in the first place.
When analyzing the differences between a regular rescan and a reboot, we came across the HBA login process. What if we could force an HBA login? Our first idea was to disconnect the fiber from the ESX server, but that was impractical. Next, we tried disabling and re-enabling the SAN switch port to which an ESX server HBA was connected. While monitoring /var/log/vmkernel, we noticed the following message when we enabled the SAN switch port: “Waiting for LIP to complete”. That’s an odd message, because a LIP (Loop Initialization Primitive) is normally seen only with the arbitrated loop topology and we were using the point-to-point topology, but at least it indicated that the HBA was trying to log in. A rescan performed after this revealed the new storage target.
We probably saw the “Waiting for LIP to complete” message because of the default connection type setting in the QLogic HBA BIOS: 2 (loop preferred, then point-to-point). With this setting the QLogic HBA tries the arbitrated loop protocol first, hence the message in /var/log/vmkernel.
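Checking the log for this message can also be scripted. The following is a minimal sketch, assuming the ESX 3.5 log location /var/log/vmkernel and the exact message text quoted above; the helper names `log_mark` and `lip_seen_since` and the `VMKLOG` variable are hypothetical, not part of ESX. It bookmarks the log size before the switch port is re-enabled and afterwards searches only what was written since then, so older LIP messages are ignored.

```shell
#!/bin/sh
# Minimal sketch: detect a LIP in the vmkernel log after a known point in time.
# Assumptions: the log lives at /var/log/vmkernel (override with VMKLOG) and
# the driver logs the literal text "Waiting for LIP to complete".
# log_mark and lip_seen_since are hypothetical helper names.

VMKLOG="${VMKLOG:-/var/log/vmkernel}"

# Print the current size of the log in bytes, to use as a bookmark.
log_mark() {
    wc -c < "$VMKLOG" | tr -d ' '
}

# Succeed if the LIP message was written after byte offset $1.
lip_seen_since() {
    tail -c +"$(( $1 + 1 ))" "$VMKLOG" | grep -q "Waiting for LIP to complete"
}

# Example:
#   mark=$(log_mark)
#   ... re-enable the SAN switch port ...
#   sleep 10
#   lip_seen_since "$mark" && echo "HBA attempted a fabric login"
```

If the log is rotated between the two calls, the byte offset no longer applies; this sketch does not handle that case.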
I was still not satisfied with this solution, because it required a SAN storage administrator and meant searching the SAN switch for the port each ESX server HBA was connected to. The method was also very error-prone, because disabling the wrong port would disrupt another system. So I looked for another option: a way to force an HBA to log in to the SAN fabric from within the ESX service console. When I discussed the problem on Twitter, I received some tips from Duncan Epping of Yellow-Bricks. He pointed me to a paragraph of the QLogic Fibre Channel Driver ReadMe:
```
9.7 How to Force a LIP
----------------------
The following NVRAM parameters must be set in order to perform the
LIP reset:
  o Enable Lip Reset
  o Enable Target Reset -- if the attached targets should be
    also be reset
If both the above parameters are disabled, a Full Login LIP is
executed.
Execute the following command to initiate the LIP reset process:
  # echo "scsi-qlalip " > /proc/scsi/qla2300/<host_no>
```
Although we weren’t using the arbitrated loop topology, this was definitely worth a try. When we executed the echo command, /var/log/vmkernel showed the same behavior as when we disabled and re-enabled the SAN switch port. After that, the only thing left to do was to rescan the HBA. This enabled me to reveal new SAN storage targets without rebooting the ESX server.
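To see exactly which paths a LIP and rescan added, one option is to compare the path list before and after. This is a sketch assuming `esxcfg-mpath -l` (the ESX 3.x path listing command) as the source; `snapshot_paths` and `new_paths` are hypothetical helper names. Only whole lines are compared, so the sketch does not depend on the exact output format, but a path whose state changed will also show up as a new line.

```shell
#!/bin/sh
# Sketch: report path lines that appeared after a LIP and rescan.
# Assumption: `esxcfg-mpath -l` lists the paths on this ESX 3.x host.
# snapshot_paths and new_paths are hypothetical helper names.

# Save the current path listing to file $1.
snapshot_paths() {
    esxcfg-mpath -l > "$1"
}

# Print lines that are in file $2 (after) but not in file $1 (before).
# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
new_paths() {
    grep -vxFf "$1" "$2"
}

# Example:
#   snapshot_paths /tmp/paths.before
#   echo "scsi-qlalip" > /proc/scsi/qla2300/1
#   esxcfg-rescan vmhba1
#   snapshot_paths /tmp/paths.after
#   new_paths /tmp/paths.before /tmp/paths.after
```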
The QLogic adapter BIOS settings in this environment did not match the NVRAM settings required by the Driver ReadMe above. We tested without changing the adapter BIOS settings; the table below shows the settings at the time of testing, which are the QLogic defaults.
| QLogic BIOS setting | Value |
|---|---|
| Enable LIP Reset | No |
| Enable LIP Full Login | Yes |
| Enable Target Reset | Yes |
Summary
To resolve the problem properly, RSCN suppression has to be disabled on the SAN switches, as described in the VMware KB article.
Until then, we have a workaround using the following ESX service console commands:
```shell
echo "scsi-qlalip" > /proc/scsi/qla2300/<host_no>
esxcfg-rescan vmhba<host_no>
```
where <host_no> must be replaced with the vmhba number.
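When a host has several QLogic HBAs, the two workaround commands can be wrapped in a small loop. This is a sketch only, assuming, as stated above, that each entry name under /proc/scsi/qla2300 equals the vmhba number (verify this on your own host first). The `QLA_PROC`, `DRY_RUN` and `LIP_WAIT` variables and the 10-second wait between the LIP and the rescan are hypothetical choices of this example, not values from the QLogic ReadMe; with `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: force a LIP on every QLogic HBA and rescan the matching vmhba.
# Assumption (from the text): the /proc entry name equals the vmhba number.
# QLA_PROC, DRY_RUN and LIP_WAIT are hypothetical knobs for this example.

QLA_PROC="${QLA_PROC:-/proc/scsi/qla2300}"
LIP_WAIT="${LIP_WAIT:-10}"   # seconds to let the LIP settle (a guess, tune it)

# Ask the qla2300 driver to perform a LIP on host $1.
force_lip() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "echo \"scsi-qlalip\" > $QLA_PROC/$1"
    else
        echo "scsi-qlalip" > "$QLA_PROC/$1"
    fi
}

# Rescan vmhba$1 so that newly visible targets are discovered.
rescan_hba() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "esxcfg-rescan vmhba$1"
    else
        esxcfg-rescan "vmhba$1"
    fi
}

# Walk all numeric entries under $QLA_PROC: LIP, wait, then rescan.
lip_and_rescan_all() {
    for entry in "$QLA_PROC"/*; do
        [ -f "$entry" ] || continue
        host_no=$(basename "$entry")
        case "$host_no" in
            *[!0-9]*) continue ;;   # skip non-numeric entries
        esac
        force_lip "$host_no"
        [ "${DRY_RUN:-0}" = 1 ] || sleep "$LIP_WAIT"
        rescan_hba "$host_no"
    done
}

# Example: preview first, then run for real.
#   DRY_RUN=1 lip_and_rescan_all
#   lip_and_rescan_all
```

Running one HBA at a time keeps the other paths available while each adapter logs in again.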
I don’t know whether this method is supported or whether it causes disruptions to your VMs, so test it thoroughly before using these commands in your environment. I also don’t know whether these commands still work if you force the connection topology to point-to-point only in the QLogic HBA BIOS.