Unable to log in to your ESX server

Ivo Beerens posted this article last week on the defunct cimservera processes that render an ESX Host unmanageable.
See also this VMware KB article.

Symptoms include:

  • Unable to log in through SSH to the ESX host.
  • Unable to log in on the local service console.
  • HA errors.

I ran into this problem a while ago too.
The reason you can no longer log in to your ESX server is that the COS (the service console) has run out of PIDs (the limit is around 32k processes, I believe). This is what produces the fork: error messages on the server console.
I solved this by moving some VMs off the affected ESX host (VMotion and/or shutdown) and then trying again and again to log in on the console (through HP iLO or another management processor) until I succeeded.
The trick is to restart the pegasus service as suggested by the VMware KB article, but you probably won’t succeed at first because resources are too low.
You need to free up some resources in the COS first.

You can either move some more VMs to free up resources or use this procedure to kill some of the defunct cimservera processes:

# ps -ef | grep cimservera

Find out the PIDs and kill them using:

# kill -9 <pid>

Just keep killing processes until you can restart the pegasus service, which will clear the rest.
I know it’s a hell of a job, but at least you won’t have to restart the complete ESX host and shut down VMs.
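
If typing the PIDs by hand gets tedious, the lookup step above can be wrapped in a small helper. This is only a sketch: the function name cimservera_pids is made up for illustration, and it assumes the standard ps -ef column layout where the PID is the second field. It uses a single awk process on purpose, since every extra stage in a pipeline costs another fork on a host that is already short of PIDs.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (second column) of every line that mentions cimservera.
# The [c] bracket keeps awk from matching its own command line.
cimservera_pids() {
    awk '/[c]imservera/ { print $2 }'
}

# Possible usage on the host (not run here), 50 PIDs at a time:
#   ps -ef | cimservera_pids | head -50 | xargs kill -9
# then retry the pegasus restart from the KB article, assumed to be:
#   service pegasus restart
```

Working in batches keeps each kill command short and lets you retry the pegasus restart in between.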

If you still have problems connecting to the host from vCenter, restart the management services:

# service mgmt-vmware restart

Now that the immediate problem is solved, how do we keep it from happening again?
If you keep an eye on your ESX host (with ps -ef, or with top, watching for zombie processes) you’ll see that the defunct processes return.
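
To make that check quicker, something like the following could be used to count the zombies in one go. Again this is just a sketch: count_defunct is a made-up name, and it assumes ps -ef marks zombie processes with <defunct> in the command column, as it does in the listings above.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print how many
# lines are marked <defunct>. The [d] bracket keeps awk from counting
# its own command line; n + 0 prints 0 when nothing matches.
count_defunct() {
    awk '/<[d]efunct>/ { n++ } END { print n + 0 }'
}

# Possible usage on the host (not run here):
#   ps -ef | count_defunct
```

If the number keeps climbing between checks, something is still spawning the processes.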
In my case this turned out to be caused by HP SIM: its WBEM configuration screen listed users that were not known to ESX.
This was revealed by failed logins showing up in /var/log/messages.
Since I wasn’t using WBEM anyway, I decided to close the CIM firewall port.

# /usr/bin/esxcfg-firewall -d CIMHttpsServer

If you close this port, HP SIM won’t detect WBEM.
After closing the port I restarted the pegasus service once more to clear the new defunct processes and ran an Identify from HP SIM against the ESX host.

The problem never returned.
