Unable to connect to the MKS: A general system error occurred: Internal error

All 16 hosts in the cluster had been up and running for a long time without any issue, with uptimes of 300+ days. Then, on all hosts, we could no longer access the VM console. Opening a VM console from viClient gave the error "Unable to connect to the MKS: A general system error occurred: Internal error".
We also could not vMotion VMs to other ESXi hosts in the cluster.

[Screenshot: the "Unable to connect to the MKS" error dialog]


We logged in to the ESXi hosts and noticed that the root ramdisk was full:

# vdf -h | tail -6 

[Screenshot: output of vdf -h]
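
A check like the following could flag a filling ramdisk long before the consoles break. This is only a minimal sketch, not something we ran at the time: `over_threshold` is a hypothetical helper, and it assumes the listing (from `vdf -h` or a similar tool) has a usage field of the form `NN%`.

```shell
# Hypothetical helper: print every input line that contains a "NN%" field
# whose value is at or above the given threshold (default 90).
over_threshold() {
    awk -v limit="${1:-90}" '
        {
            for (i = 1; i <= NF; i++) {
                # Match fields such as "100%"; $i + 0 keeps the numeric part.
                if ($i ~ /^[0-9]+%$/ && ($i + 0) >= (limit + 0)) {
                    print
                    next
                }
            }
        }'
}

# Usage (on an ESXi host): vdf -h | over_threshold 90
```

Because the helper only reads standard input, it does not depend on the exact column layout of the listing, only on the presence of a percentage field.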

The uptime of the ESXi hosts was impressive:

# uptime 

[Screenshot: output of uptime]

When we tried to get information about the virtual machines using the vim-cmd command, we got an error:

# vim-cmd vmsvc/getallvms 

[Screenshot: error returned by vim-cmd]



To figure out what was consuming space on the root ramdisk, we ran:

# find / -size +10k -exec du -h {} \; | egrep -v volumes | egrep -v disks  | less

I spotted a lot of EMCProvider logs in /opt/emc/cim/log:

# ls -l | head -5

[Screenshot: listing of /opt/emc/cim/log]


And bingo! These logs were eating the space:

# du -h /opt/emc/cim/log/

[Screenshot: output of du -h /opt/emc/cim/log/]
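
Paging through find output works, but ranking directories by size gets to the culprit faster. Here is a minimal sketch of that idea; `largest_dirs` is a hypothetical helper name, and it assumes the host's `du`, `sort` and `head` accept the options shown.

```shell
# Hypothetical helper: list the N largest directories (sizes in KB) under a path.
# Usage: largest_dirs PATH [COUNT]   (COUNT defaults to 10)
largest_dirs() {
    du -k "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage (on an ESXi host): largest_dirs /opt 5
```

Pointing it at a subtree such as /opt rather than / avoids walking the datastores, which is what the two egrep filters in the find command above were for.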

It seems that the EMCProvider logs had never been rotated and filled up the root ramdisk. I couldn't find any parameter in the conf file to set up rotation of the EMCProvider logs, so it looks like more of a feature than a bug ;)

We deleted logs older than 200 days on the ESXi hosts in the cluster (eventually we deleted all EMCProvider logs older than 1 day):

# cd /opt/emc/cim/log/
# find . -name '*.log' -mtime +200 -exec rm -f {} \;
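
Since the provider does not seem to rotate its own logs, the cleanup has to be repeated from time to time. Below is a minimal sketch of a reusable version with a dry-run mode. `purge_old_logs` is a hypothetical helper, not part of EMCProvider or ESXi, and it assumes the host's `find` supports `-type`, `-mtime` and `-exec`.

```shell
# Hypothetical helper: delete *.log files older than DAYS days in DIR.
# Pass "dry" as the third argument to only print what would be deleted.
# Usage: purge_old_logs DIR DAYS [dry]
purge_old_logs() {
    dir=$1
    days=$2
    mode=${3:-delete}
    if [ "$mode" = "dry" ]; then
        find "$dir" -type f -name '*.log' -mtime +"$days" -print
    else
        find "$dir" -type f -name '*.log' -mtime +"$days" -exec rm -f {} \;
    fi
}

# Usage: purge_old_logs /opt/emc/cim/log 200 dry   # review the list first
#        purge_old_logs /opt/emc/cim/log 200       # then delete
```

Running the dry mode first shows exactly which files match before anything is removed.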

We got some free space on root and were able to access some VM consoles, but some VMs started to show another error, 'Unable to connect to the MKS: Failed to connect to server fqdn.com:902':

[Screenshot: the "Failed to connect to server fqdn.com:902" error]

We found that the VMs showing this error were all located on 3 ESXi hosts.

We noticed that on the affected ESXi hosts nothing was listening on port 902, even though we already had enough free space on the root ramdisk:

# esxcli network ip connection list | grep :902

[Screenshot: no listener on port 902 on an affected host]


The VMs that no longer had console access problems were located on ESXi hosts where 'busybox' was listening on port 902:

[Screenshot: 'busybox' listening on port 902 on a healthy host]
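
With 16 hosts to check, the same test is easier to script than to read by eye. This is a minimal sketch under assumptions: `has_listener` is a hypothetical helper, and it assumes each line of the connection listing shows an address:port followed by whitespace, with the word LISTEN for listening sockets.

```shell
# Hypothetical helper: read a connection listing on stdin and succeed (exit 0)
# if some line shows the given port together with the LISTEN state.
# The trailing [[:space:]] keeps port 902 from matching 9020 and similar.
has_listener() {
    grep -E ":$1[[:space:]]" | grep -q 'LISTEN'
}

# Usage (on an ESXi host):
#   esxcli network ip connection list | has_listener 902 \
#       && echo "port 902 is listening" \
#       || echo "nothing listens on port 902"
```

An established connection to port 902 on another host does not count, because only lines in the LISTEN state pass the second filter.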


We decided to put the affected ESXi hosts into Maintenance Mode (MM) and reboot them. After the reboot, 'busybox' started listening on port 902 and the VM console issue was gone.

The main takeaway is that a full root ramdisk is an abnormal condition. We have to remember that in the *nix world everything is a file, which could explain why some hosts could not create the TCP socket for port 902 while root was full, and why they still had no listener after we freed some space on the root ramdisk.

Here are all the steps in one screenshot:

[Screenshot: all steps in one view]

 The End.

This article is reposted from the 学海无涯 blog on 51CTO. Original link: http://blog.51cto.com/549687/1842397. To repost it, please contact the original author.

