ESXi issues related to storage.
There are a lot of articles on the internet with information or questions regarding storage managed by ESXi. No wonder, as storage issues are among the most serious ones and, for many VMware administrators, hard to debug. Problems can be related to high storage latency as well as to general connectivity issues.
VMware has introduced several mechanisms to track storage status and, as a result, update the log files.
Please take a closer look at this article: https://kb.vmware.com/s/article/2113956
As we can see in this document (further down the page), there are several entries an admin can look for.
First of all, check vobd.log for an entry like:
Lost access to volume <uuid><volume name> due to connectivity issues. Recovery attempt is in progress and the outcome will be reported shortly
We need to understand that while a volume is in the lost-access state, the host issues no read/write I/O to it until the heartbeat I/O can be completed again.
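If you are working with a log bundle offline, a quick script can pull these events out. Below is a minimal Python sketch (my own helper, not part of any VMware tooling) that scans a copy of vobd.log for the "Lost access to volume" messages and the matching "Successfully restored access to volume" entries, so you can see when each outage started and ended. The exact message wording and the timestamp being the first token of each line are assumptions; adjust the patterns to your log.

```python
#!/usr/bin/env python3
"""Minimal sketch: list lost/restored access events from a copy of vobd.log.

Assumptions (not from the article): the log was copied off the host, the
timestamp is the first whitespace-separated token of each line, and the
messages use the usual "Lost access to volume" / "Successfully restored
access to volume" wording.
"""
import re
import sys

# Capture the volume identifier that follows the message text.
LOST = re.compile(r"Lost access to volume\s+(\S+)")
RESTORED = re.compile(r"Successfully restored access to volume\s+(\S+)")


def scan(path):
    events = []
    with open(path, errors="replace") as log:
        for line in log:
            for label, pattern in (("LOST", LOST), ("RESTORED", RESTORED)):
                match = pattern.search(line)
                if match:
                    timestamp = line.split(None, 1)[0]
                    events.append((timestamp, label, match.group(1)))
    return events


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "vobd.log"
    for ts, label, volume in scan(path):
        print(f"{ts}  {label:8s}  {volume}")
```

Pairing each LOST timestamp with the next RESTORED one for the same volume tells you how long the datastore was actually unavailable.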
For that reason, the following entries can usually be seen in vmkernel.log:
HB at offset XXXX – Reclaimed heartbeat [Timeout]:
And also in vobd.log, entries like this: [Timeout] [HB state
These reclaim attempts are very frequent and occur every second.
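To check how frequent these events really are in your environment, you can simply count the heartbeat-related lines per minute. Here is a minimal Python sketch (again my own helper; the marker strings are taken from the entries quoted above, and the ISO-like timestamp at the start of each line is an assumption):

```python
#!/usr/bin/env python3
"""Minimal sketch: count heartbeat timeout / reclaim lines per minute.

Assumption (not from the article): lines start with an ISO-like timestamp
such as 2023-01-01T12:34:56.789Z, so the first 16 characters identify the
minute the event belongs to.
"""
import sys
from collections import Counter

MARKERS = ("Reclaimed heartbeat", "[Timeout]")


def per_minute(path):
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            if any(marker in line for marker in MARKERS):
                counts[line[:16]] += 1  # e.g. "2023-01-01T12:34"
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "vmkernel.log"
    for minute, count in sorted(per_minute(path).items()):
        print(f"{minute}  {count:4d} heartbeat events")
```

A long run of minutes with many such events is a good hint that the problem is ongoing rather than a one-off glitch.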
Of course, during a storage problem the virtual machines cannot write to that storage either. A virtual machine will try to stay online as long as it can sustain the huge latency or, eventually (if the problem persists), the disk inaccessibility. How a system behaves when it loses access to its disk really depends on the guest OS: Windows usually ends up with a blue screen, while Linux keeps running but becomes very unstable, and in the end the only thing you can do is reboot it.
In this scenario (the host is disconnected from the datastore for too long, i.e. more than 5 seconds), you should see the following in the logs:
NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.XXXX" state in doubt; requested fast path state update...
This happens because of timeouts and because the HBA driver aborts the outstanding commands.
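When many LUNs are attached, it also helps to know which devices the "state in doubt" messages refer to. The following minimal Python sketch (my own helper; it assumes the device identifier appears in double quotes in the NMP message, as in the line quoted above) counts these messages per device:

```python
#!/usr/bin/env python3
"""Minimal sketch: count "state in doubt" messages per device in vmkernel.log.

Assumption (not from the article): the affected device is reported in double
quotes, e.g. "naa.600...", inside the nmp_DeviceRequestFastDeviceProbe message.
"""
import re
import sys
from collections import Counter

STATE_IN_DOUBT = re.compile(
    r'nmp_DeviceRequestFastDeviceProbe.*?"([^"]+)"\s+state in doubt'
)


def summarize(path):
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            match = STATE_IN_DOUBT.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "vmkernel.log"
    for device, count in summarize(path).most_common():
        print(f"{device}: {count} 'state in doubt' events")
```

If only one device shows up, you are probably looking at a single LUN or path problem; if every device is affected, suspect the fabric or the array itself.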
KB https://kb.vmware.com/s/article/1022026 lists possible causes:
- Array backup operations (LUN backup, replication, etc)
- General overload on the array
- Read/Write Cache on the array (misconfiguration, lack of cache, etc)
- Incorrect tiered storage used (SATA over SCSI)
- Fabric issues (Bad ISL, outdated firmware, bad fabric cable/GBIC)
One last thing worth mentioning is a great blog with a SCSI sense code decoder: https://www.virten.net/vmware/esxi-scsi-sense-code-decoder
With it you can examine log entries similar to: Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
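If you want to pre-process such lines before looking them up in the decoder, the fields can be split out with a small script. The sketch below (a helper of mine; the few labelled device-status values are standard SCSI statuses but still an assumption on my side, so verify them against the decoder) extracts the host, device and plug-in statuses plus the sense key/ASC/ASCQ triple:

```python
#!/usr/bin/env python3
"""Minimal sketch: split a vmkernel SCSI failure line into its status fields.

The field layout (H: host status, D: device status, P: plug-in status, then
sense key / ASC / ASCQ) follows the example line quoted above; the actual
decoding is left to the virten.net decoder, only a few well-known device
statuses are labelled here as an illustration.
"""
import re

SCSI = re.compile(
    r"H:(0x[0-9a-f]+)\s+D:(0x[0-9a-f]+)\s+P:(0x[0-9a-f]+)"
    r"(?:.*?sense data:\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+))?",
    re.IGNORECASE,
)

DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY", 0x28: "TASK SET FULL"}


def parse(line):
    match = SCSI.search(line)
    if not match:
        return None
    host, device, plugin, key, asc, ascq = (
        int(value, 16) if value else None for value in match.groups()
    )
    return {
        "host_status": hex(host),
        "device_status": f"{hex(device)} ({DEVICE_STATUS.get(device, 'see decoder')})",
        "plugin_status": hex(plugin),
        "sense_key/asc/ascq": None if key is None else (hex(key), hex(asc), hex(ascq)),
    }


if __name__ == "__main__":
    sample = "Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE"
    print(parse(sample))
```

For the example line this reports a device status of 0x2 (CHECK CONDITION), which tells you the sense data triple is the part worth decoding further.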
In conclusion, problems with storage systems can be very dangerous. With a hypervisor we have an additional layer which needs to know what is currently happening in the infrastructure (storage and SAN issues) and react accordingly. We do not want guest systems to stop on any such issue. On the other hand, a guest should be aware of the issue, so it can stop writing to disks which are currently not visible to the hypervisor. These kinds of issues are usually hard to debug for a system admin. Unfortunately, from