ESXi issues connected with storage.

  • Category : VMware

There are many articles on the internet with information or questions about storage managed by ESXi. No wonder: storage issues are among the most serious ones, and many VMware administrators find them hard to debug. Problems can stem from high storage latency as well as from general connectivity failures.

VMware has introduced several mechanisms that track the storage status and record the results in the log files.

Take a closer look at this article: https://kb.vmware.com/s/article/2113956

As this document shows (towards the bottom of the page), there are several log entries an administrator can look for.

First of all, check vobd.log for an entry like:

Lost access to volume <uuid><volume name> due to connectivity issues. Recovery attempt is in progress and the outcome will be reported shortly

We need to understand that while a volume is in the lost-access state, the host performs no read/write I/O on it until the heartbeat I/O can be completed again.

For that reason, entries like the following usually appear in vmkernel.log:
HB at offset XXXX – Reclaimed heartbeat [Timeout]:

And also in vobd.log, entries like this:

[Timeout] [HB state 

The reclaim is attempted very frequently, roughly once every second.
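The log checks described above can be sketched as a small script. This is only a minimal illustration: the two substrings it searches for are the ones quoted in this article, while the timestamp prefix, the volume identifier and the offset in the sample lines are made up, and the function name `summarize` is hypothetical.

```python
import re

# Substrings quoted in the article; the rest of the line layout is an assumption.
LOST_ACCESS = re.compile(
    r"Lost access to volume (?P<volume>.+?) due to connectivity issues"
)
HB_RECLAIM = re.compile(r"Reclaimed heartbeat")


def summarize(lines):
    """Count lost-access events per volume and heartbeat reclaim entries."""
    lost = {}
    reclaims = 0
    for line in lines:
        match = LOST_ACCESS.search(line)
        if match:
            volume = match.group("volume").strip()
            lost[volume] = lost.get(volume, 0) + 1
        elif HB_RECLAIM.search(line):
            reclaims += 1
    return lost, reclaims


# Hypothetical sample lines, not taken from a real host.
sample = [
    "2024-01-01T10:00:00Z: Lost access to volume 1234-abcd (datastore1) "
    "due to connectivity issues. Recovery attempt is in progress and the "
    "outcome will be reported shortly",
    "2024-01-01T10:00:01Z: HB at offset 3737600 - Reclaimed heartbeat [Timeout]:",
    "2024-01-01T10:00:02Z: HB at offset 3737600 - Reclaimed heartbeat [Timeout]:",
]

lost, reclaims = summarize(sample)
print(lost, reclaims)
```

To use it on real data, pass the lines of a copied vobd.log or vmkernel.log file instead of `sample`. A steadily growing reclaim count for the same volume suggests that the heartbeat I/O still cannot complete.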

Of course, during a storage problem the virtual machines cannot write to this storage either. A guest will try to stay online for as long as it can tolerate the huge latency or, if the problem persists, the complete inaccessibility of its disk. How a guest behaves when it loses access to its disk depends on the operating system: Windows usually ends up with a blue screen, while Linux keeps running but becomes very unstable, and in the end the only thing you can do is reboot it.

In this scenario (the host has been disconnected from the datastore for too long, more than 5 seconds) you should see the following in the logs:

NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.XXXX" state in doubt; requested fast path state update...

This happens because commands time out and the HBA driver aborts them.
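To see which devices are affected most often, the "state in doubt" messages can be counted per device. The sketch below assumes the message format quoted above; the device identifiers in the sample are placeholders and the helper name `devices_in_doubt` is hypothetical.

```python
import re
from collections import Counter

# Matches the NMP message quoted in the article and captures the device name.
IN_DOUBT = re.compile(r'NMP device "(?P<device>[^"]+)" state in doubt')


def devices_in_doubt(lines):
    """Return a Counter mapping device name to the number of 'state in doubt' entries."""
    counts = Counter()
    for line in lines:
        match = IN_DOUBT.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# Placeholder lines; the naa identifiers are invented for illustration.
sample = [
    'NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.aaaa" state in doubt; '
    "requested fast path state update...",
    'NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.bbbb" state in doubt; '
    "requested fast path state update...",
    'NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.aaaa" state in doubt; '
    "requested fast path state update...",
]

counts = devices_in_doubt(sample)
for device, hits in counts.most_common():
    print(device, hits)
```

If one device dominates the list, the LUN or the paths behind it are a reasonable place to start looking.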

The KB article https://kb.vmware.com/s/article/1022026 lists possible causes:

  • Array backup operations (LUN backup, replication, etc)
  • General overload on the array
  • Read/Write Cache on the array (misconfiguration, lack of cache, etc)
  • Incorrect tiered storage used (SATA over SCSI)
  • Fabric issues (Bad ISL, outdated firmware, bad fabric cable/GBIC)

One last thing worth mentioning is a great blog with a SCSI sense code decoder: https://www.virten.net/vmware/esxi-scsi-sense-code-decoder

With it you can examine a log entry similar to: Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
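As a rough offline counterpart to the online decoder, the fields of such an entry can also be split in a few lines of Python. This is a partial sketch, not a replacement for the decoder: it translates only the device status and the sense key, using the standard SCSI values, and knows a single ASC/ASCQ pair. The host (H:) and plugin (P:) codes are returned as raw numbers, and the function name `decode` is hypothetical.

```python
import re

# Standard SCSI device status codes (partial list).
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Standard SCSI sense keys (partial list).
SENSE_KEY = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
    0xE: "MISCOMPARE",
}

# Only the pair from the article's example; a real table is much longer.
ASC_ASCQ = {(0x1D, 0x00): "MISCOMPARE DURING VERIFY OPERATION"}

HEX = r"0x[0-9a-fA-F]+"
ENTRY = re.compile(
    rf"H:(?P<host>{HEX})\s+D:(?P<device>{HEX})\s+P:(?P<plugin>{HEX})"
    rf"(?:\s+Valid sense data:\s+(?P<key>{HEX})\s+(?P<asc>{HEX})\s+(?P<ascq>{HEX}))?"
)


def decode(line):
    """Decode the H/D/P and sense data fields of one log line, or return None."""
    match = ENTRY.search(line)
    if match is None:
        return None
    device = int(match.group("device"), 16)
    result = {
        "host": int(match.group("host"), 16),      # left undecoded on purpose
        "plugin": int(match.group("plugin"), 16),  # left undecoded on purpose
        "device_status": DEVICE_STATUS.get(device, hex(device)),
    }
    if match.group("key") is not None:
        key = int(match.group("key"), 16)
        asc = int(match.group("asc"), 16)
        ascq = int(match.group("ascq"), 16)
        result["sense_key"] = SENSE_KEY.get(key, hex(key))
        result["additional_sense"] = ASC_ASCQ.get((asc, ascq), (hex(asc), hex(ascq)))
    return result


decoded = decode("Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE")
print(decoded)
```

For the entry above this yields a CHECK CONDITION device status with the MISCOMPARE sense key. Codes missing from the small tables are returned in hex so they can be looked up in the online decoder.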

In conclusion, problems on storage systems can be very dangerous. With a hypervisor we have an additional layer that needs to know what is currently happening in the infrastructure (storage, SAN issues) and react accordingly. We do not want guest systems to stop whenever such an issue occurs. On the other hand, the guest should be aware of the issue so that it can stop writing to disks that are currently not visible to the hypervisor. These kinds of issues are usually hard for a system administrator to debug. Unfortunately, from

https://kb.vmware.com/s/article/1022026

https://kb.vmware.com/s/article/2113956
