I have encountered the same kind of issue numerous times while working across different environments and with various customers: problems with NFS mounts connected from remote locations are very common. The issue is not limited to communication over WAN; it also includes connections between datacenters (DC) where we lack control over the network and server stack.
At first glance everything may seem fine, and in many scenarios communication over WAN (or remote communication in general) will work, at least at the beginning, especially with NFS version 4.1/4.2. These versions are more resilient to network outages and general network problems. (Using the newest NFS version is recommended in all architectures where possible anyway, but that is another topic.) However, the message is: avoid using NFS from a remote location whenever possible. Plan the architecture so that NFS shares are as close to the resource as possible. If you need to use remote data, use other means of transferring it: messaging systems (Kafka, RabbitMQ), filesystem replication, cloud object storage; even simple SFTP will do a much better job. Using NFS from a remote location should truly be considered the last resort, when no other options are feasible.
Remember that the problem with NFS in a vSphere environment can be even more serious, as it can affect multiple systems simultaneously, making them completely inaccessible.
The two main reasons why you should never consider mounting NFS remotely are:
- You might face issues with applications: many applications, including SAP, require the remote NFS share to be hard mounted. This means that if the NFS server becomes unavailable, the operating system will keep retrying NFS requests until the server responds. The application cannot write to NFS, the disk is not visible, and in most scenarios the application freezes (or partially freezes). Typically, developers assume that NFS servers will be available all the time.
On the other hand, as you know, the sun is always rising. So, NFS was most probably available all the time when the solution was developed and grew 🙂
- You might find yourself in a situation where restarting Linux is the only way to restore NFS functionality after running into certain smart network devices that actively block connections (albeit not immediately). In such a situation the NFS client tries to reach the NFS server (TCP SYN) but never receives a response. The smart network device detects the connection reuse and keeps blocking it (since it was blocked initially). The NFS client keeps trying to reuse the connection, as this is how it works. This behavior was fixed in kernels from 2022 onwards, but practice shows that even then it does not always work, and restarting the NFS client is the only way to resolve the issue. The issue is described in detail here: https://www.suse.com/support/kb/doc/?id=000019722
Kernel (for SLES) with fixes:
That could reflect poorly on you if you have to restart the Linux server because of NFS, especially if you had previously laughed at a colleague who had to restart a Windows server for some trivial reason 🙂
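To check whether a client is in the "TCP SYN without a response" state described above, one option is to look for sockets towards the NFS port that are stuck in SYN_SENT. The following is only a minimal sketch, assuming an IPv4 connection, the standard NFS port 2049, a little-endian (x86) host and the usual /proc/net/tcp layout; the function name `stuck_nfs_connections` is made up for this example.

```python
import socket

NFS_PORT = 2049        # assumption: NFS server listens on the standard port
TCP_SYN_SENT = "02"    # hex state code for SYN_SENT in /proc/net/tcp


def stuck_nfs_connections(proc_net_tcp_text):
    """Return (local_port, server_ip) pairs for NFS sockets in SYN_SENT.

    Hypothetical helper: expects the text content of /proc/net/tcp.
    """
    stuck = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        remote_ip_hex, remote_port_hex = remote.split(":")
        if state != TCP_SYN_SENT or int(remote_port_hex, 16) != NFS_PORT:
            continue
        # Assumption: little-endian host, so the IPv4 bytes are reversed
        server_ip = socket.inet_ntoa(bytes.fromhex(remote_ip_hex)[::-1])
        local_port = int(local.split(":")[1], 16)
        stuck.append((local_port, server_ip))
    return stuck
```

On a Linux client you would pass in the content of /proc/net/tcp. A socket that stays in this list across several checks, always with the same local port, would be consistent with the connection reuse problem, but it is only a hint and not a diagnosis.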
Other factors you may consider with remote NFS:
- (unexpected) latency and bandwidth limitations
- network connectivity issues can occur at any time; lazy unmounting and restoring the NFS connection can be a painful process
- limited or no possibility to control the remote NFS server
- limited or no possibility to control the remote network
- issues with NFS connectivity, together with improper NFS configuration (in fstab), can cause problems when starting the server (long timeouts or inability to start, including the emergency console prompt)
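The fstab point from the list above can be checked mechanically. Here is a small sketch that flags NFS entries lacking the `_netdev` and `nofail` mount options. Treating exactly these two options as the ones to look for is my assumption for illustration (they are commonly used so that an unreachable share does not block the boot), and the function name is hypothetical; adjust the list to your own standards.

```python
NFS_TYPES = {"nfs", "nfs4"}
# Assumption: options picked for illustration, not a complete policy
WANTED_OPTIONS = ("_netdev", "nofail")


def risky_nfs_entries(fstab_text):
    """Return (mount_point, missing_options) for NFS lines in fstab text."""
    findings = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab columns: device, mount point, type, options, dump, pass
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        options = set(fields[3].split(","))
        missing = [opt for opt in WANTED_OPTIONS if opt not in options]
        if missing:
            findings.append((fields[1], missing))
    return findings
```

Running it against the content of /etc/fstab gives a quick list of mount points worth reviewing before the next reboot.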
Mitigation:
- use the newest NFS version (supported by both client and server)
- avoid mixing different operating systems (and versions) between server and client, for example an NFS server running on Windows Server with a Linux client
- customize NFS parameters like read/write buffer sizes, timeouts and retry counts, and test them in your environments
- read the cloud provider's documentation for optimal mount parameters. In many cases, for example with an Azure Storage Account, you can copy them from the Azure Portal
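To make the tuning point from the list above more concrete, here is a minimal sketch that assembles a mount option string from the parameters mentioned (version, read/write buffer sizes, timeout, retry count). The option names `vers`, `rsize`, `wsize`, `timeo` and `retrans` are the Linux NFS client options; the function name and the values in the usage line are assumptions for illustration only, not recommendations, so test your own values in your environment.

```python
def nfs_mount_options(version, rsize, wsize, timeo_tenths, retrans, hard=True):
    """Build a comma-separated NFS mount option string.

    Hypothetical helper. timeo is given in tenths of a second,
    as the Linux NFS client expects.
    """
    if rsize <= 0 or wsize <= 0:
        raise ValueError("rsize and wsize must be positive")
    if timeo_tenths <= 0 or retrans < 0:
        raise ValueError("timeo must be positive and retrans non-negative")
    parts = [
        f"vers={version}",
        "hard" if hard else "soft",
        f"rsize={rsize}",
        f"wsize={wsize}",
        f"timeo={timeo_tenths}",
        f"retrans={retrans}",
    ]
    return ",".join(parts)


# Example values only (assumption), e.g. for use with `mount -o <options>`
options = nfs_mount_options("4.2", 1048576, 1048576, 600, 2)
```

Keeping the options in one place like this makes it easier to test a few combinations side by side and to apply the same string in fstab once you have settled on one.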
If you have any insights or your own experiences, feel free to leave a comment. Have a great day!