ESXi VMFS Heap size Blockade – For Monster Virtual Machines in Bladecenter Infrastructure
Recently I ran into a big issue at a client site. We are migrating VMs from old HP ProLiant servers to HP Bladecenter, using vSphere 5, HP BL680c G7 blades (full-height, double-wide, quad-socket, 512 GB RAM) and an HP 3PAR V400 FC SAN.
The VMs have very large VMDKs: some are 10 TB, and most of the others are in the 2 to 3 TB range. We have moved around 140 VMs (35 per ESXi host).
After that we were no longer able to use vMotion, DRS or Storage vMotion, and we could not create new VMs.
I believe anyone who is planning a higher consolidation ratio in their datacenter with vSphere will definitely face this issue. Nowadays HP/IBM Bladecenters, IBM PureFlex and similar systems dominate the big datacenters, and the specs of rack servers (RAM size etc.) are huge as well. So this factor has to be considered well in advance when doing a new vSphere implementation or design.
The errors we got were:
During SVmotion –
Relocate virtual machine lf0hrap01p.xxxx A general system error occurred: Storage VMotion failed to copy one or more of the VM’s disks. Please consult the VM’s log for more details, looking for lines starting with “SVMotion”.
During VMotion –
Migrate virtual machine wfowsus1p.xxxx A general system error occurred: Source detected that destination failed to resume.
During HA –
After virtual machines are failed over by vSphere HA from one host to another due to a host failover, the virtual machines fail to power on with the error:
vSphere HA unsuccessfully failed over this virtual machine. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: Cannot allocate memory.
During manual VM migration –
When you try to manually power on a migrated virtual machine, you may see the error:
The VM failed to resume on the destination during early power on.
Reason: 0 (Cannot allocate memory).
Cannot open the disk ‘<<Location of the .vmdk>>’ or one of the snapshot disks it depends on.
You see warnings in /var/log/messages or /var/log/vmkernel.log similar to:
vmkernel: cpu2:1410)WARNING: Heap: 1370: Heap_Align(vmfs3, 4096/4096 bytes, 4 align) failed. caller: 0x8fdbd0
vmkernel: cpu2:1410)WARNING: Heap: 1266: Heap vmfs3: Maximum allowed growth (24) too small for size (8192)
cpu4:1959755)WARNING:Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.
cpu4:1959755)WARNING: Heap: 2900: Heap_Align(vmfs3, 2099200/2099200 bytes, 8 align) failed. caller: 0x418009533c50
cpu7:5134)Config: 346: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
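To check a host for these messages quickly, the log can be filtered for warnings that name the vmfs3 heap. The following is only a minimal sketch: the function name and the regular expression are my own illustration, written against the sample lines quoted above, and other ESXi builds may word the messages differently.

```python
import re

# Matches VMkernel heap warnings that mention the vmfs3 heap, with or
# without a space after "WARNING:" (both forms appear in the samples above).
HEAP_WARNING = re.compile(r"WARNING:\s*Heap:\s*\d+:\s*.*\bvmfs3\b")


def find_vmfs_heap_warnings(lines):
    """Return the lines that look like vmfs3 heap exhaustion warnings."""
    return [line.rstrip("\n") for line in lines if HEAP_WARNING.search(line)]


# Sample lines copied from the log excerpt above.
sample = [
    "vmkernel: cpu2:1410)WARNING: Heap: 1370: Heap_Align(vmfs3, 4096/4096 bytes, 4 align) failed. caller: 0x8fdbd0",
    "vmkernel: cpu2:1410)WARNING: Heap: 1266: Heap vmfs3: Maximum allowed growth (24) too small for size (8192)",
    "cpu4:1959755)WARNING:Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.",
    "cpu4:1959755)WARNING: Heap: 2900: Heap_Align(vmfs3, 2099200/2099200 bytes, 8 align) failed. caller: 0x418009533c50",
    'cpu7:5134)Config: 346: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)',
]

matches = find_vmfs_heap_warnings(sample)
```

On a real host the same function could be fed the open log file, for example `find_vmfs_heap_warnings(open("/var/log/vmkernel.log"))`.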
The reason for this issue is a limit in the VMkernel: with the default installation/configuration of an ESXi host, the VMFS heap caps how much open VMDK capacity the host can handle on VMFS file systems.
The default heap size in ESXi/ESX 3.5/4.0 for VMFS-3 is set to 16 MB. This allows for a maximum of 4 TB of open VMDK capacity on a single ESX host.
The default heap size has been increased in ESXi/ESX 4.1 and ESXi 5.x to 80 MB, which allows for 8 TB of open virtual disk capacity on a single ESX host.
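To see whether a host is likely to hit the limit, add up the VMDK sizes that will be open on it and compare the total with the figure for its heap size. Here is a rough sketch of that arithmetic. It uses only the two heap/capacity pairs quoted above; the function names and the example hosts are hypothetical.

```python
# Heap size (MB) -> maximum open VMDK capacity (TB) per host,
# taken only from the two figures quoted in the text above.
OPEN_CAPACITY_TB_BY_HEAP_MB = {16: 4, 80: 8}


def total_open_tb(vmdk_sizes_tb):
    """Sum of the sizes (in TB) of all VMDKs open on one host."""
    return sum(vmdk_sizes_tb)


def exceeds_open_capacity(vmdk_sizes_tb, heap_mb):
    """True if the open VMDK total is above the limit for this heap size."""
    limit_tb = OPEN_CAPACITY_TB_BY_HEAP_MB[heap_mb]
    return total_open_tb(vmdk_sizes_tb) > limit_tb


# Hypothetical hosts, sizes in TB.
small_host = [2, 3, 2]      # 7 TB open in total
monster_host = [10, 2, 3]   # 15 TB open in total

small_over_default = exceeds_open_capacity(small_host, 80)
monster_over_default = exceeds_open_capacity(monster_host, 80)
```

With the default 80 MB heap, a single 10 TB VMDK is already above the 8 TB figure, which fits what we saw after the migration.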
VMware KB article 1004424 explains the issue and the steps to resolve it.
We need to change the VMFS heap size of the ESXi host to 256 MB, and reboot the host.
– The article mentions only VMFS3, and the advanced configuration of an ESXi 5 host also shows only VMFS3 settings, but this applies to VMFS5 as well. We confirmed this with VMware technical support.
– The VMFS heap is part of kernel memory, so increasing it increases the memory consumption of the kernel, which leaves less memory for the VMs on the host.
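With several hosts to change, the heap setting described above could be scripted. The sketch below only builds the command line and does not run it. It assumes the advanced option is named /VMFS3/MaxHeapSizeMB, as I recall it from the KB article (the post itself only says that the setting appears under VMFS3), and that it is set through `esxcli system settings advanced set`. The function name is my own. Please verify both against your ESXi build before using it.

```python
# Assumption: option path as named in KB 1004424; check it on your own host.
HEAP_OPTION = "/VMFS3/MaxHeapSizeMB"


def build_set_heap_command(size_mb=256):
    """Build (but do not run) the esxcli argv that sets the VMFS heap size."""
    if not isinstance(size_mb, int) or size_mb <= 0:
        raise ValueError("size_mb must be a positive integer")
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "-o", HEAP_OPTION,      # option to change
        "-i", str(size_mb),     # new integer value in MB
    ]


command = build_set_heap_command(256)
command_line = " ".join(command)
```

The resulting list could be passed to `subprocess.run` on the host or pasted into an SSH session. The host still needs a reboot before the new heap size takes effect.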