Partition alignment in VMware vSphere 5, a DeepDrive, Part-1
This topic has been discussed seriously for a long time… in the virtualization domain. I would like to add few insights in to this topic, with the new vSphere 5.x release there are lot of changes happened in the VMFS filesystem and also with the release of Windows 2008, 2008 R2, 2012 and RHEL 6.x, Ubuntu 12.x there is no need of doing partition alignment, but it would be good to know how these OS do the partition and handle the disk volumes.
Also the legacy OS like Windows 2003, RHEL 5.x still need partition alignment with vSpehere 5 & VMFS 5 filesystem. In short, generally this topic applies to physical, other hypervisor vendors and to VMWARE also.
There are many outdated articles in the web and below info will give a good insight to this topic.
Theory & History
As we all know a physical server or Storage array need physical HDD’s, with virtual machine it is a virtual HDD. The below image shows the hard disk geometry
In the case of early IDE/ATA hard disks the BIOS provides access to the hard disk through an addressing mode called Cylinder-head-sector, also known as CHS, was an early method for giving addresses to each physical block of data on a hard disk drive. CHS addressing is the process of identifying individual sectors on a disk by their position in a track, where the track is determined by the head and cylinder numbers.
In old computer system the maximum amount of addressable data was very limited – due to limitations in both the BIOS and the hard disk interface. The legacy OS like NT, DOS etc uses this method.
Modern hard disks use a recent version of the ATA standard, such as ATA-7. These disks are accessed using a different addressing mode called: logical block addressing or LBA involves a totally new way of addressing sectors. Instead of referring to a cylinder, head and sector number, each sector is instead assigned a unique “sector number”. In essence, the sectors are numbered 0, 1, 2, etc. up to (N-1), where N is the number of sectors on the disk. So all modern disk drives are now accessed using Logical Block Addressing (LBA) scheme where the sectors are simply addressed linearly from 0 to some maximum value and disk partition boundaries are defined by the start and end LBA address numbers. In LBA addressing system with each cylinder standardized to 255 heads, each head has one track with 63 logical blocks or sectors and each one has 512 bytes. You can see this info in linux
All the modern OS uses the LBA method to read/write data to the Harddisk.
In olden days the physical sector of the HDD is 512-byte, and this has been the standard for over 30 years. This physical sector size will match with the size of one Logical block or sector of the OS that is 512 byte, so no issue. During 2009, IDEMA (The International Disk Drive Equipment and Materials Association) and leading data storage companies introduced Advanced Format (AF) Technology in the HDD, so that the physical sector size will be 4K (4,096 bytes). Disk drives with larger physical sectors allow enhanced data protection and correction algorithms, which provide increased data reliability. Larger physical sectors also enable greater format efficiencies, thereby freeing up space for additional user data.
One of the problems of introducing this change in the media format is the potential for introducing compatibility issues with existing software and hardware. As a temporary compatibility solution, the storage industry is initially introducing disks that emulate a regular 512-byte sector disk, but make available info about the true sector size through standard ATA and SCSI commands. As a result of this emulation, there are, in essence, two sector sizes:
Logical sector: The unit that is used for logical block addressing for the media. We can also think of it as the smallest unit of write that the storage can accept. This is the “emulation.”
Physical sector: The unit for which read and write operations to the device are completed in a single operation. That is the Actual physical sector size of storage data on a disk.
Initial types of large sector media
The storage industry is quickly ramping up efforts to transition to this new Advanced Format type of storage for media having a 4 KB physical sector size. Two types of media will be released to the market:
4 KB native: Disks that directly report a 4 KB logical sector size and have a physical sector size of 4 KB – The disk can accept only 4 KB IOs to the disks. However, the software stack can provide 512-byte logical sector size support through RMW support. This media has no emulation layer and directly exposes 4 KB as its logical and physical sector size. The overall issue with this new type of media is that the majority of apps and operating systems do not query for and align I/Os to the physical sector size, which can result in unexpected failed I/Os.
512-byte emulation (512e): Disks that directly report a 512-byte logical sector size but have a physical sector size of 4 KB – Firmware translate 512 byte writes to 4k writes RMW (Read Modify Write). In today’s drives, this translation introduces a performance penalty. This media has an emulation layer as discussed in the previous section and exposes 512-bytes as its logical sector size (similar to a regular disk today), but makes its physical sector size info (4 KB) available.
Overall Windows support for large sector (4KB) media
This table documents the official Microsoft support policy for various media and their resulting reported sector sizes. See this KB article for details.
Windows 8, windows server 2012 and Starting with Linux Kernel Version 2.6.34 has full support to read and write 4K for the LBA by the OS. Operating systems like Windows 7, 2008, 2008 R2 still uses 512 Bytes for each logical block or sector, so one physical sector of 4K contains 8 logical sectors of size 512 Bytes.
512-byte emulation at the drive interface
To maintain compatibility, Western Digital, Hitachi, Toshiba emulates a 512-byte device by maintaining a 512-byte sector at the drive interface that is the firmware inside the HDD controller will do the conversion; and Seagate uses SmartAlign technology, for this emulation (firmware level). These drives are also called Advanced Format 512e. Let see how this works !!!
When the host requests to read a single 512-byte logical block, the hard drive will actually read the entire 4K physical sector containing the requested 512 bytes. The 512-byte block is extracted and sent to the host. This can be done very quickly.
512-byte Write (Read-Modify-Write)
When the host attempts to write a single 512-byte logical block, the hard drive will first read the 4K physical sector containing the 512 bytes that are to be overwritten. Next, it will insert the 512 bytes of new data and write the entire 4K block of data back to the media. This process is called a “Read-Modify-Write”. The drive must read the existing data, modify a subset, and then write the data back to the disk. This process can require additional revolutions of the hard disk.
How Does Advanced Format Technology Maintain Performance?
In order to maintain top performance, it is important to ensure that writes to the disk are aligned. Ideally, writes should be done in 4K blocks, and each block will then be written to a physical 4K sector on the drive. This can be accomplished by ensuring that the OS and applications write data in 4K blocks, and that the drive is partitioned correctly. Most modern operating systems use a file system that allocates storage in 4K blocks or clusters (NTFS Cluster/File allocation unit or EXT3/4 file system block size). In a traditional hard drive, the 4K block is made up of eight 512-byte sectors (see Figure 4: 512-byte Emulated Device Sector Size).
In production, the RAID layer will come in to picture, these 4KB physical sectors are again combined to form a RAID STRIP size, this may vary from 4KB to 256 KB depending upon the RAID level. In the storage array also this is the same. So the RAID array controller like PERC, Intel, LSI, Smart Array etc will handle and gives a RAID volume to the OS. In both cases the OS partition needs to be aligned.
Since most modern operating systems will write in 4K blocks, it is important that each 4K logical block is aligned to a physical 4K block on the disk (see Figure 5). This is especially important because the 512e feature of the drive cannot prevent a partitioning utility from creating a misaligned partition. When misalignment occurs, a logical 4K block will reside on two physical sectors.
In this case, a single read or write of a 4K block will result in a read/write of two physical sectors. The impact of a “read” is minimal, whereas a single write will cause two “Read-Modify-Writes” to occur, potentially impacting performance (see Figure 6).
So now what will happen with the OS like XP, windows 2003, RHEL5.x etc and why we need disk Alignment in PHYSICAL world or in VIRTUAL world ?
As I mentioned the physical disks has 4KB physical sectors and RAID volume or a LUN from storage array has 4KB to 256 KB STRIP SIZE, that is a multiple of 4KB, and the operating system has 512 bytes of logical sector. When we install the operating system, like 2003, RHEL 5.x. in a HDD or a LUN, in these OS the first 62 sectors (first track) of the HDD is reserved for BOOT area and it is hidden to the OS.
That is sectors from 0 to 62 and reserved (hidden), the master boot record (MBR) resides within these hidden sectors. The master boot record (MBR) resides within these hidden sectors. It uses the first sector of the first track for MBR data (LBA 0) and the first partition begins in the last sector of the first track, which is from (logical) block address 63. You can see this in the below;
Here in RHEL 5.x/LINUX older version, we can see the first partition starts from sector/LBA 63, and if you add another HDD or a LUN this host, and when you create a partition, the partition tool of these linux versions again create partition starting from sector 63.
Here in the above info from Windows 2003, the first partition starts from the offset 32256, in windows it won’t show the LBA/sector number, instead it shows the values in Bytes. That is an offset of 32256 means (32256/512) = 63 LBA/sector, so the partition starts from sector 63. Below is the detailed way of confirming this;
Essential Correlations: Partition Offset, File Allocation Unit Size, and Stripe Unit Size
Use the information in this section to confirm the integrity of disk partition alignment configuration for existing partitions and new implementations.
There are two correlations which when satisfied are a fundamental precondition for optimal disk I/O performance. The results of the following calculations must result in an integer value:
Partition_Offset ÷ Stripe_Unit_Size (Disk physical sector size or RAID strip size)
Stripe_Unit_Size ÷ File_Allocation_Unit_Size
Of the two, the first is by far the most important for optimal performance. The following demonstrates a common misalignment scenario: Given a starting partition offset for 32,256 bytes (31.5 KB) and stripe unit size of 4086 bytes (4 KB), the result is 31.5/4 = 7.894273127753304. This is not an integer; therefore the offset & strip unit size are not correlated.
In the second one, the NTFS cluster size or file allocation unit the default value is 4086, and we can give 32KB, 64KB etc. If it is MSSQL and EXCHANGE It is recommended to give 64KB during the formatting time, and this value is not an issue and it will be an integer. But the first one is the crucial !!!
So we have MISALIGNMENT, and we need to realign the partition, the below diagrams show pictorially;
So Windows 7, 8, 2008, 2008 R2, 2012, RHEL 6, Debian 6, Ubuntu 10, 11, 12, SUSE 11 onwards, automatically aligns partitions during installation. You can see this in the below;
In Windows case the partition alignment defaults to 1024 KB or 1MB boundary (that is, startingoffset 1,048,576 bytes = 1024KB). It correlates well (as described in the previous section, 1024KB/4KB = 256 an integer) with common stripe unit sizes such as 4KB, 64 KB, 128 KB, and 256 KB as well as the less frequently used values of 512 KB and 1024 KB. That is simply the windows partition tool begin the first partition at LBA/sector 2048 (1,048,576/512 = 2048). So here not need to manual alignment and if we add another disk also it will do auto alignment.
In RHEL and latest linux cases, the first partition starts from 2048 that is LBA 0 to 2047 is reserved. That is the OS is aligned to 1MB boundary, if we do the math the sector 2048 is at the offset 1,048,576 bytes (1,048,576 bytes/512 = 2048) and if we add another HDD or LUN it will do alignment automatically.
My next post will be discussing how to do the disk alignment in vSphere or any other hypervisor.