Oracle RAC Cluster in VMware vSphere : VMotion Caveat


Last week I faced a very critical issue with our Oracle RAC cluster in vSphere 5, for the ESXi maintenance we have performed a VMotion on the Oracle nodes – then the FUN began !!


We have 2 node oracle RAC cluster running on Vsphere5

For the Oracle RAC cluster we are using VMFS5 file system, I mean VMDK only (no RDM)

Oracle nodes are using windows 2008 R2 SP1 and oracle

The clusters are running on HP Blade center C7000 with ProLiant BL680c G7 blades

The ORACLE private and public network are using vDS (VMware distributed switch)

Storage, we are using HP 3PAR FC SAN for the vsphere5


We performed a VMotion on Oracle RAC cluster nodes one by one, after that we got a BSOD for both nodes and the VM’s crashed completely.


I spent some time in researching the stop code that was in the BSOD: STOP 0x0000FFFE
It appears that it is not a normal blue screen, as the above stop code is not given from Microsoft. It is actually a forced blue screen from Oracle.  Unfortunately, I am unable to find specific information regarding it from Oracle’s website, but I did find the following blog article (this information may not be completely reliable as it is just a blog article):

“OraFence has a built-in mechanism to check it was scheduled in time. If it is not scheduled within 5 seconds it will also reboot the note. In this way, OraFence is designed to fence and reboot a node if it perceives that a given node is ‘hung’ once its own timeout has been reached. Note that the default timeout for the OraFence driver is a (very low) 0x05 (5 seconds). What this means is that if the OraFence driver detects what it perceives to be a hang for example at the operating system level and that hang persists beyond 5 seconds, it’s possible that the OraFence driver – of its own accord – will fence and evict the node.” —


RCA for the ISSUE :
The above article suggests that a simple timeout could cause an Oracle driver to BSOD the machine with the stop code 0x0000FFFE. This would be something that could occur during a vMotion simply because, during the VMotion the operating system is quiesced, and depending upon the VM RAM and VMotion configuration it will take around 16 seconds to complete the VMotion (we have configured MultiNic VMotion in the HP Blades)

Oracle uses fencing mechanism for the RAC cluster, just like the SCSI fencing in the REDHAT CLUSTER to reboot the nodes. Because of the VMotion quiescence process and with default timeout for the OraFence driver 5 seconds, because the VMotion quiescence took more than 5 seconds and this caused the BSOD.

The above blog article recommends increasing the timeout from 5 seconds because that is very aggressive.I would recommend contacting Oracle Support with the details, as they would probably have a more complete picture of how their product forces a BSOD and how to avoid it in the future.


Increase the default timeout for the OraFence driver, so the BSOD should not occur again due to a quiesced operating system (during VMotion or in DRS/HA environment)

Design Notes

While designing the Oracle RAC cluster in vSphere, taking this consideration will avoid such crashes. Also if you are running the RAC nodes in the VMware HA/DRS environment, this should be considered.


About GK_RAJ

An enthusiastic IT person, with an intense passion towards Datacenter technologies. I am a VMware vExpert Title holder and working as a Technical Consultant, in Qatar. I am exposed to VMware vSphere, Storage, Bladecenters, Datacenter operations, Symantec Backup, Deduplication technologies and carry rich and diversified experience in these domains. I specialize in Designing & Consulting on VMware VSphere, the integration of Storage and Network Stacks to VSphere. With my experience, I help Organizations/Enterprises to achieve their CAPEX & OPEX savings, develop DR and BCP strategies, Consolidation services with Virtualization using VSphere, and prepare them to move to Cloud. In the meantime, I would like to share my knowledge and do a good contribution to the community. I am an Indian citizen, and have a Engineering degree in Electronics and Communication. I have certified in VCAP5-DCD, VCP-Cloud, VCP 4 & 5, MCITP, MCSE.

Posted on November 13, 2012, in HP Blades, VMware. Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Dan Gorman's Technology News Aggregation

My Daily Readings from Zite

Virtual Reality

Eat > Sleep > Virtualize > Repeat

Brad Hedlund

stuff and nonsense


A blog focusing on day 2 day virtualization stuff

Every Cloud Has a Tin Lining.


Experience the Datacenter Technologies – VMware vEvangelist

Experience the Datacenter Technologies

The weblog of an IT pro specializing in virtualization, networking, cloud, servers, & Macs

Eric Sloof - NTPRO.NL

Experience the Datacenter Technologies


Experience the Datacenter Technologies

Welcome to vSphere-land!

your ultimate VMware information destination

Michelle Laverick...

Images too small? Try "zooming" in with your web-browser...


By Josh Odgers - VCDX#90

Long White Virtual Clouds

all things vmware, cloud and virtualizing business critical applications

Virtual Geek

Experience the Datacenter Technologies

Yellow Bricks

by Duncan Epping

%d bloggers like this: