Hello guys,
I have a 4-node Windows Server 2012 R2 cluster that is used for virtualization. It runs about 18 virtual machines.
We have 4 LUNs presented to the cluster: 3 CSV Volumes (2x1TB and 1x730GB) and 1 Quorum Disk (1GB).
We have implemented Acronis Backup Advanced for Hyper-V, but we have critical issues with the backups. The tip of the iceberg was a crash of all nodes on 17.02.2014 just after 18:00, when the backup rotation starts. The evening backups are incremental and run one machine at a time (production machines only, of which there are around 10), and each machine has a 1-hour window.
We have read that there was a similar problem with Windows Server 2012 and that a fix was released for it, but it seems the problem is not fixed in Windows Server 2012 R2 and there is no patch.
We are running IBM servers that have the latest drivers and firmware. We also installed the latest patches during the weekend of 14-15.02.2014.
We noticed that on 2 cluster nodes the last events before the crash are the start of the VSS service, and on the other 2 they are the start of VSS followed by the start of Acronis VSS. Additionally, we noticed the following event 5120 happening during backup rotation: "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') has entered a paused state because of '(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished."
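In case it helps with reading that event, here is a minimal Python sketch that pulls the hex status out of the 5120 message and splits it into the standard NTSTATUS fields (severity, facility, code). The function name `decode_event_5120` and the one-entry lookup table are only my own illustration and not part of any tool. As far as I can tell, c000020c corresponds to STATUS_CONNECTION_DISCONNECTED, but please treat that mapping as my assumption.

```python
import re

# Assumption: only the status code seen in our event is mapped here.
KNOWN_STATUS = {0xC000020C: "STATUS_CONNECTION_DISCONNECTED"}

# The top two bits of an NTSTATUS value encode the severity.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_event_5120(message):
    """Extract the hex status from an event 5120 message and split it into NTSTATUS fields."""
    match = re.search(r"because of '\(([0-9a-fA-F]{8})\)'", message)
    if match is None:
        return None  # message does not contain a status code in the expected form
    status = int(match.group(1), 16)
    return {
        "status": status,
        "severity": SEVERITY[status >> 30],   # bits 31-30
        "facility": (status >> 16) & 0x0FFF,  # bits 27-16
        "code": status & 0xFFFF,              # bits 15-0
        "name": KNOWN_STATUS.get(status, "unknown"),
    }


message = (
    "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') has entered a paused state "
    "because of '(c000020c)'. All I/O will temporarily be queued until a path to "
    "the volume is reestablished."
)
info = decode_event_5120(message)
print(info)
```

If that reading is right, the code is an error-severity status with facility 0, which would suggest the node lost its connection to the volume rather than the disk itself reporting a fault.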
You can find files with the events before the crash, the bug check events, the CSV disk events and a dump analysis at the following link.
It is important to note that when we run backups from within the VMs, we do not get any events of type 5120 or any crashes. We have not tried running the Windows Backup utility from the hosts.
Please suggest solutions.