VMware vSAN Data Recovery from Dell VxRail

Client:
Government

Device Type:
Dell VxRail

Virtual Machine:
VMware vSAN

A government entity lost 19.8 million audio files due to multiple disk failures in their Dell VxRail system. Despite extensive efforts by VMware and Dell support, the data loss was deemed irrecoverable, so VMware Tier 4 technical support referred the customer’s Managed Service Provider to DriveSavers.

Data Loss Situation

A government entity lost access to millions of audio files stored on their Dell VxRail system due to multiple disk failures within their VMware vSAN OSA stretched cluster. The files had no backup and could not be recreated.

The Dell VxRail system had 40 NVMe disks allocated into 8 diskgroups of 5 disks each (1 SSD for the cache tier and 4 SSDs for the capacity tier) to create the vSAN cluster. Within the cluster were over 150 virtual disk components, with one critical 10TB virtual disk containing the missing audio files.

Over a two-month period, VMware and Dell support teams collaborated to investigate the root cause of the data loss. They determined that one specific diskgroup was affected; it had a history of checksum errors and power-on resets, leading to continuous resyncs and rebuilds that culminated in complete diskgroup failure. This resulted in a very complex vSAN component tree with stale components and several layers of nesting. The case was deemed unrecoverable, and consequently VMware Tier 4 technical support referred the customer’s Managed Service Provider to DriveSavers.

Data Recovery Solution

DriveSavers enterprise recovery engineers began by analyzing the failed diskgroup to determine its recoverability. Using proprietary data recovery tools developed specifically for VMware vSAN environments, they determined that the critical 10TB virtual disk containing the audio files was configured as a RAID 5 disk component, which meant its subcomponents were spread across all 40 physical disks in the VxRail.

The customer sent the remaining 35 disks to the lab, where DriveSavers engineers rebuilt the VxRail RAID structure. This allowed them to carry out a full diagnosis of the critical 10TB virtual disk and its guest filesystem.
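For readers less familiar with RAID 5, the general principle that makes this kind of rebuild possible is XOR parity: each stripe stores one parity block, so any single missing block in the stripe can be recomputed from the blocks that survive. The sketch below is a generic illustration only. It is not DriveSavers' proprietary tooling and does not model vSAN's actual on-disk layout; the function names and sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the single missing block of a stripe.

    surviving_blocks holds every remaining block in the stripe,
    data and parity alike; XOR-ing them yields the missing one.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AUDIO-01", b"AUDIO-02", b"AUDIO-03"]
parity = xor_blocks(data)

# Pretend the second data block was on the failed disk.
recovered = rebuild_missing([data[0], data[2], parity])
assert recovered == data[1]
```

Parity of this kind covers only one missing block per stripe, and it cannot by itself tell current data from stale data, which is part of why a component tree full of stale and nested components is so difficult to untangle.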

The full diagnosis revealed further inconsistencies at the guest file system level that had to be resolved. It also identified tens of millions of files contained not just within a single NTFS partition, but within a single folder. Due to the number of files and the potential for difficult-to-detect file corruption, DriveSavers developed additional specialized recovery tools to quickly process, extract, and verify the integrity and consistency of the audio files. The result was 19.8 million files successfully recovered, with over 98% of them fully intact.
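To give a sense of what processing a folder of this size involves, the following is a minimal sketch of a streaming scan with a basic header check. It is an illustration under stated assumptions, not the tooling DriveSavers built: the case study does not name the audio format, so the `RIFF` signature (used by WAV files) is an assumed default, and the function names are hypothetical. A real validation pass would go well beyond the first few bytes.

```python
import os
from collections import Counter


def classify(path, magic):
    """Return 'empty', 'bad_header', or 'ok' for a single file."""
    with open(path, "rb") as fh:
        head = fh.read(len(magic))
    if not head:
        return "empty"
    return "ok" if head == magic else "bad_header"


def scan_folder(folder, magic=b"RIFF"):
    """Tally header-check results for every regular file in one folder.

    os.scandir yields directory entries lazily, so the full listing is
    never held in memory at once. The default magic value is an
    assumption (WAV signature), not a detail from the case study.
    """
    counts = Counter()
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                counts[classify(entry.path, magic)] += 1
    return counts
```

Calling `scan_folder("/path/to/extracted")` on a hypothetical output directory returns a `Counter` of results per category, which gives a quick first estimate of how many files need closer inspection.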

Recovery Summary

This recovery involved multiple layers of complexity and problem-solving, and it illustrates how essential it is to be able to create and customize new tools throughout the process. Working together with hardware and software support teams is critical in complex data loss situations. Our established relationships with Dell and VMware global support teams, along with our software engineering expertise and highly specialized data recovery problem-solving techniques, led to a successful recovery.

Principal Software Engineer
Ernesto Lobo is a Principal Software Engineer at DriveSavers Data Recovery with over 17 years of experience in data recovery and software engineering. Ernesto specializes in developing data recovery solutions for enterprise storage systems and custom data recovery tools for unique data loss scenarios. He is known for his technical expertise and dedication to solving complex data loss issues, consistently delivering exceptional results in the field of data recovery.
