Introduction to Data Deduplication

Data Deduplication (dedup) is a built-in storage optimization feature in Windows Server 2022 that reduces disk space consumption by identifying and eliminating duplicate data chunks across files on a volume. Rather than storing identical byte sequences multiple times, dedup stores each unique chunk once in a chunk store and replaces duplicate occurrences with references (pointers) to the single stored copy. The result is a significant reduction in disk usage — typically 30-80% space savings for general file servers, and even higher for VDI or backup workloads.
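The chunk store idea can be illustrated with a toy model. The sketch below is not how Windows dedup is implemented. For simplicity it assumes fixed-size 4 KB chunks and SHA-256 fingerprints, whereas Windows dedup uses variable-size chunks and its own on-disk format. The function name Get-ChunkStoreSketch is hypothetical. It shows how identical chunks across files are stored once while the logical size of the data stays the same.

```powershell
# Toy model of a chunk store (illustration only, not Windows dedup's internal format).
# Assumptions: fixed-size chunks and SHA-256 fingerprints; real dedup uses variable-size chunks.
function Get-ChunkStoreSketch {
    param(
        [Parameter(Mandatory)] [byte[][]] $Files,  # each element holds one file's bytes
        [int] $ChunkSize = 4096                     # hypothetical fixed chunk size
    )
    $sha     = [System.Security.Cryptography.SHA256]::Create()
    $store   = @{}   # fingerprint -> unique chunk bytes (the "chunk store")
    $logical = 0L    # bytes the files occupy before dedup
    foreach ($data in $Files) {
        for ($offset = 0; $offset -lt $data.Length; $offset += $ChunkSize) {
            $len   = [Math]::Min($ChunkSize, $data.Length - $offset)
            $chunk = [byte[]]::new($len)
            [Array]::Copy($data, $offset, $chunk, 0, $len)
            $key = [BitConverter]::ToString($sha.ComputeHash($chunk))
            # Keep only the first copy; later identical chunks would become references to it.
            if (-not $store.ContainsKey($key)) { $store[$key] = $chunk }
            $logical += $len
        }
    }
    $stored = 0L
    foreach ($c in $store.Values) { $stored += $c.Length }
    [pscustomobject]@{
        LogicalBytes = $logical
        StoredBytes  = $stored
        UniqueChunks = $store.Count
    }
}
```

For example, passing two 8 KB all-zero arrays (each created with [byte[]]::new(8192)) reports 16384 logical bytes but only 4096 stored bytes in a single unique chunk, because every 4 KB chunk in both files is identical.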

Deduplication in Windows Server 2022 operates as a post-process optimization: a background job processes files after they are written, so the primary write path carries no dedup latency penalty. Reads of deduplicated files are handled transparently by the Data Deduplication filter driver (Dedup.sys), which reassembles the data on the fly from the chunk store. This reassembly adds a small amount of read latency, which is why dedup has the workload limitations discussed later.

Dedup is available on all editions of Windows Server 2022 (Standard, Datacenter, and Datacenter: Azure Edition) and is supported on NTFS volumes and, since Windows Server 2019, on ReFS volumes. ReFS-specific considerations are discussed in a dedicated section below.

Installing the Data Deduplication Feature

Data Deduplication is not installed by default. Install it using PowerShell or Server Manager.

Via PowerShell (recommended):

Install-WindowsFeature -Name FS-Data-Deduplication -IncludeManagementTools

The FS-Data-Deduplication feature name installs both the dedup engine (background processing service) and the PowerShell management module. Verify installation:

Get-WindowsFeature -Name FS-Data-Deduplication

List all available dedup PowerShell cmdlets:

Get-Command -Module Deduplication

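Key cmdlets include: Enable-DedupVolume, Disable-DedupVolume, Set-DedupVolume, Get-DedupVolume, Get-DedupStatus, Update-DedupStatus, Get-DedupMetadata, Measure-DedupFileMetadata, Expand-DedupFile, Start-DedupJob, Stop-DedupJob, Get-DedupJob, and the schedule cmdlets Get-DedupSchedule, New-DedupSchedule, Set-DedupSchedule, and Remove-DedupSchedule.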

Via Server Manager: Add Roles and Features > Server Roles > File and Storage Services > File and iSCSI Services > Data Deduplication. Check the box and complete the wizard.

No reboot is required after installing the Data Deduplication feature.

Enabling Deduplication on Volumes

Dedup must be explicitly enabled on each volume you want to deduplicate. The Enable-DedupVolume cmdlet turns on dedup for a specified volume and sets the usage type, which determines the dedup algorithm parameters optimized for the workload.

Enable dedup on D: for general file server use:

Enable-DedupVolume -Volume "D:" -UsageType Default

Enable dedup on E: for VDI (Virtual Desktop Infrastructure) workloads:

Enable-DedupVolume -Volume "E:" -UsageType HyperV

Enable dedup on F: for backup workloads (typically a volume storing backup VHD files):

Enable-DedupVolume -Volume "F:" -UsageType Backup

After enabling, verify the configuration:

Get-DedupVolume

This returns every dedup-enabled volume. The default table view shows only a few columns; pipe the output to Format-List * to see all properties, including Enabled, UsageType, MinimumFileAgeDays, MinimumFileSize, ExcludeFolder, ExcludeFileType, and NoCompress.
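When several volumes need different usage types, a small wrapper can keep the volume-to-usage-type mapping in one place and read the resulting configuration back. This is a sketch: the function name Enable-DedupVolumeMap and the drive letters in the usage line are hypothetical, and it relies only on Enable-DedupVolume and Get-DedupVolume.

```powershell
# Hypothetical helper: enable dedup on each volume with its own usage type, then show the result.
function Enable-DedupVolumeMap {
    param(
        # Keys are volumes such as "D:"; values are Default, HyperV or Backup.
        [Parameter(Mandatory)] [hashtable] $Map
    )
    foreach ($volume in $Map.Keys) {
        $usage = $Map[$volume]
        # Fail early with a clear message instead of partway through the list.
        if ($usage -notin 'Default', 'HyperV', 'Backup') {
            throw "Unknown usage type '$usage' for volume $volume"
        }
        Enable-DedupVolume -Volume $volume -UsageType $usage | Out-Null
    }
    # Re-read the configuration so the caller sees what dedup actually recorded.
    Get-DedupVolume -Volume @($Map.Keys) |
        Select-Object Volume, Enabled, UsageType, MinimumFileAgeDays, MinimumFileSize
}
```

Usage for the three volumes above: Enable-DedupVolumeMap -Map @{ 'D:' = 'Default'; 'E:' = 'HyperV'; 'F:' = 'Backup' }.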

Deduplication Usage Types Explained

The UsageType parameter is critical — it determines how aggressively dedup runs and which files are eligible, tuned for different workload characteristics.

Default: Optimized for general-purpose file servers. Files must be at least 3 days old (MinimumFileAgeDays = 3) before dedup processes them. This prevents hot/frequently-accessed files from being deduplicated, reducing read reconstruction overhead for active data. Minimum file size: 32 KB. This mode excludes the Recycle Bin folder and temporary file extensions by default.

HyperV: Intended for Virtual Desktop Infrastructure (VDI) servers that store virtual desktop VHD/VHDX files on the deduplicated volume, including Cluster Shared Volumes (CSVs), as covered later. MinimumFileAgeDays is 3, as in Default mode, but HyperV mode also optimizes files that are open and in use and files that have been partially modified. This matters because the VHDX files of running desktops are always open and constantly changing. The mode also applies internal tuning for Hyper-V interoperability so that latency-sensitive VM disk reads are not unduly slowed by chunk reconstruction. The per-file limit (1 TB) is the same as in Default mode; what differs is the optimization profile.

Backup: Intended for volumes used by virtualized backup applications such as Microsoft Data Protection Manager (DPM) and similar backup software repositories. MinimumFileAgeDays = 0, so new backup files are eligible immediately, and optimization runs as a priority job rather than only in the background. In-use files are optimized (partially modified files are not), and the mode includes internal tuning for interoperability with DPM-like backup software.

You can adjust MinimumFileAgeDays and MinimumFileSize independently after enabling:

# Set minimum file age to 7 days and minimum file size to 64 KB on D:
Set-DedupVolume -Volume "D:" -MinimumFileAgeDays 7 -MinimumFileSize 65536

Exclude specific folders from dedup (e.g., active database volumes should exclude the active DB files):

Set-DedupVolume -Volume "D:" -ExcludeFolder "D:\Databases\ActiveDB","D:\Staging"

Exclude specific file extensions:

Set-DedupVolume -Volume "D:" -ExcludeFileType "tmp","log","bak"
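Repeated calls with -ExcludeFolder are easy to get wrong if the parameter replaces the stored list rather than appending to it, which is the assumption this sketch makes. The hypothetical helper Add-DedupExcludeFolder reads the current list with Get-DedupVolume, merges in the new folders, and writes the combined list back.

```powershell
# Hypothetical helper: add folders to a volume's dedup exclusion list without losing existing entries.
# Assumption: Set-DedupVolume -ExcludeFolder replaces the stored list instead of appending to it.
function Add-DedupExcludeFolder {
    param(
        [Parameter(Mandatory)] [string]   $Volume,
        [Parameter(Mandatory)] [string[]] $Folder
    )
    # Current exclusions; Where-Object drops the $null seen when no folders are excluded yet.
    $current = @((Get-DedupVolume -Volume $Volume).ExcludeFolder | Where-Object { $_ })
    # Sort-Object -Unique compares strings case-insensitively, so "D:\Staging" and "d:\staging" collapse.
    $merged = @($current + $Folder | Sort-Object -Unique)
    Set-DedupVolume -Volume $Volume -ExcludeFolder $merged | Out-Null
    $merged
}
```

For example, Add-DedupExcludeFolder -Volume "D:" -Folder "D:\Staging" leaves any previously excluded folders in place. Case-insensitive de-duplication matches how Windows compares paths by default.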

Deduplication Job Schedules

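Dedup operations are executed via scheduled jobs. Windows Server 2022 creates default schedules automatically when dedup is enabled (named BackgroundOptimization, ThroughputOptimization, WeeklyGarbageCollection, and WeeklyScrubbing). Three job types can be scheduled: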

Optimization: The primary dedup job that processes eligible files, identifies duplicate chunks, rewrites files using references to the chunk store, and compresses the chunk store. This is the job that produces space savings.

GarbageCollection: Purges orphaned chunks from the chunk store — chunks that are no longer referenced by any file (e.g., after deduplicated files are deleted or overwritten). Without regular garbage collection, the chunk store grows unboundedly even as files are removed.

Scrubbing: Validates data integrity of the chunk store by reading all chunks and verifying their checksums. Detects silent data corruption (bit rot) in the chunk store. Does not free space or optimize — it is a pure validation job.

View current dedup job schedules:

Get-DedupSchedule

Modify the default ThroughputOptimization schedule (the scheduled, non-background optimization job) to run nightly during off-hours and limit CPU and memory usage:

Set-DedupSchedule -Name "ThroughputOptimization" -Start "01:00" -DurationHours 6 -Priority Normal -Cores 50 -Memory 25

Parameters: -Cores limits CPU usage to a percentage of logical processors, and -Memory limits RAM usage to a percentage of total physical memory. These caps keep a long optimization run from crowding out foreground applications if it overlaps with business hours.

Create a custom weekend schedule that allows optimization to use more CPU and memory:

New-DedupSchedule -Name "WeekendFullOptimization" -Type Optimization -Start "22:00" -Days Saturday,Sunday -DurationHours 10 -Priority High -Cores 75 -Memory 50 -StopWhenSystemBusy $false

The -StopWhenSystemBusy parameter controls whether the job pauses when the system becomes heavily loaded. Set to $false for dedicated storage servers where dedup can use more resources.

A WeeklyGarbageCollection schedule exists by default; to run garbage collection in a different window, create an additional weekly schedule:

New-DedupSchedule -Name "WeeklyGC" -Type GarbageCollection -Start "03:00" -Days Sunday -DurationHours 4 -Priority Low
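Setup scripts are often re-run, and calling New-DedupSchedule with a name that already exists may fail. The hypothetical helper New-DedupScheduleIfMissing checks Get-DedupSchedule -Name first and only creates the schedule when nothing is found. The subset of parameters it exposes is an arbitrary choice for the sketch.

```powershell
# Hypothetical helper: create a dedup schedule only when no schedule with that name exists,
# so the same setup script can be re-run safely.
function New-DedupScheduleIfMissing {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [ValidateSet('Optimization', 'GarbageCollection', 'Scrubbing')] [string] $Type,
        [Parameter(Mandatory)] [string] $Start,      # time of day, e.g. "03:00"
        [Parameter(Mandatory)] [string[]] $Days,     # e.g. "Sunday" or "Saturday","Sunday"
        [int] $DurationHours = 4
    )
    $existing = Get-DedupSchedule -Name $Name -ErrorAction SilentlyContinue
    if ($existing) {
        Write-Verbose "Schedule '$Name' already exists; leaving it unchanged."
        return $existing
    }
    New-DedupSchedule -Name $Name -Type $Type -Start $Start -Days $Days -DurationHours $DurationHours
}
```

Usage: New-DedupScheduleIfMissing -Name "WeeklyGC" -Type GarbageCollection -Start "03:00" -Days Sunday -DurationHours 4.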

Monitoring Deduplication Savings

After dedup has been running for some time, measure the achieved savings with Get-DedupVolume and Get-DedupStatus.

Get-DedupVolume | Format-List *

Key fields in Get-DedupStatus output (SavedSpace and SavingsRate are also shown by Get-DedupVolume):

SavedSpace: Total bytes saved by dedup (pre-dedup size minus post-dedup size).

SavingsRate: Percentage of space saved (SavedSpace / original data size × 100).

OptimizedFilesSize: Total size of files that have been deduplicated.

OptimizedFilesCount: Number of files that have been deduplicated.

InPolicyFilesSize: Total size of files eligible for dedup (meet age/size criteria).
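As a worked illustration of how SavedSpace and SavingsRate relate (the figures are invented for the example), consider a volume whose files would occupy 600 GB without dedup but occupy 240 GB after optimization:

```latex
\begin{aligned}
\text{SavedSpace} &= 600\,\text{GB} - 240\,\text{GB} = 360\,\text{GB} \\
\text{SavingsRate} &= \frac{\text{SavedSpace}}{\text{original data size}} \times 100 = \frac{360}{600} \times 100 = 60\%
\end{aligned}
```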

# Get detailed dedup status for all volumes
Get-DedupStatus | Format-List Volume, SavedSpace, SavingsRate, OptimizedFilesCount, LastOptimizationTime, LastGarbageCollectionTime, LastScrubbingTime

For a human-readable summary of savings across all volumes:

Get-DedupStatus | Select-Object Volume, 
    @{N="SavedGB";E={[math]::Round($_.SavedSpace/1GB,2)}},
    @{N="SavingsRate%";E={$_.SavingsRate}},
    @{N="OptimizedFiles";E={$_.OptimizedFilesCount}},
    @{N="LastRun";E={$_.LastOptimizationTime}} | Format-Table -AutoSize
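The summary above shows when optimization last ran, but not whether that is a problem. The sketch below, built around a hypothetical function Get-DedupHealthSummary, accepts Get-DedupStatus objects from the pipeline and flags volumes that were never optimized or not optimized within a threshold. The 48-hour default is an arbitrary assumption to adjust to your schedules.

```powershell
# Hypothetical helper: flag volumes whose last optimization is missing or older than a threshold.
# Expects objects shaped like Get-DedupStatus output (Volume, SavedSpace, SavingsRate, LastOptimizationTime).
function Get-DedupHealthSummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Status,
        [int] $MaxAgeHours = 48
    )
    process {
        $age = if ($Status.LastOptimizationTime) {
            (Get-Date) - [datetime]$Status.LastOptimizationTime
        } else { $null }
        [pscustomobject]@{
            Volume      = $Status.Volume
            SavedGB     = [math]::Round($Status.SavedSpace / 1GB, 2)
            SavingsRate = $Status.SavingsRate
            # Never optimized, or not optimized within the threshold, counts as stale.
            Stale       = ($null -eq $age) -or ($age.TotalHours -gt $MaxAgeHours)
        }
    }
}
```

Usage: Get-DedupStatus | Get-DedupHealthSummary -MaxAgeHours 24 | Where-Object Stale.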

Force an immediate update of dedup statistics (useful after a manual job run to see current numbers without waiting for the next scheduled update):

Update-DedupStatus -Volume "D:"

Running Dedup Jobs Manually

You can trigger dedup jobs immediately without waiting for the schedule using Start-DedupJob.

Run an immediate optimization pass on volume D:

Start-DedupJob -Volume "D:" -Type Optimization

Run garbage collection immediately:

Start-DedupJob -Volume "D:" -Type GarbageCollection

Run a full scrubbing pass that validates the entire chunk store:

Start-DedupJob -Volume "D:" -Type Scrubbing -Full

Scrubbing repairs the corruption it finds where it can; no extra switch is needed for that. Add -ReadOnly to have the job report corruption without attempting any repairs. -Full asks the job to check the whole chunk store rather than performing the lighter default pass.

Monitor running dedup jobs:

Get-DedupJob

This lists queued and running jobs with properties such as Volume, Type, ScheduleType, StartTime, Progress, and State. To stop a running job:

Stop-DedupJob -Volume "D:" -Type Optimization
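Start-DedupJob -Wait blocks until the job ends, with no upper bound. When a maintenance script needs a time budget, it can poll Get-DedupJob instead and call Stop-DedupJob on timeout. This sketch assumes Get-DedupJob lists a job only while it is queued or running, so a job that disappears from the list is treated as finished. Finished does not mean succeeded, so check Get-DedupStatus afterwards. The function name Invoke-DedupJobWithTimeout is hypothetical.

```powershell
# Hypothetical helper: start a dedup job, poll until it leaves the Get-DedupJob list,
# and stop it if it runs past a time budget.
function Invoke-DedupJobWithTimeout {
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [ValidateSet('Optimization', 'GarbageCollection', 'Scrubbing')] [string] $Type = 'Optimization',
        [int] $TimeoutMinutes = 120,
        [int] $PollSeconds = 30
    )
    Start-DedupJob -Volume $Volume -Type $Type | Out-Null
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    while ($true) {
        # Assumption: queued/running jobs are listed; finished ones are not.
        $job = Get-DedupJob -Volume $Volume -ErrorAction SilentlyContinue |
            Where-Object { "$($_.Type)" -eq $Type }
        if (-not $job) { return 'Finished' }
        if ((Get-Date) -ge $deadline) {
            Stop-DedupJob -Volume $Volume -Type $Type
            return 'StoppedOnTimeout'
        }
        Write-Verbose "$Type on $Volume at $($job.Progress)%"
        Start-Sleep -Seconds $PollSeconds
    }
}
```

Usage: Invoke-DedupJobWithTimeout -Volume "D:" -Type GarbageCollection -TimeoutMinutes 240 -Verbose.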

Deduplication Scrubbing

Scrubbing is a critical maintenance operation for long-running dedup volumes. Over time, hardware issues, firmware bugs, or power loss events can cause silent data corruption in the chunk store — corrupted chunks that have incorrect checksums. If dedup serves a corrupted chunk to a file read, the application receives garbage data without any I/O error being returned (the corruption is silent at the storage layer).

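Scrubbing mitigates this by reading every chunk in the chunk store and verifying that its stored checksum matches the actual data. Corrupted chunks are flagged, and dedup repairs what it can. It keeps extra copies of heavily referenced ("hotspot") chunks, and on a mirrored Storage Spaces volume it can use the healthy mirror copy. The original files cannot serve as a repair source, because optimization has already replaced their data with references to the chunk store.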

Run a full scrub with repair on a large volume (allow sufficient time — scrubbing reads the entire chunk store):

Start-DedupJob -Volume "F:" -Type Scrubbing -Full -Wait

The -Wait parameter holds the PowerShell session until the job completes, useful for scripting. After completion, check the results:

Get-DedupStatus -Volume "F:" | Select-Object Volume, LastScrubbingTime, LastScrubbingResult, LastScrubbingResultMessage

Details of any corruption found, including the affected files, are written to the Deduplication Scrubbing event log (Applications and Services Logs > Microsoft > Windows > Deduplication > Scrubbing). If the scrubbing result reports corruption that could not be repaired, the affected files cannot be recovered through dedup. Use your backup solution to restore them. This is precisely why scrubbing and a solid backup strategy must go hand-in-hand when using dedup.

Schedule scrubbing at least monthly. Weekly is better for critical volumes.
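To review a scrub without clicking through Event Viewer, the status fields and the scrubbing event channel can be combined into one object. This sketch assumes Get-DedupStatus exposes LastScrubbingTime, LastScrubbingResult, and LastScrubbingResultMessage, and that scrubbing events are written to the Microsoft-Windows-Deduplication/Scrubbing log; confirm both on your server with Get-Member and Get-WinEvent -ListLog *Dedup*. The function name Get-DedupScrubReport is hypothetical.

```powershell
# Hypothetical helper: combine the last scrubbing result with recent scrubbing-channel events.
# Assumed names: LastScrubbing* properties on Get-DedupStatus and the
# "Microsoft-Windows-Deduplication/Scrubbing" event log channel.
function Get-DedupScrubReport {
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [int] $MaxEvents = 20
    )
    $status = Get-DedupStatus -Volume $Volume
    # SilentlyContinue: Get-WinEvent raises an error when the log has no events.
    $events = Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Scrubbing' -MaxEvents $MaxEvents -ErrorAction SilentlyContinue
    [pscustomobject]@{
        Volume        = $Volume
        LastScrubbing = $status.LastScrubbingTime
        Result        = $status.LastScrubbingResult
        Message       = $status.LastScrubbingResultMessage
        # Event levels: 1 = Critical, 2 = Error, 3 = Warning; these are the ones to read first.
        ProblemEvents = @($events | Where-Object { $_.Level -in 1, 2, 3 }).Count
    }
}
```

Usage: Get-DedupScrubReport -Volume "F:". A non-zero ProblemEvents count is a prompt to open the listed events, not a verdict on its own.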

Deduplication on CSV Volumes

Cluster Shared Volumes (CSVs) used by Hyper-V clusters or Scale-Out File Server (SOFS) can be deduplicated with specific requirements in Windows Server 2022.

For Hyper-V on CSV, use the HyperV usage type. Dedup on CSV requires that the Deduplication feature is installed on every node in the failover cluster. The dedup optimization job runs on the CSV owner node, and after a CSV failover the new owner node takes over dedup processing for that volume.

# Install dedup on all cluster nodes (run on each node or use Invoke-Command)
$clusterNodes = (Get-ClusterNode).Name
Invoke-Command -ComputerName $clusterNodes -ScriptBlock {
    Install-WindowsFeature -Name FS-Data-Deduplication
}
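Before enabling dedup on a CSV, it helps to confirm that the feature really is present on every node rather than assuming the install loop succeeded everywhere. The hypothetical helper Get-NodesMissingDedup runs Get-WindowsFeature remotely and returns the names of nodes where FS-Data-Deduplication is not installed. Nodes that could not be reached also appear in the output, because they return no result.

```powershell
# Hypothetical helper: list cluster nodes where the dedup feature is not installed (or not reachable).
function Get-NodesMissingDedup {
    param([Parameter(Mandatory)] [string[]] $ComputerName)
    $results = Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        Get-WindowsFeature -Name FS-Data-Deduplication
    }
    # Invoke-Command tags each returned object with the PSComputerName it came from.
    $installed = @($results | Where-Object Installed | ForEach-Object PSComputerName)
    $ComputerName | Where-Object { $_ -notin $installed }
}
```

Usage: Get-NodesMissingDedup -ComputerName (Get-ClusterNode).Name returns nothing when every node is ready.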

# Enable dedup on a CSV volume (run on the owner node or any node with access)
Enable-DedupVolume -Volume "C:\ClusterStorage\Volume1" -UsageType HyperV

CSV dedup has one important operational note: during a dedup optimization job, read performance on that CSV may be slightly impacted. Schedule optimization jobs during off-peak hours, especially for production Hyper-V clusters running business-critical VMs.

Data Deduplication Limitations

Understanding dedup’s limitations prevents architectural mistakes and performance surprises.

Maximum volume size: Data Deduplication is supported on volumes up to 64 TB in Windows Server 2022. Volumes larger than 64 TB are not supported for dedup.

Maximum file size: Files up to 1 TB can be deduplicated in all three usage types (Default, HyperV, and Backup). Files larger than 1 TB fall outside the supported limit.

Not supported on system/boot volumes: The volume where Windows is installed (typically C:) cannot have dedup enabled. This prevents dedup from interfering with OS boot and critical system files.

Not suitable for databases in Default mode: Active SQL Server or Exchange database files and running VM disks should not be deduplicated in Default mode, because the read reconstruction overhead degrades I/O performance unacceptably. Use the HyperV usage type for VDI virtual disks, and exclude active database files from dedup entirely.

ReFS: Dedup has supported ReFS volumes since Windows Server 2019; ReFS-specific considerations are covered in the ReFS section later in this article.

Recovery complexity: If the chunk store itself is corrupted and scrubbing cannot repair it, you cannot recover deduplicated files without a traditional backup. The dependency on the chunk store means a single point of failure exists — always maintain backups of deduplicated volumes.
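The volume-level limits above can be checked before calling Enable-DedupVolume. This is a sketch: Test-DedupVolumeCandidate is a hypothetical name, it reads DriveLetter, FileSystem, and Size from Get-Volume, and it only tests the system-volume and 64 TB rules. The file system is reported for review rather than judged.

```powershell
# Hypothetical pre-flight check: not the system volume and no larger than 64 TB.
function Test-DedupVolumeCandidate {
    param([Parameter(Mandatory)] [char] $DriveLetter)
    $vol          = Get-Volume -DriveLetter $DriveLetter
    $systemLetter = "$env:SystemDrive".TrimEnd(':')   # e.g. "C"
    $isSystem     = $systemLetter -eq "$($vol.DriveLetter)"
    [pscustomobject]@{
        DriveLetter  = $vol.DriveLetter
        FileSystem   = $vol.FileSystem
        SizeTB       = [math]::Round($vol.Size / 1TB, 2)
        IsSystem     = $isSystem
        # Within the limits listed above: not the system volume and at most 64 TB.
        WithinLimits = (-not $isSystem) -and ($vol.Size -le 64TB)
    }
}
```

Usage: Test-DedupVolumeCandidate -DriveLetter D.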

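Check which files on a volume have been optimized, and how much space deleting the optimized files under a folder would free (the folder path below is an example):

# Optimized files carry a dedup reparse point, so listing reparse-point files finds them
# (approximate: symbolic links also match). Length is the logical, pre-dedup file size.
Get-ChildItem -Path "D:\" -Recurse -File -Attributes ReparsePoint -ErrorAction SilentlyContinue |
    Sort-Object Length -Descending | Select-Object -First 20 FullName,
    @{N="LogicalSizeMB";E={[math]::Round($_.Length/1MB,2)}}
# Space that would be reclaimed by deleting the optimized files under this folder
Measure-DedupFileMetadata -Path "D:\Shares"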

Comparing Deduplication to Compression

Data deduplication and compression are related but fundamentally different storage optimization techniques, and Windows Server 2022 can use them together.

NTFS Compression operates at the file level, compressing individual files in-place on the volume. It is transparent to applications, reduces file sizes by 20-60% for compressible data (text, logs, documents), and adds CPU overhead on every read/write. Not suitable for already-compressed formats (JPEG, MP4, ZIP) or databases that have large sequential read patterns. Enable per file or folder:

# Enable NTFS compression on a folder
compact /c /s:"D:\OldArchives"

# Check compression status (without /c or /u, compact only reports)
compact /s:"D:\OldArchives"

Data Deduplication operates across the entire volume, finding identical data chunks between different files. It is most effective when many files share common data — for example, 100 VMs all having the same OS image, or a backup repository storing incremental backups of similar servers. Dedup can save 80-95% of space in VDI environments.

Using both together: Dedup has a built-in compression option for the chunk store, and it is on by default. Unless the volume's NoCompress property is set to True, dedup compresses each chunk after deduplication. This provides both dedup savings (cross-file identical chunks) and compression savings (within each stored chunk), maximizing total space savings:

# Enable dedup; chunk compression is on by default
Enable-DedupVolume -Volume "D:" -UsageType Default

# Verify compression is enabled (NoCompress should be False)
Get-DedupVolume -Volume "D:" | Select-Object Volume, NoCompress

To disable compression within dedup (useful for already-compressed data or when CPU resources are constrained):

Set-DedupVolume -Volume "D:" -NoCompress $true

Data Deduplication with ReFS

Resilient File System (ReFS) volumes can be deduplicated with the same FS-Data-Deduplication feature and cmdlets used for NTFS. Support for ReFS was added in Windows Server 2019, so in Windows Server 2022 Enable-DedupVolume works on a ReFS data volume just as it does on an NTFS one:

# Enable dedup on a ReFS data volume (same cmdlet and usage types as on NTFS)
Enable-DedupVolume -Volume "R:" -UsageType Default

ReFS adds features of its own that matter when planning storage: block cloning (used, for example, by Veeam fast clone for synthetic fulls) and integrity streams for bit-rot detection. ReFS is also the recommended file system for Cluster Shared Volumes on Storage Spaces Direct (S2D), and dedup can be enabled on those CSVs as described in the CSV section.

If you need dedup and are evaluating NTFS vs ReFS:

Choose NTFS when: you need dedup for traditional file servers, backup repositories, or general storage volumes and have no specific requirement for ReFS features.

Choose ReFS when: you are deploying Hyper-V with Storage Spaces Direct (S2D), need ReFS integrity streams for bit-rot detection, or require fast clone operations for Veeam synthetic fulls. Windows dedup remains available on these volumes.

On either file system, the limits listed earlier still apply: the 64 TB volume and 1 TB file limits, and no dedup on the system/boot volume.


How to Set Up Data Deduplication on Windows Server 2022