Duplicate Vision: A Calm, Practical Way to Clean Duplicate Files on Windows

If your storage keeps getting full for no obvious reason, duplicate files are usually one of the quiet culprits. Screenshots copied three times, old exports from editing apps, repeated backup folders, and photo sets you forgot you already moved. It builds up slowly, then suddenly your drive starts gasping.

That exact frustration is why I built Duplicate Vision.

This is a Windows desktop app focused on one thing: finding true duplicate files quickly, previewing them safely, and cleaning them up without reckless deletion. No cloud upload, no complicated setup, and no guessing game.

Why Duplicate Vision Feels Fast Even on Large Folders

A lot of duplicate checkers are either fast but unreliable, or accurate but painfully slow. Duplicate Vision uses a multi-stage process to balance both.

It starts by grouping files by size, because files with different sizes can never be duplicates. Within each same-size group, it then computes a partial fingerprint by sampling a small portion of each file, which quickly rules out most files that merely happen to share a size. Only files whose partial fingerprints match go on to full-file hashing. Byte-by-byte verification is available as an optional paranoid mode for users who want absolute certainty. It is turned off by default because a full-file hash match is already a reliable signal of identical content for everyday use.

In simple terms: it avoids expensive work until it is absolutely needed.
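Here is a minimal Python sketch of that staged filter. It assumes a 64 KiB head-and-tail sample for the partial fingerprint and BLAKE2b as the hash; the post does not say which sampling scheme or hash function Duplicate Vision actually uses, and the thread pool is left out for clarity. The names `find_duplicates`, `partial_fingerprint`, and `full_hash` are illustrative.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

SAMPLE_SIZE = 64 * 1024    # assumed sample size for the partial fingerprint
CHUNK_SIZE = 1024 * 1024   # read full files in 1 MiB chunks


def partial_fingerprint(path):
    """Hash the head and tail of a file (an assumed sampling scheme)."""
    h = hashlib.blake2b()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * SAMPLE_SIZE:
            h.update(f.read())  # small file: the sample is the whole file
        else:
            h.update(f.read(SAMPLE_SIZE))
            f.seek(-SAMPLE_SIZE, os.SEEK_END)
            h.update(f.read(SAMPLE_SIZE))
    return h.digest()


def full_hash(path):
    """Hash the entire file contents in fixed-size chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.digest()


def _split(paths, key_fn):
    """Group paths by key_fn and keep only groups with two or more members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key_fn(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(paths, paranoid=False):
    # Stage 1: files with different sizes can never be duplicates.
    candidates = _split(paths, os.path.getsize)
    # Stage 2: cheap sampled fingerprint, only within same-size groups.
    candidates = [g for grp in candidates for g in _split(grp, partial_fingerprint)]
    # Stage 3: full-file hash, only for files that survived both cheaper checks.
    groups = [g for grp in candidates for g in _split(grp, full_hash)]
    if paranoid:
        # Optional stage 4: byte-by-byte comparison against the first file in each group.
        verified = []
        for g in groups:
            same = [g[0]] + [p for p in g[1:] if filecmp.cmp(g[0], p, shallow=False)]
            if len(same) > 1:
                verified.append(same)
        groups = verified
    return groups
```

Each stage only runs on files the previous stage could not rule out, so most files never get fully read.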

The pipeline is also parallelized with a thread pool, so scanning big folders does not feel stuck in one long blocking step. The scanner detects whether the target drive is an SSD or a hard drive and adjusts thread counts to match: more threads on flash storage, fewer on rotational disks, where seek thrashing would cancel out any gains.

Persistent Hash Cache for Faster Repeat Scans

One of the most noticeable quality-of-life additions is the SQLite hash cache.

After your first scan, hashes for every processed file are stored locally in a small database (%APPDATA%\DuplicateVision\hash_cache.db on Windows). When you scan the same folder again, any file whose path, size, and modification timestamp have not changed is loaded directly from the cache instead of being re-hashed from disk.

For large folders you scan regularly — like a photo library or a project archive — this makes repeat scans dramatically faster. Only new or modified files do any real I/O work. The cache also tracks directory modification times, so unchanged folders skip the filesystem walk entirely.

Cache entries older than 90 days are evicted automatically to keep the database from growing indefinitely.
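A minimal sketch of a cache keyed on path, size, and modification time, using Python's built-in `sqlite3` module. The table layout, the `HashCache` and `cached_hash` names, and measuring the 90-day age from the last time an entry was written or hit are all assumptions; the directory-timestamp shortcut is not shown.

```python
import os
import sqlite3
import time

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # evict entries not touched for 90 days


class HashCache:
    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        with self.conn:
            self.conn.execute(
                """CREATE TABLE IF NOT EXISTS hashes (
                       path      TEXT PRIMARY KEY,
                       size      INTEGER NOT NULL,
                       mtime_ns  INTEGER NOT NULL,
                       digest    BLOB NOT NULL,
                       last_used REAL NOT NULL)"""
            )

    def get(self, path):
        """Return the cached digest, or None if the file is new or has changed."""
        st = os.stat(path)
        row = self.conn.execute(
            "SELECT digest FROM hashes WHERE path = ? AND size = ? AND mtime_ns = ?",
            (path, st.st_size, st.st_mtime_ns),
        ).fetchone()
        if row is None:
            return None
        with self.conn:  # refresh the entry so it is not evicted while still in use
            self.conn.execute(
                "UPDATE hashes SET last_used = ? WHERE path = ?", (time.time(), path)
            )
        return row[0]

    def put(self, path, digest):
        st = os.stat(path)
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?, ?)",
                (path, st.st_size, st.st_mtime_ns, digest, time.time()),
            )

    def evict_stale(self, now=None):
        """Delete entries older than MAX_AGE_SECONDS; return how many were removed."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM hashes WHERE last_used < ?", (now - MAX_AGE_SECONDS,)
            )
        return cur.rowcount


def cached_hash(cache, path, hash_fn):
    """Use the cached digest when path, size, and mtime match; otherwise hash and store."""
    digest = cache.get(path)
    if digest is None:
        digest = hash_fn(path)
        cache.put(path, digest)
    return digest
```

Because the lookup requires size and modification time to match, any edit to a file automatically turns its old entry into a miss.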

Adaptive Worker Mode

The header bar includes a mode dropdown with three settings: Conservative, Balanced, and Aggressive.

These control how the scanner tunes its internal thread pool across phases:

Conservative: fewer threads, lower CPU and disk pressure. Good for slower machines or when scanning while doing other work.
Balanced: the default. Scales to roughly cpu_count + 4 threads for most workloads.
Aggressive: maximizes parallelism, up to 32 threads on capable SSDs. Best when you want results as fast as possible and the machine is idle.

The mode is persisted across sessions via an environment variable (DV_ADAPTIVE_MODE), so whatever you set carries over the next time you open the app.

On hard drives, the mode still matters but the upper cap is two threads regardless — beyond that, the disk head starts seeking between thread positions and throughput actually drops.
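One way to express those rules is a small function that maps mode and disk type to a thread count. The Balanced formula and the 32-thread and 2-thread caps come from the description above; the Conservative value (half the CPU count) is purely an assumption, since the post only says "fewer threads", and `worker_count` is a hypothetical name.

```python
import os

AGGRESSIVE_MAX = 32   # upper bound for Aggressive mode on SSDs
HDD_MAX_WORKERS = 2   # cap for rotational disks, regardless of mode


def worker_count(mode, disk_type, cpu_count=None):
    """Pick a thread-pool size from the scan mode and the detected disk type."""
    cpus = cpu_count or os.cpu_count() or 1
    if mode == "conservative":
        workers = max(1, cpus // 2)  # assumption: the post only says "fewer threads"
    elif mode == "balanced":
        workers = min(AGGRESSIVE_MAX, cpus + 4)
    elif mode == "aggressive":
        workers = AGGRESSIVE_MAX
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if disk_type == "hdd":
        # More threads on a spinning disk just make the head seek back and forth.
        workers = min(workers, HDD_MAX_WORKERS)
    return workers
```

The result would typically be passed straight to `concurrent.futures.ThreadPoolExecutor(max_workers=...)`.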

File Type Filter

The header now includes a file type filter dropdown next to the mode selector. Instead of scanning every file in a folder, you can restrict the scan to a specific category:

Images: jpg, jpeg, png, gif, bmp, webp, tiff, heic, avif, ico, svg
Videos: mp4, mkv, avi, mov, wmv, flv, webm, m4v, mpg, mpeg, 3gp
Audio: mp3, flac, wav, aac, ogg, m4a, wma, opus, aiff
Documents: pdf, doc, docx, xls, xlsx, ppt, pptx, odt, txt, csv, rtf
Archives: zip, rar, 7z, tar, gz, bz2, xz, iso
All Files (default): no filter applied

Filtering is useful when you only care about one type — scanning a Downloads folder for duplicate images runs noticeably faster when videos and archives are excluded.
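A sketch of the filter as an extension lookup applied during the directory walk. The extension lists are copied from above; the dictionary keys, the case-insensitive matching, and the function names are assumptions for illustration.

```python
import os

FILE_TYPE_FILTERS = {
    "images": {"jpg", "jpeg", "png", "gif", "bmp", "webp", "tiff", "heic", "avif", "ico", "svg"},
    "videos": {"mp4", "mkv", "avi", "mov", "wmv", "flv", "webm", "m4v", "mpg", "mpeg", "3gp"},
    "audio": {"mp3", "flac", "wav", "aac", "ogg", "m4a", "wma", "opus", "aiff"},
    "documents": {"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "odt", "txt", "csv", "rtf"},
    "archives": {"zip", "rar", "7z", "tar", "gz", "bz2", "xz", "iso"},
    "all": None,  # no filter applied
}


def matches_filter(path, category):
    """Return True if the file's extension belongs to the chosen category."""
    exts = FILE_TYPE_FILTERS[category]
    if exts is None:
        return True
    # Only the last extension is checked, so "backup.tar.gz" matches via "gz".
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    return ext in exts


def iter_files(root, category="all"):
    """Walk root recursively, yielding only files that pass the filter."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if matches_filter(path, category):
                yield path
```

Filtering at walk time means excluded files are never stat'd or hashed, which is where the speedup comes from.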

Streaming Results While the Scan Runs

In earlier versions, the results list only appeared after the entire scan finished. Now, duplicate groups appear as they are found — the list populates in real time while hashing is still in progress.

This makes large scans feel more interactive. You can already start reviewing and selecting files in early groups while the scanner is still working through the rest of the folder.
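Streaming can be sketched with a generator that yields each confirmed group as soon as its hashing job finishes, rather than collecting everything first. This assumes candidates have already been grouped by size; SHA-256 and the `stream_duplicate_groups` name are illustrative choices, not necessarily what the app uses. In a GUI, the consumer would typically run on a worker thread and hand each group over to the UI thread.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def _hash_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def _confirm_group(paths):
    """Hash one same-size candidate group and return its true duplicate groups."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[_hash_file(p)].append(p)
    return [g for g in by_digest.values() if len(g) > 1]


def stream_duplicate_groups(size_groups, max_workers=4):
    """Yield each confirmed duplicate group as soon as its candidates finish hashing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_confirm_group, g) for g in size_groups]
        # as_completed hands back futures in completion order, not submission order.
        for fut in as_completed(futures):
            yield from fut.result()
```

Because results arrive in completion order, small groups that hash quickly tend to show up first while large files are still being read.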

Interface That Stays Out of Your Way

The app uses a three-part layout: top header, main results area, and bottom action bar.

The header gives real-time scan status, including progress during hashing, a speed readout, and the active mode and filter. The center area shows grouped duplicates in a scrollable list. The right panel gives you a live preview with file details, so you can verify before selecting anything. At the bottom, smart actions help you clean up quickly.

The scan button doubles as a stop button — if you start a scan and want to cancel it, clicking the same button again stops the scan gracefully and resets the app to idle, without leaving anything in a broken state.

The point is simple: less clicking, fewer mistakes.
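The scan/stop toggle maps naturally onto a cooperative cancellation flag: the scan worker checks a `threading.Event` between files and returns cleanly when it is set. This is a minimal sketch with hypothetical names (`ScanController`, `toggle`); the app's real state handling is not documented here.

```python
import threading


class ScanController:
    """Start/stop toggle for a background scan that checks a cancel flag."""

    def __init__(self):
        self._cancel = threading.Event()
        self._thread = None
        self.state = "idle"

    def toggle(self, scan_fn):
        if self.state == "scanning":
            # Second click: ask the worker to stop at its next checkpoint.
            self._cancel.set()
            return
        # First click: clear any old cancel request and start a fresh scan.
        self._cancel.clear()
        self.state = "scanning"
        self._thread = threading.Thread(target=self._run, args=(scan_fn,), daemon=True)
        self._thread.start()

    def _run(self, scan_fn):
        try:
            scan_fn(self._cancel)  # scan_fn polls cancel.is_set() between files
        finally:
            self.state = "idle"  # reset whether the scan finished, stopped, or failed

    def join(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)
```

Since the worker only stops between files, it never abandons a file halfway through, and the `finally` block guarantees the controller returns to idle.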

Sort Order for Duplicate Groups

The results list now includes a sort toolbar above the group cards. You can order groups by:

Default Order: the order duplicates were found (stable across the session).
Biggest First: largest wasted space at the top — useful when your goal is maximum space recovery.
Smallest First: smallest groups at the top — useful for a quick cleanup pass on minor clutter.

The sort never changes the underlying scan data. Switching back to Default Order always restores the original sequence.
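The sort can be sketched as a function that always returns a new list and never mutates the original. It assumes "wasted space" means the bytes taken up by every copy beyond the first, that is size * (copies - 1), and that Smallest First uses the same measure; both are interpretations, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicateGroup:
    file_size: int  # size of each identical file, in bytes
    paths: tuple

    @property
    def wasted_bytes(self):
        # Every copy beyond the first is reclaimable space.
        return self.file_size * (len(self.paths) - 1)


def sorted_groups(groups, order):
    """Return a new, reordered list; the original scan list is left untouched."""
    if order == "default":
        return list(groups)
    if order == "biggest":
        return sorted(groups, key=lambda g: g.wasted_bytes, reverse=True)
    if order == "smallest":
        return sorted(groups, key=lambda g: g.wasted_bytes)
    raise ValueError(f"unknown sort order: {order!r}")
```

Python's `sorted` is stable, so groups with equal wasted space keep their discovery order, and returning to Default Order simply copies the untouched original list.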

Virtual Scrolling for Large Results

The results list uses virtual scrolling — only the group cards that are actually visible in the viewport (plus a small buffer above and below) exist as real widgets. Everything outside the visible area is represented by a blank canvas of the correct height.

This means the app handles 50,000 or more duplicate groups without perceptible lag. Without virtual scrolling, rendering that many card widgets at once would leave the UI unresponsive for several seconds at a time, even when you are only scrolling.
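The core calculation behind virtual scrolling is finding which cards intersect the viewport. A sketch, assuming each card's height is known in advance (heights may differ between groups): a prefix-sum array of card offsets plus binary search gives the visible index range in O(log n), and the same offsets give the heights of the blank spacers above and below. The function names are illustrative and not tied to any particular UI toolkit.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def build_offsets(heights):
    """offsets[i] is the y position where card i starts; offsets[-1] is the total height."""
    return [0, *accumulate(heights)]


def visible_range(offsets, scroll_top, viewport_height, buffer=2):
    """Return (first, last_exclusive) card indices to create as real widgets."""
    n = len(offsets) - 1
    if n <= 0:
        return (0, 0)
    # Card containing the top edge of the viewport.
    first = bisect_right(offsets, scroll_top) - 1
    # First card that starts at or below the bottom edge (exclusive bound).
    last = bisect_left(offsets, scroll_top + viewport_height)
    # Widen by a small buffer so fast scrolling does not reveal blank space.
    first = min(max(0, first - buffer), n)
    last = min(n, last + buffer)
    return (first, last)


def spacer_heights(offsets, first, last):
    """Heights of the blank placeholders above and below the rendered cards."""
    return offsets[first], offsets[-1] - offsets[last]
```

For example, with ten 100-pixel cards, a scroll position of 250, and a 300-pixel viewport, only cards 2 through 5 need real widgets when the buffer is zero.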

Smart Selection for Real-World Cleanup

When people clean duplicates, they usually want one of two strategies:

Keep the oldest file (for archival workflows).
Keep the newest file (for active project workflows).

Duplicate Vision has both as one-click smart selection actions.

After that, you can clear or fine-tune choices manually per file. So you keep control, but you do not have to do repetitive work from scratch.
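Smart selection reduces to picking one keeper per group and marking the rest. A sketch, assuming "oldest" and "newest" are judged by modification time (the post does not say which timestamp is used); `smart_select` is a hypothetical name, and the timestamp function is injectable so the logic can be tested without real files.

```python
import os


def smart_select(group, keep="oldest", mtime=os.path.getmtime):
    """Return the paths to mark for deletion, keeping exactly one file per group."""
    if keep not in ("oldest", "newest"):
        raise ValueError(f"keep must be 'oldest' or 'newest', not {keep!r}")
    ordered = sorted(group, key=mtime)
    keeper = ordered[0] if keep == "oldest" else ordered[-1]
    # Preserve the group's display order for everything that gets selected.
    return [p for p in group if p != keeper]
```

Because the function only produces a selection, manual adjustments afterwards are just edits to that list.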

Safer Delete Flow (No Instant Obliteration)

One important design choice: selected files are moved to the Windows Recycle Bin, not permanently erased immediately.

That means cleanup remains intentional and reversible. If you accidentally remove something, recovery is still possible through the normal Windows path.

For a tool that deals with deletion, this safety net matters more than flashy features.

Better Confidence Before You Click Delete

The preview panel helps reduce anxiety when handling duplicate groups.

For image files, you can see the actual image directly. For non-image files (including videos, audio, documents, and archives), you still get clear metadata: filename, size, modified date, and full path. This quick context makes decision-making faster, especially when filenames look similar but locations differ.
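The metadata fallback needs nothing more than a `stat` call. A sketch of the four fields listed above; the dictionary keys, the date format, and the 1024-based size units are assumptions.

```python
import os
from datetime import datetime


def human_size(num_bytes):
    """Format a byte count using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def file_details(path):
    """Collect the filename, size, modified date, and full path for the preview panel."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": human_size(st.st_size),
        "modified": datetime.fromtimestamp(st.st_mtime).strftime("%Y-%m-%d %H:%M"),
        "path": os.path.abspath(path),
    }
```

Showing the full path is what separates two identically named files that live in different folders.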

About Dialog

Clicking the ⓘ button in the header opens the About dialog. It contains a plain-language description of how the app works, a step-by-step quick start guide, a full reference of every button and control, developer information, and a Ko-fi support link.

It is designed to be genuinely useful, not just a version number placeholder.

Built for Everyday Windows Use

Duplicate Vision is a desktop-first utility with a dark UI, clear typography, and responsive behavior in a resizable window. It is designed for practical maintenance sessions: open, scan, verify, clean, done.

No over-engineered workflow. No enterprise dashboard feeling. Just a focused utility for a common storage problem.

How to Use Duplicate Vision (Quick Walkthrough)

If this is your first time, here is the easiest workflow:

Open Duplicate Vision.
Choose a mode (Balanced is a good default).
Optionally choose a file type filter if you only want to scan one category.
Click Select Folder.
Choose a target directory (for example: Downloads, Photos export folder, or old project archives).
Click Start Scan and wait until scanning is complete.
Review duplicate groups in the list. Use the sort dropdown if you want biggest groups first.
Click file rows to preview and select.
Use Smart Select: Keep Oldest or Keep Newest if you want faster bulk selection.
Click Delete Selected to move selected files to Recycle Bin.
Click Stop at any point to cancel the scan and return to idle.

That is it. You can clean a noisy folder in just a few minutes once the scan is done.

Suggested Folder Strategy (So You Do Not Over-Delete)

A practical approach is to scan in small batches first, not your entire drive in one go.

Good first targets:

Downloads folder.
Export/output folders from video or design tools.
Temporary transfer folders (phone backups, shared zip extracts, etc.).

Folders you should handle more carefully:

System directories.
App installation folders.
Active development dependencies, unless you are sure which files are duplicated and safe to remove.

This method keeps your cleanup focused and reduces accidental deletion risk.

Screenshots

Main Duplicate Vision interface with header, duplicate list, and preview panel

Safety Notes Before Deleting

Even with Recycle Bin protection, a careful routine helps:

Preview at least one file from each large duplicate group.
Check folder paths before final delete.
Start with low-risk folders first.
For mission-critical data, keep a backup snapshot before mass cleanup.

These habits sound simple, but they prevent almost all painful mistakes.

Lightweight Troubleshooting

If scanning feels slow:

Start with a smaller folder segment.
Use the file type filter to restrict the scan to only what you care about.
If this is your first scan of a folder, let it finish once. After that, the hash cache lets repeat scans skip re-hashing unchanged files.
Avoid scanning active sync folders while they are changing rapidly.
Close heavy disk-usage apps temporarily.

If no duplicates are found but you expected some:

Confirm you selected the correct parent folder.
Check whether the file type filter is excluding the files you expected to find.
Check whether files are actually identical or only similar in name.
Re-run scan after file transfers fully complete.

Power User: Environment Variables

For users who want fine-grained control without touching the UI, several environment variables adjust scanner behavior:

Variable	Values	Effect
`DV_DIAGNOSTICS`	`1` / `true`	Enables detailed per-phase scan logging to the console.
`DV_PARANOID_MODE`	`1` / `true`	Re-enables byte-by-byte file verification after hashing (disabled by default).
`DV_ADAPTIVE_MODE`	`conservative` / `balanced` / `aggressive`	Overrides the mode dropdown at startup.
`DV_DISK_TYPE`	`ssd` / `hdd`	Forces the disk type instead of using auto-detection.

These are optional. The app works correctly without any of them. They exist for debugging, performance testing, or running on unusual storage configurations.
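A sketch of how these variables might be read at startup. The accepted values come from the table; treating them case-insensitively and ignoring unrecognized values (falling back to the UI setting or auto-detection) are assumptions, as are the `load_settings` and `ScannerSettings` names.

```python
import os
from dataclasses import dataclass
from typing import Optional

TRUTHY = {"1", "true"}
MODES = {"conservative", "balanced", "aggressive"}
DISK_TYPES = {"ssd", "hdd"}


@dataclass
class ScannerSettings:
    diagnostics: bool = False
    paranoid: bool = False
    mode: Optional[str] = None       # None: use the mode dropdown
    disk_type: Optional[str] = None  # None: auto-detect the drive type


def load_settings(env=None):
    """Read DV_* variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def flag(name):
        return env.get(name, "").strip().lower() in TRUTHY

    def choice(name, allowed):
        value = env.get(name, "").strip().lower()
        return value if value in allowed else None  # unknown values are ignored

    return ScannerSettings(
        diagnostics=flag("DV_DIAGNOSTICS"),
        paranoid=flag("DV_PARANOID_MODE"),
        mode=choice("DV_ADAPTIVE_MODE", MODES),
        disk_type=choice("DV_DISK_TYPE", DISK_TYPES),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to test in isolation.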

FAQ

Does this app delete files permanently?

No. Selected files are moved to the Windows Recycle Bin.

Can it detect duplicates with different filenames?

Yes. Detection is based on file content, not filename.

Is this suitable for photo libraries?

Yes, especially when your library has repeated exports or copied batches across folders. Use the Images filter to skip non-image files and speed up the scan.

Will it work for non-image files too?

Yes. With the default All Files filter it checks every file type, and the preview panel shows metadata for files it cannot render as images.

Will repeat scans always take as long as the first one?

No. After the first scan, hashes are cached in a local SQLite database. Repeat scans of unchanged folders are significantly faster because unchanged files are loaded from cache instead of re-read from disk.

My NAS or external HDD scan feels slow. What should I do?

Set the mode to Conservative and, if the drive is rotational, set DV_DISK_TYPE=hdd. This limits the thread count to avoid seek thrashing, which is the main cause of slowdowns on rotational drives.

Final Thoughts

Duplicate files are one of those problems that stay invisible until your disk is almost full. A good tool should make cleanup feel calm, not risky.

Duplicate Vision is my attempt at that: accurate detection, transparent preview, and safer deletion in a straightforward interface.

If your Downloads, Photos, or project export folders have become chaotic, this app is built for exactly that moment.

If Duplicate Vision saved you some disk space or just a bit of frustration, consider buying me a coffee. It helps me keep the app free, maintained, and improving.

☕ Support CrushEdge on Ko-fi — every cup is genuinely appreciated.