PowerShell: Find files within x seconds of each other, possible duplicates

This PowerShell script finds pairs of files whose last-write times are within x seconds of each other. You can set the maximum difference with the variable “$maxDifferenceInSeconds”. To compare creation times instead, use the CreationTime property in place of LastWriteTime. With a few more changes, the script could work in minutes, hours or days.
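One way to make the limit work in minutes, hours or days is to express it as a [timespan] instead of a number of seconds. The sketch below assumes that approach; Test-WithinTimeSpan is a hypothetical helper name, not part of the original script. It uses Duration() to take the absolute value of the difference, so the order of the two timestamps does not matter.

```powershell
# Hypothetical helper (not in the original script): returns $true when two
# timestamps are no more than $MaxDifference apart, in either direction.
function Test-WithinTimeSpan {
    param(
        [datetime]$First,
        [datetime]$Second,
        [timespan]$MaxDifference
    )
    # Duration() returns the absolute value of the TimeSpan
    $gap = ($First - $Second).Duration()
    return $gap -le $MaxDifference
}

# Two timestamps 45 seconds apart
$a = [datetime]'2021-04-08 10:00:00'
$b = [datetime]'2021-04-08 10:00:45'

# The same check with the limit given in seconds, minutes or days
Test-WithinTimeSpan $a $b (New-TimeSpan -Seconds 60)   # True
Test-WithinTimeSpan $a $b (New-TimeSpan -Seconds 30)   # False
Test-WithinTimeSpan $a $b (New-TimeSpan -Minutes 1)    # True
Test-WithinTimeSpan $a $b (New-TimeSpan -Days 1)       # True
```

In the main script, this would replace the seconds comparison with something like Test-WithinTimeSpan $file.LastWriteTime $priorFileLastWriteTime (New-TimeSpan -Minutes 5).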

In our scenarios, we receive files from a vendor or another application. Sometimes they send us the same or similar data just a few seconds apart. To do a proper analysis, we needed a side-by-side comparison of a few of these files to see what is really happening in the “real world”.

This program not only finds pairs of files that are close in time (writing their names to the console), but also copies each pair to a target directory, which saves a lot of time. In that directory, we can then open our favorite “file compare” utility to see what is different between the data inside the two files.
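If no compare utility is at hand, PowerShell's built-in Compare-Object can give a quick first look. The sketch below assumes the files are line-oriented text (for example CSV) and not empty; Get-LineDifference is a hypothetical helper name. Note that Compare-Object matches lines as a set rather than by position, so it will not line the files up the way a side-by-side compare utility does.

```powershell
# Hypothetical helper: lists lines that appear in only one of two text files.
# Assumes both files are non-empty, line-oriented text.
# SideIndicator "<=" = only in the first file, "=>" = only in the second file.
function Get-LineDifference {
    param(
        [string]$ReferencePath,
        [string]$DifferencePath
    )
    Compare-Object -ReferenceObject (Get-Content -LiteralPath $ReferencePath) `
                   -DifferenceObject (Get-Content -LiteralPath $DifferencePath)
}

# Example call on a pair copied to the inspect folder (file names are hypothetical):
# Get-LineDifference "C:\Data\Inspect\Dallas 1234 04-08-2021 a.csv" "C:\Data\Inspect\Dallas 1234 04-08-2021 b.csv"
```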

NOTE: One more enhancement that might be needed is to match only files whose names share a common part. Whether that helps depends on your file names, for example whether they contain id numbers; you might only want to identify and copy files that have the same id number. Example: our file names start with something similar to “Dallas 1234 04-08-2021” or “Austin 4567 04-08-21”. You might want to find only files that start with “Dallas 1234” and are within x seconds of each other.
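A minimal sketch of that enhancement, assuming the first two space-separated words of a file name (such as “Dallas 1234”) identify the sender; Get-NamePrefix is a hypothetical helper, and you would adjust the split to your own naming scheme. Because the main loop only compares neighboring files, the prefix also has to be part of the sort, so files with the same prefix end up next to each other.

```powershell
# Hypothetical helper: returns the "city id" part of a file name, e.g.
# "Dallas 1234 04-08-2021.csv" -> "Dallas 1234".
# Assumes the first two space-separated words identify the sender.
function Get-NamePrefix {
    param([string]$FileName)
    $words = $FileName -split ' '
    if ($words.Count -ge 2) { return "$($words[0]) $($words[1])" }
    # Names without a space are used as-is
    return $FileName
}

Get-NamePrefix "Dallas 1234 04-08-2021.csv"   # Dallas 1234
Get-NamePrefix "Austin 4567 04-08-21.csv"     # Austin 4567

# In the main script, sort so same-prefix files sit next to each other...
# $files = Get-ChildItem $fromDir -Recurse -File |
#     Sort-Object DirectoryName, @{Expression = { Get-NamePrefix $_.Name }},
#                 @{Expression = 'LastWriteTime'; Descending = $true}
# ...and require matching prefixes before comparing times:
# if ($file.DirectoryName -eq $priorFileDirectoryName -and
#     (Get-NamePrefix $file.Name) -eq (Get-NamePrefix $priorFileName))
```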


cls
$fromDir = "C:\Data\Sorted" 
$toDir = "C:\Data\Inspect" 
$maxDifferenceInSeconds = 60 

if (-not (Test-Path $toDir))  # create directory if it doesn't exist
{
    New-Item -ItemType Directory -Path $toDir | Out-Null
}

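# -File skips folders; sorting by folder first keeps files from the same
# sub-directory next to each other, newest first within each folder
$files = Get-ChildItem $fromDir -Recurse -File |
    Sort-Object DirectoryName, @{Expression = 'LastWriteTime'; Descending = $true}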

$priorFileLastWriteTime = Get-Date
$priorFileDirectoryName = "" 

foreach ($file in $files) 
{
   #Write-Host "-------------------------------------------------------"
   #write-Host "file=$($file.Name)  dir=$($file.DirectoryName)" 

   # only compare files in same sub-directory 
   if ($file.DirectoryName -eq $priorFileDirectoryName) 
   {
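      # Newest-first order makes this difference zero or negative, so compare its absolute value
      $diffTimeSpan = $file.LastWriteTime - $priorFileLastWriteTime
      if ([Math]::Abs($diffTimeSpan.TotalSeconds) -le $maxDifferenceInSeconds)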
      {
         Write-Host "$($diffTimeSpan) between $($file.LastWriteTime) - $($priorFileLastWriteTime) " 
         Write-Host "file=$($file.FullName)  and file=$priorFileFullName"
         Write-Host ""

         # Copy the current file
         Copy-Item $file.FullName (Join-Path $toDir $file.Name)

         # Copy the possible duplicate
         Copy-Item $priorFileFullName (Join-Path $toDir $priorFileName)
      }
   }
   
   $priorFileLastWriteTime  = $file.LastWriteTime
   $priorFileDirectoryName = $file.DirectoryName 
   $priorFileFullName = $file.FullName 
   $priorFileName = $file.Name 

}
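If you would like to preview the pairs before anything is copied, the same adjacent-pair logic can be wrapped in a function that returns objects instead. This is a sketch, not the original script: Find-CloseFilePair is a hypothetical name, and it assumes PowerShell 3.0 or later (for -File and [Parameter(Mandatory)]).

```powershell
# Sketch: the script's adjacent-pair logic as a function that returns
# objects instead of copying files. Find-CloseFilePair is a hypothetical name.
function Find-CloseFilePair {
    param(
        [Parameter(Mandatory)][string]$Path,
        [double]$MaxDifferenceInSeconds = 60
    )
    # Group files by folder, newest first inside each folder
    $files = Get-ChildItem -LiteralPath $Path -Recurse -File |
        Sort-Object DirectoryName, @{Expression = 'LastWriteTime'; Descending = $true}

    $prior = $null
    foreach ($file in $files) {
        # Only compare neighbors that live in the same folder
        if ($prior -and $file.DirectoryName -eq $prior.DirectoryName) {
            # $prior is the newer file, so this gap is zero or positive
            $seconds = ($prior.LastWriteTime - $file.LastWriteTime).TotalSeconds
            if ($seconds -le $MaxDifferenceInSeconds) {
                [pscustomobject]@{
                    Newer   = $prior.FullName
                    Older   = $file.FullName
                    Seconds = $seconds
                }
            }
        }
        $prior = $file
    }
}

# Preview the pairs first:
# Find-CloseFilePair -Path "C:\Data\Sorted" -MaxDifferenceInSeconds 60 | Format-Table -AutoSize
```

Returning objects keeps finding and copying separate, so you can filter the list (or export it with Export-Csv) and then copy only the pairs you actually want to inspect.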
