powershell - Comparing hashes and deleting files with the same hash in PowerShell doesn't work

I'm writing a script that computes the hashes of all files under a path (recursively). That part works fine.

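My problem comes after I've identified which hashes are the same. I want to save those files into an array so that later I can either delete the files that share a hash (if I choose to) or just print the duplicate files. I've spent the whole afternoon and evening trying to figure out how to do this. My current code: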

Write-Host "Write a path: "
$UserInput=Read-Host
Get-ChildItem -Path $UserInput -Recurse

#Get-FileHash cmdlet to get the hashes
$files = Get-ChildItem -Path $UserInput -Recurse | where { !$_.PSIsContainer }
$files | % {(Get-FileHash -Path $_.FullName -Algorithm MD5)}



#Creating an array for all the values and an array for the duplicates
$originals=@()
$copies=@()

 #grouping the hashes that are duplicated cmdlet Group-Object:
$Duplicates = Get-ChildItem -Path $UserInput -Recurse -File |Group {($_|Get-FileHash).Hash} |Where Count -gt 1
foreach($FileGroup in $Duplicates)
{
    Write-Host "These files share hash : $($FileGroup.Name)"
    $FileGroup.Group.FullName |Write-Host
    $copies+=$Duplicates

}

So the last part, "$copies+=$Duplicates", doesn't work properly.
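The reason `$copies+=$Duplicates` misbehaves is that it appends the entire `$Duplicates` collection (every group, not just the current one) on each pass through the loop, so `$copies` ends up holding group objects, repeated once per group, instead of files. A minimal sketch of a corrected loop, assuming the first file in each group is treated as the original and the rest are copies; the function `Get-DuplicateCopies` is a hypothetical wrapper added only so the snippet can be reused:

```powershell
# Hypothetical helper: returns every duplicate file except the first one in each hash group
function Get-DuplicateCopies {
    param([Parameter(Mandatory)][string]$Path)

    $copies = @()
    # Group files by content hash (Get-FileHash defaults to SHA256) and keep groups with 2+ files
    $duplicates = Get-ChildItem -LiteralPath $Path -Recurse -File |
        Group-Object { (Get-FileHash -LiteralPath $_.FullName).Hash } |
        Where-Object Count -gt 1

    foreach ($fileGroup in $duplicates) {
        Write-Host "These files share hash : $($fileGroup.Name)"
        $fileGroup.Group.FullName | Write-Host
        # Append only this group's files, skipping the first one (kept as the original)
        $copies += @($fileGroup.Group | Select-Object -Skip 1)
    }
    return $copies
}
```

Calling `Get-DuplicateCopies -Path $UserInput` returns `FileInfo` objects, so the result can be piped straight to `Remove-Item` once you have checked it.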

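At first I wanted to store the first file in an "originals" array and, if a second file had the same hash, store that second file in a "copies" array. But I'm not sure whether I can do this in the first part of the script, while the hashes are being computed.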

After that, the second array would contain only the duplicates, so it would be easy to delete them from the computer.
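The originals/copies split described above can in fact be done in the same pass that computes the hashes, using a hashtable keyed by hash: the first file seen with a given hash becomes the original, and any later file with that hash is a copy. A sketch under those assumptions (the function name `Split-ByHash` and its `Originals`/`Copies` properties are hypothetical; note that `Originals` also contains files that have no duplicates at all):

```powershell
# Hypothetical helper: one pass over the files; first file per hash = original, later ones = copies
function Split-ByHash {
    param([Parameter(Mandatory)][string]$Path)

    $originals = @{}   # hash -> first file seen with that hash
    $copies = [System.Collections.Generic.List[System.IO.FileInfo]]::new()

    foreach ($file in Get-ChildItem -LiteralPath $Path -Recurse -File) {
        # Default algorithm is SHA256; add -Algorithm MD5 or SHA1 if preferred
        $hash = (Get-FileHash -LiteralPath $file.FullName).Hash
        if ($originals.ContainsKey($hash)) {
            # Same content was already seen, so this file is a copy
            $copies.Add($file)
        }
        else {
            $originals[$hash] = $file
        }
    }

    [pscustomobject]@{
        Originals = @($originals.Values)
        Copies    = $copies.ToArray()
    }
}
```

For example, `(Split-ByHash -Path $UserInput).Copies` gives the second array from the question, ready to print or delete.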

Best answer

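I think you should filter the items. I did that, and ended up with one list containing a single file from each set of duplicates and another list containing all the remaining duplicate copies.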

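You can use the SHA1 algorithm instead of MD5.

SHA1 can be faster than MD5 on some machines, although the difference depends on the CPU and the .NET runtime; for spotting duplicate files either algorithm is good enough.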
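Which algorithm is faster is easy to check on your own files. A rough sketch using `Measure-Command`; the function name `Measure-HashAlgorithm` is hypothetical. The first algorithm timed also pays for reading the files from disk, while later runs may hit the file-system cache, so repeat the measurement or swap the order before drawing conclusions:

```powershell
# Hypothetical helper: time how long hashing every file under $Path takes with each algorithm
function Measure-HashAlgorithm {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$Algorithm = @('MD5', 'SHA1')
    )

    $files = Get-ChildItem -LiteralPath $Path -Recurse -File
    foreach ($alg in $Algorithm) {
        # Hash every file and discard the results; only the elapsed time matters here
        $elapsed = Measure-Command {
            $files | Get-FileHash -Algorithm $alg | Out-Null
        }
        [pscustomobject]@{
            Algorithm = $alg
            Seconds   = [math]::Round($elapsed.TotalSeconds, 3)
        }
    }
}

# Example: Measure-HashAlgorithm -Path $myFilePath
```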

$fileHashes = Get-ChildItem -Path $myFilePath -Recurse -File | Get-FileHash -Algorithm SHA1
$duplicates = $fileHashes | Group hash | ? {$_.count -gt 1} | % {$_.Group} 

$uniqueItems = @{}
$doubledItems = @()

foreach($item in $duplicates) {
  
  if(-not $uniqueItems.ContainsKey($item.Hash)){
    $uniqueItems.Add($item.Hash,$item)
  }else{
    $doubledItems += $item
  }
}

# the extra copies: every duplicate file except the first one found for each hash
$doubledItems

# Remove the extra copies (add -WhatIf to Remove-Item to preview first)
# $doubledItems | % { Remove-Item -LiteralPath $_.Path -Verbose }

# one file from each group of duplicates (the copy that is kept)
$uniqueItems.Values

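Set the search root folder (put the folder to scan between the quotes; this assignment has to run before the first line of the script above):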

$myFilePath = ''
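Before running the commented-out `Remove-Item` line for real, it can help to preview exactly what would be deleted. A sketch, assuming `$doubledItems` holds the `Get-FileHash` results built above (each has a `Path` property); the function name `Remove-DuplicateFile` is hypothetical, and because it declares `SupportsShouldProcess` it honors `-WhatIf` and `-Confirm`:

```powershell
# Hypothetical helper: delete the files described by piped objects, honoring -WhatIf/-Confirm
function Remove-DuplicateFile {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Binds to .Path on Get-FileHash results, or to .FullName on Get-ChildItem output
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string]$Path
    )
    process {
        if ($PSCmdlet.ShouldProcess($Path, 'Remove duplicate file')) {
            Remove-Item -LiteralPath $Path
        }
    }
}

# Preview first, then delete:
# $doubledItems | Remove-DuplicateFile -WhatIf
# $doubledItems | Remove-DuplicateFile -Verbose
```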

Regarding "powershell - Comparing hashes and deleting files with the same hash in PowerShell doesn't work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44358602/
