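How can I check whether a string exists in a file, given that:
- there is 1 text file;
- its size is up to 10GB;
- the file consists of a single line;
- the file contains only random digits from 1 to 9;
- I would like to use PowerShell (because I think it will be more efficient, although I don't know how to program in this language).
I have already tried this in a batch file: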
FINDSTR "897516" decimal_output.txt
pause
But as I said, I need a faster and more efficient way to do this.
I also tried this code, which I found on Stack Overflow:
$SEL = Select-String -Path C:\Users\fabio\Desktop\CONVERTIDOS\dec_output.txt -Pattern "123456"
if ($SEL -ne $null)
{
    echo Contains String
}
else
{
    echo Not Contains String
}
But I get the error below, and I don't know whether this code is the most reliable or adequate approach. The error:
Select-String : Tipo de excepção 'System.OutOfMemoryException' accionado.
At C:\Users\fabio\Desktop\1.ps1:1 char:8
+ $SEL = Select-String -Path C:\Users\fabio\Desktop\CONVERTIDOS\dec_out ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Select-String], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException,Microsoft.PowerShell.Commands.SelectStringCommand
Best answer
This should do the job:
#################################################################################################################
#
# Searches for a user defined string in the $input_file and counts matches. Works with files of any size.
#
# Adjust source directory and input file name.
#
$source = "C:\adjust\path"
$input_file = "file_name.extension"
#
#
# Define the string you want to search for. Keep quotation marks even if you only search for numbers (otherwise
# $pattern.Length will be 1 and this script will no longer work with files larger than the $split_size)!
#
$pattern = "Enter the string to search for in here"
#
#
# Using Get-Content on an input file with a size of 1GB or more will cause System.OutOfMemoryExceptions,
# therefore a large file gets temporarily split up.
#
$split_size = 100MB
#
#
# Thanks @Bob (https://superuser.com/a/1295082/868077)
#################################################################################################################
Set-Location $source
# Treat the search string literally, so regex metacharacters in it cannot change the result.
$escaped_pattern = [regex]::Escape($pattern)
if (Test-Path ".\_split") {
    while ($overwrite -ne "true" -and $overwrite -ne "false") {
        "`n"
        $overwrite = Read-Host ' Split files already/still exist! Delete and overwrite?'
        if ($overwrite -match "y") {
            $overwrite = "true"
            Remove-Item .\_split -Force -Recurse
            $a = "`n Deleted existing split files!"
        } elseif ($overwrite -match "n") {
            $overwrite = "false"
            $a = "`n Continuing with existing split files!"
        } elseif ($overwrite -match "c") {
            exit
        } else {
            Write-Host "`n Error: Invalid input!`n Type 'y' for 'yes'. Type 'n' for 'no'. Type 'c' for 'cancel'. `n`n`n"
        }
    }
}
Clear-Host
if ((Get-Item $input_file).Length -gt $split_size) {
    while ($delete -ne "true" -and $delete -ne "false") {
        "`n"
        $delete = Read-Host ' Delete split files afterwards?'
        if ($delete -match "y") {
            $delete = "true"
            $b = "`n Split files will be deleted afterwards!"
        } elseif ($delete -match "n") {
            $delete = "false"
            $b = "`n Split files will not be deleted afterwards!"
        } elseif ($delete -match "c") {
            exit
        } else {
            Write-Host "`n Error: Invalid input!`n Type 'y' for 'yes'. Type 'n' for 'no'. Type 'c' for 'cancel'. `n`n`n"
        }
    }
    Clear-Host
    $a
    $b
    Write-Host "`n This may take some time!"
    if ($overwrite -ne "false") {
        # Split the input file into pieces of $split_size bytes each.
        New-Item -ItemType directory -Path ".\_split" >$null 2>&1
        # .NET methods resolve relative paths against this, not against the PowerShell location.
        [Environment]::CurrentDirectory = Get-Location
        $bytes = New-Object byte[] 4096
        $in_file = [System.IO.File]::OpenRead($input_file)
        $file_count = 0
        $finished = $false
        while (!$finished) {
            $file_count++
            $bytes_to_read = $split_size
            $out_file = New-Object System.IO.FileStream ".\_split\_split_$file_count.splt",CreateNew,Write,None
            while ($bytes_to_read) {
                $bytes_read = $in_file.Read($bytes, 0, [Math]::Min($bytes.Length, $bytes_to_read))
                if (!$bytes_read) {
                    $finished = $true
                    break
                }
                $bytes_to_read -= $bytes_read
                $out_file.Write($bytes, 0, $bytes_read)
            }
            $out_file.Dispose()
        }
        $in_file.Dispose()
    }
    # Search every split file, plus the seam between each pair of consecutive files.
    $i = 1
    $match_count = 0
    while (Test-Path ".\_split\_split_$i.splt") {
        # -Raw (PowerShell 3.0 or later) returns the file as one string instead of an array of lines.
        $cur_file = (Get-Content ".\_split\_split_$i.splt" -Raw)
        $temp_count = ([regex]::Matches($cur_file, $escaped_pattern)).Count
        $match_count += $temp_count
        $n = $i - 1
        if (Test-Path ".\_split\_split_$n.splt") {
            # Join the end of the previous file with the start of the current one,
            # so a match that straddles the split point is still counted.
            if ($cur_file.Length -ge $pattern.Length) {
                $file_transition = $prev_file.Substring($prev_file.Length - ($pattern.Length - 1)) + $cur_file.Substring(0,($pattern.Length - 1))
            } else {
                $file_transition = $prev_file.Substring($prev_file.Length - ($pattern.Length - 1)) + $cur_file
            }
            $temp_count = ([regex]::Matches($file_transition, $escaped_pattern)).Count
            $match_count += $temp_count
        }
        $prev_file = $cur_file
        $i++
    }
} else {
    $a
    # Small file: search its content directly (the content, not the file name).
    $match_count = ([regex]::Matches((Get-Content $input_file -Raw), $escaped_pattern)).Count
}
if ($delete -eq "true") {
    Remove-Item ".\_split" -Force -Recurse
}
if ($match_count -ge 1) {
    Write-Host "`n`n String '$pattern' found:`n`n $match_count matches!"
} else {
    Write-Host "`n`n String '$pattern' not found!"
}
Write-Host "`n`n`n`n`n"
Pause
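This splits a large file into several smaller files, searches each of them for $pattern,
and counts the matches (including matches that straddle the transition between two consecutive split files).
It also lets you delete or keep the split files afterwards, so you can reuse them instead of splitting the large file every time you run this script.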
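For comparison, the same boundary-handling idea can be applied without writing split files to disk. The following is only a minimal sketch of a different technique (streaming the file in blocks through a System.IO.StreamReader), not the script above. The function name Get-StreamedMatchCount, its parameters, and the 4MB block size are hypothetical choices made for illustration. It assumes the file is plain ASCII/UTF-8 text and searches for the string literally. Unlike the regex-based count above, it also counts overlapping occurrences.

```powershell
# Hypothetical helper (not part of the answer's script): counts occurrences of
# $Pattern in a file by reading fixed-size blocks, so memory use stays bounded
# regardless of file size.
function Get-StreamedMatchCount {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Pattern,
        [int] $BlockSize = 4MB   # characters per read; an arbitrary assumption
    )

    if ($Pattern.Length -eq 0) { throw "Pattern must not be empty." }

    # The last (pattern length - 1) characters of each block are carried over,
    # so a match that straddles two blocks is still seen exactly once.
    $overlap = $Pattern.Length - 1
    $buffer  = New-Object char[] ($overlap + $BlockSize)
    $carried = 0
    $count   = 0

    # Resolve to a full path: .NET does not follow the PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader   = New-Object System.IO.StreamReader -ArgumentList $fullPath
    try {
        while (($read = $reader.Read($buffer, $carried, $BlockSize)) -gt 0) {
            $length = $carried + $read
            $text   = New-Object System.String -ArgumentList $buffer, 0, $length

            # Ordinal search: a literal, culture-independent comparison.
            $pos = $text.IndexOf($Pattern, [System.StringComparison]::Ordinal)
            while ($pos -ge 0) {
                $count++
                $pos = $text.IndexOf($Pattern, $pos + 1, [System.StringComparison]::Ordinal)
            }

            # Move the tail of this block to the front of the buffer.
            $carried = [Math]::Min($overlap, $length)
            if ($carried -gt 0) {
                [Array]::Copy($buffer, $length - $carried, $buffer, 0, $carried)
            }
        }
    } finally {
        $reader.Dispose()
    }
    $count
}

# Example call, reusing the placeholder path from the script above:
# Get-StreamedMatchCount -Path "C:\adjust\path\file_name.extension" -Pattern "897516"
```

Because the carried tail is one character shorter than the search string, no complete match can lie inside it, so nothing is counted twice. If only existence matters, the inner loop could return as soon as the first match is found.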
Regarding "string - searching for a string efficiently in a large file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48974497/