string - Efficiently searching for a string in a large file

Tags: string powershell search

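How can I check whether a string exists under these conditions:

  1. in a single text file;
  2. up to 10GB in size;
  3. the file consists of a single line;
  4. the file contains only random digits from 1 to 9;
  5. using PowerShell (because I think it will be more efficient, although I don't know how to program in this language).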

I have already tried this in a batch file:

FINDSTR "897516" decimal_output.txt
pause

But as I said, I need a faster and more efficient way to do this.


I also tried this code, which I found on Stack Overflow:

$SEL = Select-String -Path C:\Users\fabio\Desktop\CONVERTIDOS\dec_output.txt -Pattern "123456"

if ($SEL -ne $null)
{
echo Contains String
}
else
{
echo Not Contains String
}

But I get the error below, and I don't know whether this code is the most reliable or adequate approach. Error:

Select-String : Tipo de excepção 'System.OutOfMemoryException' accionado.
At C:\Users\fabio\Desktop\1.ps1:1 char:8
+ $SEL = Select-String -Path C:\Users\fabio\Desktop\CONVERTIDOS\dec_out ...
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Select-String], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException,Microsoft.PowerShell.Commands.SelectStringCommand

Best answer

This should do the job:

#################################################################################################################
#
# Searches for a user defined string in the $input_file and counts matches. Works with files of any size.
#
# Adjust source directory and input file name.
#
$source = "C:\adjust\path"
$input_file = "file_name.extension"
#
#
# Define the string you want to search for. Keep quotation marks even if you only search for numbers (otherwise 
# $pattern.Length will be 1 and this script will no longer work with files larger than the $split_size)!
#
$pattern = "Enter the string to search for in here"
#
#
# Using Get-Content on an input file with a size of 1GB or more will cause System.OutOfMemoryExceptions,
# therefore a large file gets temporarily split up.
#
$split_size = 100MB
#
#
# Thanks @Bob (https://superuser.com/a/1295082/868077)
#################################################################################################################

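Set-Location $source

# Reset state so that values left over from an earlier run in the same session
# (for example in the ISE) cannot skip the prompts or inflate the match count.
$overwrite = $delete = $a = $b = $prev_file = $null
$match_count = 0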


if (test-path ".\_split") {

    while ($overwrite -ne "true" -and $overwrite -ne "false") {

        "`n"
        $overwrite = Read-Host ' Split files already/still exist! Delete and overwrite?'

        if ($overwrite -match "y") {

            $overwrite = "true"
            Remove-Item .\_split -force -recurse
            $a = "`n Deleted existing split files!"

        } elseif ($overwrite -match "n") {

            $overwrite = "false"
            $a = "`n Continuing with existing split files!"

        } elseif ($overwrite -match "c") {

            exit

        } else {

            Write-Host "`n Error: Invalid input!`n Type 'y' for 'yes'. Type 'n' for 'no'. Type 'c' for 'cancel'. `n`n`n"

        }

    }

}

Clear-Host


if ((Get-Item $input_file).Length -gt $split_size) {

    while ($delete -ne "true" -and $delete -ne "false") {

        "`n"
        $delete = Read-Host ' Delete split files afterwards?'

        if ($delete -match "y") {

            $delete = "true"
            $b = "`n Split files will be deleted afterwards!"

        } elseif ($delete -match "n") {

            $delete = "false"
            $b = "`n Split files will not be deleted afterwards!"

        } elseif ($delete -match "c") {

            exit

        } else {

            Write-Host "`n Error: Invalid input!`n Type 'y' for 'yes'. Type 'n' for 'no'. Type 'c' for 'cancel'. `n`n`n"

        }

    }

    Clear-Host

    $a
    $b


    Write-Host "`n This may take some time!"

    if ($overwrite -ne "false") {

        New-Item -ItemType directory -Path ".\_split" >$null 2>&1
        [Environment]::CurrentDirectory = Get-Location

        $bytes = New-Object byte[] 4096
        $in_file = [System.IO.File]::OpenRead($input_file)
        $file_count = 0
        $finished = $false

        while (!$finished) {

            $file_count++
            $bytes_to_read = $split_size
            $out_file = New-Object System.IO.FileStream ".\_split\_split_$file_count.splt",CreateNew,Write,None

            while ($bytes_to_read) {

                $bytes_read = $in_file.Read($bytes, 0, [Math]::Min($bytes.Length, $bytes_to_read))

                if (!$bytes_read) {

                    $finished = $true
                    break

                }

                $bytes_to_read -= $bytes_read
                $out_file.Write($bytes, 0, $bytes_read)

            }

            $out_file.Dispose()

        }

        $in_file.Dispose()

    }

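    # Split files are numbered starting at 1.
    $i = 1

    while (Test-Path ".\_split\_split_$i.splt") {

        # -Raw (PowerShell 3.0+) returns the piece as one string even if it contains line breaks.
        $cur_file = (Get-Content ".\_split\_split_$i.splt" -Raw)
        # Escape the pattern so it is matched literally rather than as a regular expression.
        $temp_count = ([regex]::Matches($cur_file, [regex]::Escape($pattern))).Count
        $match_count += $temp_count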

        $n = $i - 1

        if (Test-Path ".\_split\_split_$n.splt") {

            if ($cur_file.Length -ge $pattern.Length) {

                $file_transition = $prev_file.Substring($prev_file.Length - ($pattern.Length - 1)) + $cur_file.Substring(0,($pattern.Length - 1))

            } else {

                $file_transition = $prev_file.Substring($prev_file.Length - ($pattern.Length - 1)) + $cur_file

            }

            # Count matches that span the boundary between the previous and the current piece.
            $temp_count = ([regex]::Matches($file_transition, [regex]::Escape($pattern))).Count
            $match_count += $temp_count

        }

        $prev_file = $cur_file
        $i++

    }

} else {

    $a
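    # The file is small enough to load at once: search its contents, not its name.
    $match_count = ([regex]::Matches((Get-Content $input_file -Raw), [regex]::Escape($pattern))).Count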

}


if ($delete -eq "true") {

    Remove-Item ".\_split" -Force -Recurse

}


if ($match_count -ge 1) {

    Write-Host "`n`n String '$pattern' found:`n`n $match_count matches!"

} else {

    Write-Host "`n`n String '$pattern' not found!"

}


Write-Host `n`n`n`n`n

Pause

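This splits a large file into several smaller files, searches each of them for $pattern, and counts the matches (including matches that span the boundary between two consecutive split files).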

It also lets you delete or keep the split files afterwards, so you can reuse them instead of splitting the large file every time you run this script.
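
The boundary handling used by the script can also be sketched without temporary files. The following is only a minimal sketch, not part of the original answer: it assumes PowerShell 5.0 or later (for `[string]::new`), and the function name `Get-PatternCount` and its parameters are hypothetical. It reads the input in fixed-size chunks and carries the last `$Pattern.Length - 1` characters over to the next chunk, which plays the same role as `$file_transition` above. Unlike `[regex]::Matches`, it also counts overlapping occurrences.

```powershell
# Hypothetical helper (not from the original answer): counts occurrences of a
# literal pattern in a text stream without loading the whole stream into memory.
function Get-PatternCount {
    param(
        [Parameter(Mandatory)] [System.IO.TextReader] $Reader,
        [Parameter(Mandatory)] [string] $Pattern,
        [int] $ChunkSize = 1MB   # characters per read; an arbitrary choice
    )

    $buffer = New-Object char[] $ChunkSize
    $tail   = ''   # last ($Pattern.Length - 1) characters of the previous window
    $count  = 0

    while (($read = $Reader.Read($buffer, 0, $buffer.Length)) -gt 0) {

        # Prepend the carried-over tail so matches that straddle two chunks are seen.
        $window = $tail + [string]::new($buffer, 0, $read)

        $index = $window.IndexOf($Pattern, [StringComparison]::Ordinal)
        while ($index -ge 0) {
            $count++
            $index = $window.IndexOf($Pattern, $index + 1, [StringComparison]::Ordinal)
        }

        # The tail is one character too short to hold a full match,
        # so no occurrence is counted twice.
        $keep = [Math]::Min($Pattern.Length - 1, $window.Length)
        $tail = $window.Substring($window.Length - $keep)
    }

    return $count
}

# Small in-memory demonstration; the tiny chunk size forces matches across chunk edges.
$demo = New-Object System.IO.StringReader '1289751628975163'
Get-PatternCount -Reader $demo -Pattern '897516' -ChunkSize 4   # → 2

# For a real file, a StreamReader could be passed instead (the path is a placeholder):
# $reader = New-Object System.IO.StreamReader 'C:\adjust\path\file_name.extension'
# try { Get-PatternCount -Reader $reader -Pattern '897516' } finally { $reader.Dispose() }
```

If only existence matters, the loop could return as soon as the first match is found instead of counting all of them.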

Regarding string - Efficiently searching for a string in a large file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48974497/
