powershell - Counting and filtering text files with PowerShell

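I have a set of text files, each containing a few thousand numbers on separate lines. I would like to use the command line or PowerShell to produce a file that summarizes the data in each file, as shown below, e.g. Counts.txt: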

test.txt:          <--Filename
 (a) total: 4325    <-- Total number of lines in the file
 (b) isbn: 2        <-- Count of numbers that don't start with 3618*
 (c) duplicates: 13 <-- (a-b) - (Count of unique numbers that start with 3618*)

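The above would be appended to Counts.txt for every file in the directory.

So far I have managed to add the total number of lines in each file to Counts.txt using the following: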

@echo off
setlocal enabledelayedexpansion

set cnt=0
set "out=Counts.txt"
if exist %out% del /q %out%
for /f %%a in ('dir /b /a-d') do (
    for /f %%b in ('type "%%a"^|find /v /c ""') do (
      set /a cnt=%%b & >>%out% echo(%%~nxa: "-total: %%b") 
    )
)

Which outputs:

test.txt: -total: 9

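How do I get the count of numbers that don't start with 3618*, and the result of the calculation in point (c)?

Some help with the formatting would also be appreciated :)

Best answer

It is a bit unclear what exactly you mean by the count of unique duplicate numbers starting with 3618*, so below I have included two possible options, one of which is commented out. Pick whichever one you need.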

$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
    $data = Get-Content -Path $_.FullName
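    # count ISBN-like numbers (ISBN-13 prefixes are 978 and 979); if (b) should instead be
    # every number that does not start with 3618, use { $_ -notlike '3618*' }
    $isbn = @($data | Where-Object { $_ -like '97*' }).Count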

    # if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
    $dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count

    # if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
    # $dupes = @($data | Where-Object { $_ -like '3618*' }).Count

    # output the data in the format you showed in the question
    @"
$($_.Name)
 a) total: $($data.Count)
 b) isbn: $isbn
 c) duplicates: $dupes

"@
}

Next, write the result to a file:

$result | Set-Content -Path '.\Counts.txt'
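
Note that the question asks for the summaries to be appended to Counts.txt, whereas Set-Content replaces whatever the file already contains. A minimal sketch of the difference, using a hypothetical demo file in the temp directory (the file name and sample lines are made up for illustration):

```powershell
# Hypothetical demo file in the temp directory; not the real Counts.txt
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'Counts-demo.txt'

# Set-Content creates the file, or overwrites whatever it already contains
'numbers1.txt: total 10' | Set-Content -Path $demo

# Add-Content keeps the existing lines and appends the new one
'numbers2.txt: total 9' | Add-Content -Path $demo

# Read the file back: the first line came from Set-Content, the second was appended
$lines = @(Get-Content -Path $demo)
$lines

# Clean up the demo file
Remove-Item -Path $demo
```

If Counts.txt should accumulate results across runs, swapping Set-Content for Add-Content in the line above is probably all that is needed.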

The result looks like this:

numbers1.txt
 a) total: 10
 b) isbn: 2
 c) duplicates: 1

numbers2.txt
 a) total: 9
 b) isbn: 2
 c) duplicates: 0
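
Neither option above follows the question's own wording of (b) and (c) literally. As a third reading, here is a rough sketch that computes (b) as the numbers not starting with 3618 and (c) as (a - b) minus the count of unique 3618* numbers. The function name `Get-NumberSummary` and the sample data are hypothetical, and this interpretation is an assumption rather than something the asker confirmed:

```powershell
# Hypothetical helper (not part of the answer above): computes the three
# counts exactly as the question words them.
function Get-NumberSummary {
    param(
        [string[]] $Lines   # one number per element, e.g. the output of Get-Content
    )

    # (a) total number of lines
    $total = $Lines.Count

    # all lines that start with 3618
    $with3618 = @($Lines | Where-Object { $_ -like '3618*' })

    # (b) lines that do NOT start with 3618 (blank lines would land here too)
    $isbn = $total - $with3618.Count

    # (c) (a - b) minus the number of distinct 3618* values,
    #     i.e. how many 3618* lines repeat an earlier line
    $unique = @($with3618 | Sort-Object -Unique).Count
    $dupes  = ($total - $isbn) - $unique

    [PsCustomObject]@{
        Total      = $total
        Isbn       = $isbn
        Duplicates = $dupes
    }
}

# Made-up sample: 3618001 appears three times, 3618002 once, plus one other number
$sample = '3618001', '3618001', '3618002', '9780306406157', '3618001'
Get-NumberSummary -Lines $sample
```

For this sample the function reports Total 5, Isbn 1 and Duplicates 2, because two of the three 3618001 lines are repeats. The Group-Object option above would report 1 for the same data, since it counts distinct values that occur more than once.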

Personally, though, I would prefer to output a CSV file:

$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
    $data = Get-Content -Path $_.FullName
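    # count ISBN-like numbers (ISBN-13 prefixes are 978 and 979); if (b) should instead be
    # every number that does not start with 3618, use { $_ -notlike '3618*' }
    $isbn = @($data | Where-Object { $_ -like '97*' }).Count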

    # if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
    $dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count

    # if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
    # $dupes = @($data | Where-Object { $_ -like '3618*' }).Count

    # output as PsObject
    [PsCustomObject]@{
        File       = $_.Name
        Total      = $data.Count
        Isbn       = $isbn
        Duplicates = $dupes
    }
}

$result | Export-Csv -Path '.\Counts.csv' -UseCulture -NoTypeInformation
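
The -UseCulture switch makes Export-Csv use the list separator of the current culture as the delimiter instead of a comma, so the delimiter in Counts.csv depends on the machine that wrote it. A small sketch of a round trip, assuming a hypothetical demo file in the temp directory and made-up rows:

```powershell
# Hypothetical demo path and made-up rows, for illustration only
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'Counts-demo.csv'

$rows = @(
    [PsCustomObject]@{ File = 'numbers1.txt'; Total = 10; Isbn = 2; Duplicates = 1 }
    [PsCustomObject]@{ File = 'numbers2.txt'; Total = 9;  Isbn = 2; Duplicates = 0 }
)

# Write with the current culture's list separator as the delimiter
$rows | Export-Csv -Path $csv -UseCulture -NoTypeInformation

# The separator that -UseCulture picks up (for example ',' or ';')
$separator = (Get-Culture).TextInfo.ListSeparator

# Read the file back with the same switch so the delimiter matches
$imported = @(Import-Csv -Path $csv -UseCulture)
$imported | Format-Table -AutoSize

# Clean up the demo file
Remove-Item -Path $csv
```

Import-Csv returns every field as a string, so numeric columns such as Total need a cast (for example `[int]$imported[0].Total`) before doing arithmetic on them.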

The original question, "powershell - Counting and filtering text files with PowerShell", can be found on Stack Overflow: https://stackoverflow.com/questions/65125084/
