powershell - Parse a text file and save it as .csv

Tags: powershell csv parsing

I have a text (.txt) file that looks like this:

Person    Person Name     Person   Approval     Supervisor Payroll Name    Application  Supplier Start Date  End Date Archived
Type                      Number   Status       Name                       Name


Agency    D'Cunha, Yionue 123456   NOT ENTERED  Power,                     Projects    CONTRACT
Contractor                                      Mehash                                 SUPPLIER_1
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Vughila,        132456   WORKING      Miro,      Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Proshont                              Profal     Monthly
                                                                                                    10-DEC-16  16-DEC-16   No
Employee  Diiri, Maaor    113456   NOT ENTERED  Kargannkir,Company-abcde INPayroll
                                                Bivnath    Monthly
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Kimit, Gongobhar111111   WORKING      Chondorkor,Company-abcde INProjects              10-DEC-16  16-DEC-16   No
                                                Avissku    Monthly
Employee  Kalvornu,       110077   WORKING      Kindipur,  Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Churali                               Barinakir  Monthly
Agency    Dhilorii,       100009   NOT ENTERED  Nook,                      Projects    CONTRACT
ContractorBohishik                              Lurukont                               SUPPLIER_2

I get this file from a report generated by a piece of software. I'd like to parse the file and export the data to CSV. I tried this, but it didn't help because my data is structured so differently.

Then I tried this:

# $input is a PowerShell automatic variable, so use another name
$lines = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"

# skip the first line
$data = $lines[1..($lines.Length - 1)]

$maxLength = 0

$objects = foreach ($record in $data) {
    $split = $record -split "\s{2,}|\t+"
    if ($split.Length -gt $maxLength) {
        $maxLength = $split.Length
    }
    $props = @{}
    for ($i=0; $i -lt $split.Length; $i++) {
        $props.Add([String]($i+1), $split[$i])
    }
    New-Object -TypeName PSObject -Property $props
}

$headers = [String[]](1..$maxLength)

$objects | 
    Select-Object $headers | 
    Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"

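But this messes up the second line of each record. The problem is that in the original text file, the line after a record's first line is also part of that record. In some cases, even a third line belongs to the same record.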
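A sketch of one way to handle the multi-line records: group the physical lines into logical records before cutting them into fields. It assumes, based only on the sample above, that a record's first line is the one with a six-digit person number starting at column 26, and that every following non-blank line (wrapped names, date-only lines) belongs to that record. Because the date lines in the sample are inconsistent, some dates may still land on the wrong record. Splitting on runs of whitespace cannot recover the fields in any case, because some of them touch (Gongobhar111111, Kargannkir,Company-abcde). `Group-ReportLine` is a hypothetical name.

```powershell
# Hypothetical helper: groups the report's physical lines into logical records.
# Assumption: a record starts on a line with a 6-digit person number at column 26.
function Group-ReportLine {
    param([string[]]$Line)

    $records = [System.Collections.Generic.List[object]]::new()
    $current = $null

    foreach ($l in $Line) {
        if ($l -match '^.{26}\d{6}') {
            # Start of a new record: store the previous one, if any
            if ($null -ne $current) { $records.Add($current.ToArray()) }
            $current = [System.Collections.Generic.List[string]]::new()
            $current.Add($l)
        }
        elseif ($null -ne $current -and $l.Trim()) {
            # Non-blank continuation line (wrapped name, supervisor, dates, ...)
            $current.Add($l)
        }
        # Header lines before the first record and blank lines are skipped
    }
    if ($null -ne $current) { $records.Add($current.ToArray()) }

    # Comma operator: return the list itself instead of unrolling it
    , $records
}
```

Each element of the result is an array holding one person's lines, e.g. $records = Group-ReportLine -Line (Get-Content 'C:\path\to\input.txt'), which can then be cut into fields by column offset.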

Please let me know if I can provide any information that would make my question clearer.


After @Ansgar's comment I tried this:

# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String

# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1

$parsedData = foreach ($record in $data) {
    $prop = @{}
    $record -split '\n' | ForEach-Object {
        $prop['PersonType'] += $_.Substring(0, 10).Trim()
        $prop['PersonName'] += $_.Substring(10, 16).Trim()
        $prop['PersonNumber'] += $_.Substring(26, 9).Trim()
        $prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
        $prop['Supervisor'] += $_.Substring(48, 11).Trim()
        $prop['PayrollName'] += $_.Substring(59, 16).Trim()
        $prop['ApplicationName'] += $_.Substring(75, 13).Trim()
        $prop['Supplier'] += $_.Substring(88, 9).Trim()
        $prop['StartDate'] += $_.Substring(97, 12).Trim()
        $prop['EndDate'] += $_.Substring(109, 9).Trim()
        $prop['Archived'] += $_.Substring(118, 8).Trim()
    }

    New-Object -Type PSObject -Property $prev
}

$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType

But now I get a blank output CSV file in my target folder. Am I missing something?
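A likely cause of the blank file: the last statement in the loop passes `$prev`, which is never assigned (unless it happens to exist in the session), so the `$prop` hashtable is never used and no properties reach `Export-Csv`. Two more problems would probably surface once that is fixed: `Substring()` throws when a line is shorter than start + length, which is the case for many of these lines, and the records in the sample are not separated by blank lines, so `-split '\n\n+'` leaves all of them in a single chunk after the header is skipped. The hypothetical helper below pads each line before cutting a field out of it.

```powershell
# Hypothetical helper: pads the line to at least Start + Length characters so
# Substring() cannot throw on short lines, then trims the padding away again.
function Get-Field {
    param(
        [string]$Line,
        [int]$Start,
        [int]$Length
    )
    $Line.PadRight($Start + $Length).Substring($Start, $Length).Trim()
}

# Inside the question's ForEach-Object block, an assignment would then read e.g.:
#   $prop['PersonName'] += Get-Field -Line $_ -Start 10 -Length 16
# and the object has to be built from $prop, not $prev:
#   New-Object -TypeName PSObject -Property $prop
```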

Best answer

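I have a solution, but...
It uses two splits. The first splits the text into records at (Person|Agency|Employee)
(this has a flaw that requires the If),
the second splits each record at line breaks and then parses each line by offset + length.
Because the sample data is inconsistent, this isn't perfect either.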
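A small illustration of the first split, on a made-up string rather than the real report: `(?!^)` stops a split at position 0, and the zero-width lookahead `(?=...)` splits in front of each keyword without consuming it, so every record keeps its leading keyword. Note that PowerShell's `-split` also returns text captured by a capturing group in the pattern, so a non-capturing group `(?:...)` is used here; the variable names are only for illustration.

```powershell
$text = 'Agency A x Employee B y'

# (?!^)   : never split at position 0 (no empty first element)
# (?=...) : zero-width lookahead, the keyword stays at the start of its record
$records = $text -split '(?!^)(?=(?:Agency|Employee))'
# $records -> 'Agency A x ', 'Employee B y'

# With a capturing group, the captured keyword is returned as an extra element
$withCapture = $text -split '(?!^)(?=(Agency|Employee))'
# $withCapture -> 'Agency A x ', 'Employee', 'Employee B y'
```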

$InFile = 'Q:\Test\2016-12\19\41225200.txt'
$OutFile= 'C:\path\to\output.txt'

$Delimiter = '(Person|Agency|Employee)'
# $Escaped   = [regex]::Escape($Delimiter)   # not used
$Split     = "(?!^)(?=$Delimiter)"

$parsedData = (Get-Content $InFile -Raw) -split $Split | 
    ForEach-Object {
        $prop = @{}
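        # skip short fragments produced by the split (e.g. pieces of the header)
        If ($_.Length -ge 30 ) {
            ForEach ($Line in $_.split("`n")) {
                # pad so Substring() never runs past the end of a short line
                $Line+=" "*130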
                $prop['PersonType']      += $Line.Substring( 0, 10).Trim()
                $prop['PersonName']      += $Line.Substring(10, 16).Trim()
                $prop['PersonNumber']    += $Line.Substring(26,  9).Trim()
                $prop['ApprovalStatus']  += $Line.Substring(35, 13).Trim()
                $prop['Supervisor']      += $Line.Substring(48, 11).Trim()
                $prop['PayrollName']     += $Line.Substring(59, 16).Trim()
                $prop['ApplicationName'] += $Line.Substring(75, 12).Trim()
                $prop['Supplier']        += $Line.Substring(87, 10).Trim()
                $prop['StartDate']       += $Line.Substring(97,  9).Trim()
                $prop['EndDate']         += $Line.Substring(108, 9).Trim()
                $prop['Archived']        += $Line.Substring(117, 8).Trim()
            }
        }
        New-Object -TypeName PSObject -Property $prop
}
$parsedData

Output

Supervisor      : ApplicatioName
ApplicationName : t Date End DName
Archived        :
PersonType      : Person   AType
PersonName      : pproval     Supe
Supplier        : ate Archiv
StartDate       : ed
ApprovalStatus  : yroll NameStatus
PayrollName     : n Supplier  Star
PersonNumber    : rvisor PaNumber
EndDate         :


Supervisor      : Power,Mehash
ApplicationName : Projects
Archived        : No
PersonType      : AgencyContractor
PersonName      : D'Cunha, Yionue
Supplier        : CONTRACTSUPPLIER_1
StartDate       : 10-DEC-16
ApprovalStatus  : NOT ENTERED
PayrollName     :
PersonNumber    : 123456
EndDate         : 16-DEC-16


Supervisor      : Miro,Profal
ApplicationName : Payroll
Archived        : NoNo
PersonType      : Employee
PersonName      : Vughila,Proshont
Supplier        :
StartDate       : 10-DEC-1610-DEC-16
ApprovalStatus  : WORKING
PayrollName     : Company-abcde INMonthly
PersonNumber    : 132456
EndDate         : 16-DEC-1616-DEC-16

My attempt with Export-Csv was empty too.
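The empty export is probably explained by where `New-Object` sits: it is outside the `If`, so every short fragment from the first split (the header pieces and, because `$Delimiter` is a capturing group, the keywords themselves) becomes an object with no properties. `Export-Csv` takes its columns from the first object it receives, and here that first object is empty. Below is a sketch of the same two-split approach with those points changed. The function name and the record test (a six-digit person number at column 26, which also drops the garbled header object shown in the output above) are assumptions based on the sample; the column offsets are the answer's.

```powershell
# Hypothetical function following the answer's two-split approach, with changes:
# non-capturing split group, objects emitted only for chunks that look like a
# record, lines split on \r?\n, and wrapped pieces joined with a space.
function ConvertFrom-PersonReport {
    param([string]$RawText)

    # (start, length) pairs taken from the answer's Substring() calls
    $columns = [ordered]@{
        PersonType      = 0, 10
        PersonName      = 10, 16
        PersonNumber    = 26, 9
        ApprovalStatus  = 35, 13
        Supervisor      = 48, 11
        PayrollName     = 59, 16
        ApplicationName = 75, 12
        Supplier        = 87, 10
        StartDate       = 97, 9
        EndDate         = 108, 9
        Archived        = 117, 8
    }

    # Zero-width split before each keyword; (?:...) keeps keywords out of the result
    $chunks = $RawText -split '(?!^)(?=(?:Person|Agency|Employee))'

    foreach ($chunk in $chunks) {
        # Assumed record test: 6-digit person number at column 26 of the first line
        if ($chunk -notmatch '^.{26}\d{6}') { continue }

        $prop = [ordered]@{}
        foreach ($name in $columns.Keys) { $prop[$name] = '' }

        foreach ($line in ($chunk -split '\r?\n')) {
            $padded = $line.PadRight(130)   # avoid Substring() past the end
            foreach ($name in $columns.Keys) {
                $start, $length = $columns[$name]
                $value = $padded.Substring($start, $length).Trim()
                if ($value) { $prop[$name] = ($prop[$name] + ' ' + $value).Trim() }
            }
        }
        [pscustomobject]$prop
    }
}

# Usage (paths as in the answer):
# ConvertFrom-PersonReport -RawText (Get-Content $InFile -Raw) |
#     Export-Csv -Path $OutFile -NoTypeInformation
```

Joining wrapped pieces with a space gives "Power, Mehash" instead of "Power,Mehash"; drop the ' ' if the original concatenation is preferred.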

A similar question about powershell - parsing a text file and saving it as .csv was found on Stack Overflow: https://stackoverflow.com/questions/41225200/
