I have a text (.txt) file that looks like this:
Person Person Name Person Approval Supervisor Payroll Name Application Supplier Start Date End Date Archived Type Number Status Name Name Agency D'Cunha, Yionue 123456 NOT ENTERED Power, Projects CONTRACT Contractor Mehash SUPPLIER_1 10-DEC-16 16-DEC-16 No Employee Vughila, 132456 WORKING Miro, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Proshont Profal Monthly 10-DEC-16 16-DEC-16 No Employee Diiri, Maaor 113456 NOT ENTERED Kargannkir,Company-abcde INPayroll Bivnath Monthly 10-DEC-16 16-DEC-16 No Employee Kimit, Gongobhar111111 WORKING Chondorkor,Company-abcde INProjects 10-DEC-16 16-DEC-16 No Avissku Monthly Employee Kalvornu, 110077 WORKING Kindipur, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Churali Barinakir Monthly Agency Dhilorii, 100009 NOT ENTERED Nook, Projects CONTRACT ContractorBohishik Lurukont SUPPLIER_2
I get this file from a report generated by a piece of software. I'd like to parse the file and export the data to CSV. I tried this approach, but it didn't help because my data is structured very differently.
Then I tried this:
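# Read all lines and skip the first one (header); $input is an automatic variable, so use another name
$lines = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"
$data = $lines[1..($lines.Length - 1)]
$maxLength = 0
$objects = foreach ($record in $data) {
    # Split each line on runs of two or more whitespace characters, or on tabs
    $split = $record -split "\s{2,}|\t+"
    # Remember the widest line so every column gets a header
    if ($split.Length -gt $maxLength) {
        $maxLength = $split.Length
    }
    # Name the fields 1, 2, 3, ...
    $props = @{}
    for ($i = 0; $i -lt $split.Length; $i++) {
        $props.Add([String]($i + 1), $split[$i])
    }
    New-Object -TypeName PSObject -Property $props
}
# Select columns 1..N explicitly so every row has the same columns
$headers = [String[]](1..$maxLength)
$objects |
    Select-Object $headers |
    Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"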
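But this messes up the second line of every record. The problem is that in the original text file, every second line is also part of the first line. In some cases, even a third line belongs to the first line's data.
If there is any information I can provide to describe my problem better, please let me know.
After @Ansgar's comment, I tried this: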
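A minimal sketch of the grouping described above, assuming that a record always starts on a line whose first word is a known Person Type (Employee or Agency, taken from the sample) and that every other line continues the previous record. The sample lines and the $startPattern variable are hypothetical; the real report may need a different start rule.

```powershell
# Hypothetical sample: a header plus two records spread over several lines
$lines = @(
    'Person    Person Name      Person',
    'Type',
    'Agency    D''Cunha, Yionue  123456',
    'Contractor',
    'Employee  Vughila,         132456',
    '          Proshont',
    '          Profal'
)
# ASSUMPTION: a record starts on a line beginning with a known Person Type
$startPattern = '^(Employee|Agency)\b'

$records = @()     # one element per record; each element is an array of lines
$current = $null   # lines of the record currently being collected
foreach ($line in $lines) {
    if ($line -match $startPattern) {
        # A new record begins: store the finished one and start a new group
        if ($null -ne $current) { $records += ,$current }
        $current = @()
    }
    # Lines before the first record (the header) are ignored
    if ($null -ne $current) { $current += $line }
}
if ($null -ne $current) { $records += ,$current }

# Show each record as one logical line
foreach ($record in $records) { $record -join ' | ' }
```

Each element of $records can then be parsed as a unit instead of line by line.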
# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String
# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1
$parsedData = foreach ($record in $data) {
$prop = @{}
$record -split '\n' | ForEach-Object {
$prop['PersonType'] += $_.Substring(0, 10).Trim()
$prop['PersonName'] += $_.Substring(10, 16).Trim()
$prop['PersonNumber'] += $_.Substring(26, 9).Trim()
$prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
$prop['Supervisor'] += $_.Substring(48, 11).Trim()
$prop['PayrollName'] += $_.Substring(59, 16).Trim()
$prop['ApplicationName'] += $_.Substring(75, 13).Trim()
$prop['Supplier'] += $_.Substring(88, 9).Trim()
$prop['StartDate'] += $_.Substring(97, 12).Trim()
$prop['EndDate'] += $_.Substring(109, 9).Trim()
$prop['Archived'] += $_.Substring(118, 8).Trim()
}
New-Object -Type PSObject -Property $prev
}
$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType
But now I get a blank output CSV file in my target folder. Am I missing something somewhere?
Best answer
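One likely cause of the blank CSV: the last statement in the loop passes $prev, but the hashtable being filled is $prop, so New-Object most likely receives no properties at all. A second problem is that Substring throws when a line is shorter than start + length. The sketch below, built around a hypothetical ConvertTo-Record helper, reuses the question's column offsets (assumed to match the real report), pads each line before slicing, joins continuation values with a space, and uses an [ordered] hashtable so the columns keep their order (PowerShell 3.0 or later).

```powershell
# Column layout (start, length) copied from the question's Substring calls;
# ASSUMPTION: these offsets match the real report.
$columns = [ordered]@{
    PersonType      = 0, 10
    PersonName      = 10, 16
    PersonNumber    = 26, 9
    ApprovalStatus  = 35, 13
    Supervisor      = 48, 11
    PayrollName     = 59, 16
    ApplicationName = 75, 13
    Supplier        = 88, 9
    StartDate       = 97, 12
    EndDate         = 109, 9
    Archived        = 118, 8
}
$lineWidth = 126   # Archived starts at 118 and is 8 wide

# Hypothetical helper: turn the physical lines of one record into one object
function ConvertTo-Record([string[]]$Lines) {
    $prop = [ordered]@{}
    foreach ($name in $columns.Keys) { $prop[$name] = '' }
    foreach ($line in $Lines) {
        # Drop a stray carriage return and pad so Substring stays in range
        $padded = ($line -replace "`r", '').PadRight($lineWidth)
        foreach ($name in $columns.Keys) {
            $start, $length = $columns[$name]
            $value = $padded.Substring($start, $length).Trim()
            # Join continuation values with a space instead of gluing them together
            if ($value) { $prop[$name] = ($prop[$name] + ' ' + $value).Trim() }
        }
    }
    # Pass the hashtable that was actually filled
    New-Object -TypeName PSObject -Property $prop
}

# Usage with a synthetic two-line record laid out at the assumed offsets
$line1 = 'Employee'.PadRight(10) + 'Vughila,'.PadRight(16) + '132456'.PadRight(9) + 'WORKING'
$line2 = ''.PadRight(10) + 'Proshont'
$record = ConvertTo-Record @($line1, $line2)
$record | Format-List
```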
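I have a solution, but...
It uses two splits. The first splits the text into records at (Person|Agency|Employee)
(this has a flaw that requires an If);
the second splits each record at line breaks and then parses each line by offset and length.
Because the sample data is inconsistent, this isn't perfect either.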
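The first split relies on a zero-width pattern: (?=...) matches the position just before a delimiter word, and (?!^) stops it from matching at the start of the text. Because the group inside the lookahead is a capturing group, -split also returns each captured word as a separate element, which is presumably why short chunks have to be filtered with an If. This sketch on made-up sample text compares that pattern with a non-capturing group (?:...).

```powershell
# Made-up sample text: three records, each starting with a Person Type word
$text = "Employee  Vughila,`nEmployee  Diiri,`nAgency    Dhilorii,"

# Pattern as in the answer: the group inside the lookahead is capturing,
# so -split also returns every captured delimiter word as its own element
$withCapture = $text -split '(?!^)(?=(Person|Agency|Employee))'

# Same split with a non-capturing group: only the record chunks remain
$noCapture = $text -split '(?!^)(?=(?:Person|Agency|Employee))'

"With capture:    $($withCapture.Count) elements"
"Without capture: $($noCapture.Count) elements"
$noCapture
```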
$InFile  = 'Q:\Test\2016-12\19\41225200.txt'
$OutFile = 'C:\path\to\output.txt'
$Delimiter = '(Person|Agency|Employee)'
# Split just before each delimiter word, but not at the very start of the text
$Split = "(?!^)(?=$Delimiter)"
$parsedData = (Get-Content $InFile -Raw) -split $Split |
    ForEach-Object {
        $prop = @{}
        # Ignore short chunks that cannot be a complete record
        If ($_.Length -ge 30) {
            ForEach ($Line in $_.Split("`n")) {
                # Pad the line so Substring never runs past its end
                $Line += " " * 130
                $prop['PersonType']      += $Line.Substring(  0, 10).Trim()
                $prop['PersonName']      += $Line.Substring( 10, 16).Trim()
                $prop['PersonNumber']    += $Line.Substring( 26,  9).Trim()
                $prop['ApprovalStatus']  += $Line.Substring( 35, 13).Trim()
                $prop['Supervisor']      += $Line.Substring( 48, 11).Trim()
                $prop['PayrollName']     += $Line.Substring( 59, 16).Trim()
                $prop['ApplicationName'] += $Line.Substring( 75, 12).Trim()
                $prop['Supplier']        += $Line.Substring( 87, 10).Trim()
                $prop['StartDate']       += $Line.Substring( 97,  9).Trim()
                $prop['EndDate']         += $Line.Substring(108,  9).Trim()
                $prop['Archived']        += $Line.Substring(117,  8).Trim()
            }
            # Create an object only for chunks that were actually parsed
            New-Object -TypeName PSObject -Property $prop
        }
    }
$parsedData
Output
Supervisor : ApplicatioName
ApplicationName : t Date End DName
Archived :
PersonType : Person AType
PersonName : pproval Supe
Supplier : ate Archiv
StartDate : ed
ApprovalStatus : yroll NameStatus
PayrollName : n Supplier Star
PersonNumber : rvisor PaNumber
EndDate :
Supervisor : Power,Mehash
ApplicationName : Projects
Archived : No
PersonType : AgencyContractor
PersonName : D'Cunha, Yionue
Supplier : CONTRACTSUPPLIER_1
StartDate : 10-DEC-16
ApprovalStatus : NOT ENTERED
PayrollName :
PersonNumber : 123456
EndDate : 16-DEC-16
Supervisor : Miro,Profal
ApplicationName : Payroll
Archived : NoNo
PersonType : Employee
PersonName : Vughila,Proshont
Supplier :
StartDate : 10-DEC-1610-DEC-16
ApprovalStatus : WORKING
PayrollName : Company-abcde INMonthly
PersonNumber : 132456
EndDate : 16-DEC-1616-DEC-16
My attempt with Export-Csv also produced an empty file.
Regarding "powershell - parse text file and save as .csv", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41225200/
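One plausible reason the export is empty: an object is created for every chunk, including the short ones skipped by the If, and Export-Csv takes its columns from the first object it receives, so an empty first object leaves no columns to write. A minimal sketch, with made-up objects and a hypothetical temporary output path, that drops property-less objects and fixes the column order with Select-Object before exporting:

```powershell
# Made-up objects: an empty one (like those created for skipped chunks)
# followed by a real record
$objects = @(
    (New-Object -TypeName PSObject -Property @{}),
    (New-Object -TypeName PSObject -Property @{
        PersonType = 'Employee'; PersonName = 'Vughila, Proshont'; PersonNumber = '132456' })
)
# Explicit column order (a plain hashtable does not guarantee property order)
$columns = 'PersonType', 'PersonName', 'PersonNumber'
# Hypothetical output path in the temp folder
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'parsed-sample.csv'

$objects |
    Where-Object { @($_.PSObject.Properties).Count -gt 0 } |   # drop empty objects
    Select-Object -Property $columns |                          # fix column order
    Export-Csv -Path $outFile -NoTypeInformation

Get-Content $outFile
```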