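I am trying to parse a txt file with a specific format and convert it to a CSV file, but I have run into two problems:
- I need to skip the header that separates each entry (4 lines, the first of which starts with \n)
- It only reads the last entry. I am not sure what I am doing wrong that keeps it from reading all of the entries in the text file.
My code: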
my $grammar = qr!
    (?(DEFINE)
        (?<Identifier> [^=\n]+ )
        (?<Statement>
            (?:              # Begin alternation
                "            # Opening quotes
                [^"]+?       # Any non-quotes (including a new line)
                "            # Closing quotes
              | [^\n]+       # Or a single line
            )                # End alternation
        )
    )
!x;
my $file = do { local $/; <> };    # Slurp file named on command line
my %columns;
while ( $file =~
    m{ ((?&Identifier))[\t ]*=[ \t]*((?&Statement)) $grammar}xgc )
{
    my ( $header, $value ) = ( $1, $2 );
    # Remove leading spaces and quote variable if it contains commas:
    for ( $header, $value ) { s/^\s+//mg; /,/ and s/^|$/"/g }
    # Substitute \n with \\n to make multi-line values single-line:
    for ($value) { chomp; s/\n/\\n/g }
    $columns{$header} = $value;
}
print join "," => sort keys %columns;    # Print column headers
print "\n";
print join "," => map { $columns{$_} } sort keys %columns;    # Column content
print "\n";
The input file looks like this:
OPERATION_CONTEXT server:.oc_name alarm_object 1
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes
Identifier = 1
State = Outstanding
Problem Status = Not-Handled
Clearance Report Flag = False
Escalated Alarm = False
Creation Timestamp = Thu, Jan 16, 2014 10:21:17 PM
Managed Object = NETACT server:.NETACT51 BSC 716499 BCF 123
Target Entities = { NETACT server:.NETACT51 BSC 716499 BCF 123 }
Alarm Type = EnvironmentalAlarm
Event Time = Thu, Jan 16, 2014 10:17:14 PM
Probable Cause = Indeterminate
Specific Problems = { 7409 }
Notification Identifier = 2433009629
Domain = Domain server:.netact51_dom
Alarm Origin = IncomingAlarm
Perceived Severity = Critical
Additional Text = "ALARMA CRITICA SISTEMA DAS 1900
#S#10497409 *** ZONA TECNICA SANTI
PLMN-PLMN/BSC-716499/BCF-123
SC_logical_name:9344;"
Original Severity = Critical
Original Event Time = Thu, Jan 16, 2014 10:17:14 PM
Outage Flag = False
Problem Occurrences = 1 Problems
GPP3 Problem Occurrences = 0 Problems
Critical Problem Occurrences = 1 Problems
Major Problem Occurrences = 0 Problems
Minor Problem Occurrences = 0 Problems
Warning Problem Occurrences = 0 Problems
Indeterminate Problem Occurrences = 0 Problems
Clear Problem Occurrences = 0 Problems
SA Total = 0 Alarms
Comuna = "HUECHURABA"
CatCliente = "CAV"
Nemonico = "BSMT6_PZANF3"
OPERATION_CONTEXT server:.oc_name alarm_object 2
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes
Identifier = 2
State = Outstanding
Problem Status = Not-Handled
Clearance Report Flag = False
Escalated Alarm = False
Creation Timestamp = Thu, Jan 16, 2014 10:14:03 PM
Clearance Time Stamp = Thu, Jan 16, 2014 10:29:08 PM
Managed Object = NETACT server:.NETACT51 BSC 206259 BCF 103
Target Entities = { NETACT server:.NETACT51 BSC 206259 BCF 103 }
Alarm Type = EnvironmentalAlarm
Event Time = Thu, Jan 16, 2014 10:29:37 PM
Probable Cause = Indeterminate
Specific Problems = { 7409 }
Notification Identifier = 3780327614
Domain = Domain server:.netact51_dom
Alarm Origin = IncomingAlarm
Perceived Severity = Critical
Additional Text = "ALARMA CRITICA SISTEMA DAS 1900
#S#10497409 *** ZONA TECNICA CENTR
Merval BSC VLP7
PLMN-PLMN/BSC-206259/BCF-103
ALARMA CRITICA SISTEMA DAS 1900
SC_logical_name:94681;"
Original Severity = Critical
Original Event Time = Thu, Jan 16, 2014 10:10:01 PM
Outage Flag = False
Problem Occurrences = 4 Problems
GPP3 Problem Occurrences = 0 Problems
Critical Problem Occurrences = 4 Problems
Major Problem Occurrences = 0 Problems
Minor Problem Occurrences = 0 Problems
Warning Problem Occurrences = 0 Problems
Indeterminate Problem Occurrences = 0 Problems
Clear Problem Occurrences = 3 Problems
SA Total = 6 Alarms
Comuna = "VINA DEL MAR"
CatCliente = "CAV"
Nemonico = "BVLP7_MVALF9"
OPERATION_CONTEXT server:.oc_name alarm_object 3
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:45 PM All Attributes
Identifier = 3
State = Outstanding
Problem Status = Not-Handled
Clearance Report Flag = False
Escalated Alarm = False
Creation Timestamp = Thu, Jan 16, 2014 09:41:59 PM
Managed Object = NETACT server:.NETACT51 BSC 938189 BCF 61
Target Entities = { NETACT server:.NETACT51 BSC 938189 BCF 61 }
Alarm Type = EnvironmentalAlarm
Event Time = Thu, Jan 16, 2014 09:37:58 PM
Probable Cause = Indeterminate
Specific Problems = { 7405 }
Notification Identifier = 1757596347
Domain = Domain server:.netact51_dom
Alarm Origin = IncomingAlarm
Perceived Severity = Major
Additional Text = "NUSS FAILURE, RECTIFIER_1 ALARM
#S#10497405 ** ZONA TECNICA CENTR
Pelluhue Playa
PLMN-PLMN/BSC-938189/BCF-61
SC_logical_name:9679;"
Original Severity = Major
Original Event Time = Thu, Jan 16, 2014 09:37:58 PM
Outage Flag = False
Problem Occurrences = 1 Problems
GPP3 Problem Occurrences = 0 Problems
Critical Problem Occurrences = 0 Problems
Major Problem Occurrences = 1 Problems
Minor Problem Occurrences = 0 Problems
Warning Problem Occurrences = 0 Problems
Indeterminate Problem Occurrences = 0 Problems
Clear Problem Occurrences = 0 Problems
SA Total = 0 Alarms
Comuna = "PELLUHUE"
CatCliente = "UNIC_SITE"
Nemonico = "BTAL2_PYUEF6"
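Thank you very much in advance for any help you can give me!
Best answer
The following does not build on your script, but offers a line-by-line parsing approach instead: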
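use strict;
use warnings;

my ( $showHeader, $lastID, @header, @columns ) = ( 1, '' );

while (<>) {
    # Only indented "Identifier = Statement" lines match, so the
    # header lines that separate the entries are skipped automatically.
    if ( my ( $identifier, $statement ) = /^\s+(\S[^=]+)\s+=\s+(.+)/ ) {

        # Entries without a 'Clearance Time Stamp' get an empty
        # placeholder so that every row has the same columns.
        if (    $identifier eq 'Managed Object'
            and $lastID ne 'Clearance Time Stamp' )
        {
            push @header, 'Clearance Time Stamp' if $showHeader;
            push @columns, '';
        }

        # 'Additional Text' spans several lines: keep appending lines
        # until the one that contains SC_logical_name.
        if ( $identifier eq 'Additional Text' ) {
            while (<>) {
                my ($additional) = /^\s+(\S.+)/ or next;
                $statement .= $additional;
                last if $additional =~ /SC_logical_name/;
            }
            $statement =~ s/\s+/ /g;    # Collapse runs of whitespace
        }

        push @header,  $identifier if $showHeader;
        push @columns, $statement;

        # 'Nemonico' is the last attribute of an entry: print the row.
        if ( $identifier eq 'Nemonico' ) {
            if ($showHeader) {
                print +( join ',', @header ), "\n";
                $showHeader = 0;
            }
            # Quote values that contain a comma and are not already quoted
            print +( join ',', map { $_ = qq/"$_"/ if /,/ and !/^"/; $_ } @columns ), "\n";
            undef @columns;
        }
        $lastID = $identifier;
    }
}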
Usage: perl script.pl inFile [>outFile.csv]
The last, optional parameter directs the output to a file.
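On the second problem in the question: the original script stores every match in a single %columns hash keyed by identifier, so each entry overwrites the values of the previous one and only the last entry survives. Below is a minimal sketch of one way around that while keeping the slurp-and-regex style. It assumes every entry begins with an OPERATION_CONTEXT line; parse_records and the trimmed sample text are hypothetical names and data used for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: split the slurped text into entries and parse each
# one into its own hash, so later entries cannot overwrite earlier ones.
sub parse_records {
    my ($text) = @_;
    my @records;

    # Assumption: every entry starts with an OPERATION_CONTEXT line.
    for my $chunk ( split /^(?=[ \t]*OPERATION_CONTEXT\b)/m, $text ) {
        next unless $chunk =~ /\S/;
        my %fields;

        # name = "quoted value (may span lines)"   or   name = rest of line
        while ( $chunk =~ /^[ \t]*([^=\n]+?)[ \t]*=[ \t]*("[^"]*"|[^\n]*)/mg ) {
            my ( $name, $value ) = ( $1, $2 );
            $value =~ s/\s+/ /g;     # fold multi-line values onto one line
            $value =~ s/\s+\z//;     # drop trailing whitespace
            $fields{$name} = $value;
        }
        push @records, \%fields if %fields;
    }
    return @records;
}

# Two entries trimmed down from the sample input, for illustration only.
my $sample = <<'END';
OPERATION_CONTEXT server:.oc_name alarm_object 1
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes
    Identifier = 1
    Additional Text = "ALARMA CRITICA SISTEMA DAS 1900
        SC_logical_name:9344;"
OPERATION_CONTEXT server:.oc_name alarm_object 2
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes
    Identifier = 2
    Comuna = "VINA DEL MAR"
END

my @records = parse_records($sample);
printf "%d records\n", scalar @records;
print "Identifier $_->{Identifier}\n" for @records;
```

The header lines contain no "=", so the pattern never matches them and they are skipped without special handling. Because entries can carry different attributes (in the sample input only the second entry has Clearance Time Stamp), the CSV header would need to be built from the union of keys across all entries.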
Multiple spaces in the Additional Text field are replaced with a single space.
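On field handling more generally: the script above quotes a value only when it contains a comma and does not already start with a quote, which is enough for the sample input. If values could contain embedded double quotes, a stricter rule may be needed. The following is a small sketch of conventional CSV quoting (double the embedded quotes, then wrap the field); csv_field is a hypothetical helper, and unwrapping values that arrive already quoted is an assumption about this input format. A CPAN module such as Text::CSV covers these cases more thoroughly.

```perl
use strict;
use warnings;

# Hypothetical helper: prepare a single value for CSV output.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;

    # Assumption: a value wrapped in double quotes in the input is a
    # quoted string, so strip the wrapping quotes before re-quoting.
    $value =~ s/\A"(.*)"\z/$1/s;

    # Quote only when the value contains a comma, a quote or a line break.
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;          # double any embedded quotes
        $value = qq{"$value"};
    }
    return $value;
}

print join( ',',
    map { csv_field($_) }
        'Thu, Jan 16, 2014 10:21:17 PM', '"HUECHURABA"', 'False' ), "\n";
# prints: "Thu, Jan 16, 2014 10:21:17 PM",HUECHURABA,False
```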
Hope this helps!
Source: the Stack Overflow question "perl - Need help iterating over a file with a specific format": https://stackoverflow.com/questions/21176428/