perl - Need help iterating over a file with a specific format

Tags: perl parsing csv

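I am trying to parse a txt file in a specific format and convert it to a CSV file. However, I have run into two problems:

  1. I need to skip the header that separates each entry (4 lines, the first of which starts with \n).
  2. It only reads the last entry. I am not sure what I am doing wrong, or what to change so that it reads every entry in the text file.

My code: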

my $grammar = qr!
        (?(DEFINE)
           (?<Identifier> [^=\n]+ )
           (?<Statement>
               (?: # Begin alternation
                   " #Opening quotes
                   [^"]+? # Any non-quotes (including a new line)
                   " # Closing quotes
                  | [^\n]+ # Or a single line
               )   # End alternation
            )

       )

    !x;

    my $file = do { local $/; <> }; #Slurp file named on command line
    my %columns;
    while( $file =~
       m{ ((?&Identifier))[\t ]*=[ \t]*((?&Statement)) $grammar}xgc )
    {
       my ($header,$value) = ($1,$2);

           # Remove leading spaces and quote variable if it contains commas:
       for($header,$value) { s/^\s+//mg; /,/ and s/^|$/"/g }

           # Substitute \n with \\n to make multi-line values single-line:
       for($value) { chomp; s/\n/\\n/g }

       $columns{$header}=$value
    }

    print join "," => sort keys %columns; # Print column headers
    print "\n";
    print join "," => map { $columns{$_} } sort keys %columns; # Column content
    print "\n";

The input file looks like this:

OPERATION_CONTEXT server:.oc_name alarm_object 1
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes

                             Identifier = 1
                                  State = Outstanding
                         Problem Status = Not-Handled
                  Clearance Report Flag = False
                        Escalated Alarm = False
                     Creation Timestamp = Thu, Jan 16, 2014 10:21:17 PM
                         Managed Object = NETACT server:.NETACT51 BSC 716499 BCF 123
                        Target Entities = { NETACT server:.NETACT51 BSC 716499 BCF 123 }
                             Alarm Type = EnvironmentalAlarm
                             Event Time = Thu, Jan 16, 2014 10:17:14 PM
                         Probable Cause = Indeterminate
                      Specific Problems = { 7409 }
                Notification Identifier = 2433009629
                                 Domain = Domain server:.netact51_dom
                           Alarm Origin = IncomingAlarm
                     Perceived Severity = Critical
                        Additional Text = "ALARMA CRITICA SISTEMA DAS 1900
                                          #S#10497409      ***                                       ZONA TECNICA SANTI
                                          PLMN-PLMN/BSC-716499/BCF-123

                                          SC_logical_name:9344;"
                      Original Severity = Critical
                    Original Event Time = Thu, Jan 16, 2014 10:17:14 PM
                            Outage Flag = False
                    Problem Occurrences = 1 Problems
               GPP3 Problem Occurrences = 0 Problems
           Critical Problem Occurrences = 1 Problems
              Major Problem Occurrences = 0 Problems
              Minor Problem Occurrences = 0 Problems
            Warning Problem Occurrences = 0 Problems
      Indeterminate Problem Occurrences = 0 Problems
              Clear Problem Occurrences = 0 Problems
                               SA Total = 0 Alarms
                                 Comuna = "HUECHURABA"
                             CatCliente = "CAV"
                               Nemonico = "BSMT6_PZANF3"

OPERATION_CONTEXT server:.oc_name alarm_object 2
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes

                             Identifier = 2
                                  State = Outstanding
                         Problem Status = Not-Handled
                  Clearance Report Flag = False
                        Escalated Alarm = False
                     Creation Timestamp = Thu, Jan 16, 2014 10:14:03 PM
                   Clearance Time Stamp = Thu, Jan 16, 2014 10:29:08 PM
                         Managed Object = NETACT server:.NETACT51 BSC 206259 BCF 103
                        Target Entities = { NETACT server:.NETACT51 BSC 206259 BCF 103 }
                             Alarm Type = EnvironmentalAlarm
                             Event Time = Thu, Jan 16, 2014 10:29:37 PM
                         Probable Cause = Indeterminate
                      Specific Problems = { 7409 }
                Notification Identifier = 3780327614
                                 Domain = Domain server:.netact51_dom
                           Alarm Origin = IncomingAlarm
                     Perceived Severity = Critical
                        Additional Text = "ALARMA CRITICA SISTEMA DAS 1900
                                          #S#10497409      ***                                       ZONA TECNICA CENTR
                                          Merval                           BSC VLP7
                                          PLMN-PLMN/BSC-206259/BCF-103
                                          ALARMA CRITICA SISTEMA DAS 1900

                                          SC_logical_name:94681;"
                      Original Severity = Critical
                    Original Event Time = Thu, Jan 16, 2014 10:10:01 PM
                            Outage Flag = False
                    Problem Occurrences = 4 Problems
               GPP3 Problem Occurrences = 0 Problems
           Critical Problem Occurrences = 4 Problems
              Major Problem Occurrences = 0 Problems
              Minor Problem Occurrences = 0 Problems
            Warning Problem Occurrences = 0 Problems
      Indeterminate Problem Occurrences = 0 Problems
              Clear Problem Occurrences = 3 Problems
                               SA Total = 6 Alarms
                                 Comuna = "VINA DEL MAR"
                             CatCliente = "CAV"
                               Nemonico = "BVLP7_MVALF9"

OPERATION_CONTEXT server:.oc_name alarm_object 3
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:45 PM All Attributes

                             Identifier = 3
                                  State = Outstanding
                         Problem Status = Not-Handled
                  Clearance Report Flag = False
                        Escalated Alarm = False
                     Creation Timestamp = Thu, Jan 16, 2014 09:41:59 PM
                         Managed Object = NETACT server:.NETACT51 BSC 938189 BCF 61
                        Target Entities = { NETACT server:.NETACT51 BSC 938189 BCF 61 }
                             Alarm Type = EnvironmentalAlarm
                             Event Time = Thu, Jan 16, 2014 09:37:58 PM
                         Probable Cause = Indeterminate
                      Specific Problems = { 7405 }
                Notification Identifier = 1757596347
                                 Domain = Domain server:.netact51_dom
                           Alarm Origin = IncomingAlarm
                     Perceived Severity = Major
                        Additional Text = "NUSS FAILURE, RECTIFIER_1 ALARM
                                          #S#10497405      **                                        ZONA TECNICA CENTR
                                          Pelluhue Playa
                                          PLMN-PLMN/BSC-938189/BCF-61

                                          SC_logical_name:9679;"
                      Original Severity = Major
                    Original Event Time = Thu, Jan 16, 2014 09:37:58 PM
                            Outage Flag = False
                    Problem Occurrences = 1 Problems
               GPP3 Problem Occurrences = 0 Problems
           Critical Problem Occurrences = 0 Problems
              Major Problem Occurrences = 1 Problems
              Minor Problem Occurrences = 0 Problems
            Warning Problem Occurrences = 0 Problems
      Indeterminate Problem Occurrences = 0 Problems
              Clear Problem Occurrences = 0 Problems
                               SA Total = 0 Alarms
                                 Comuna = "PELLUHUE"
                             CatCliente = "UNIC_SITE"
                               Nemonico = "BTAL2_PYUEF6"

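Thank you very much in advance for any help you can give me!

Best answer

The following does not address your script, but offers a line-by-line parsing approach instead: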

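use strict;
use warnings;

# $showHeader: true until the header row has been printed.
# $lastID:     name of the previously parsed field.
my ( $showHeader, $lastID, @header, @columns ) = ( 1, '' );

while (<>) {
    # Only indented "Name = value" lines match, so the unindented header
    # lines and the blank lines between entries are skipped.
    if ( my ( $identifier, $statement ) = /^\s+(\S[^=]+)\s+=\s+(.+)/ ) {

        # 'Clearance Time Stamp' is optional and, when present, comes right
        # before 'Managed Object'. Add an empty column when it is missing
        # so that every row has the same number of fields.
        if (    $identifier eq 'Managed Object'
            and $lastID ne 'Clearance Time Stamp' )
        {
            push @header, 'Clearance Time Stamp' if $showHeader;
            push @columns, '';
        }

        # 'Additional Text' spans several lines: keep reading until the
        # line containing SC_logical_name, which ends the quoted value.
        if ( $identifier eq 'Additional Text' ) {
            while (<>) {
                my ($additional) = /^\s+(\S.+)/ or next;    # skip blank lines
                $statement .= " $additional";    # space keeps lines from running together
                last if $additional =~ /SC_logical_name/;
            }
            $statement =~ s/\s+/ /g;
        }

        push @header, $identifier if $showHeader;
        push @columns, $statement;

        # 'Nemonico' is the last field of an entry: print the row.
        if ( $identifier eq 'Nemonico' ) {
            if ($showHeader) {
                print +( join ',', @header ), "\n";
                $showHeader = 0;
            }

            # Quote values that contain a comma and are not already quoted.
            print +( join ',', map { $_ = qq/"$_"/ if /,/ and !/^"/; $_ } @columns ), "\n";
            undef @columns;
        }
        $lastID = $identifier;
    }
}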

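Usage: perl script.pl inFile [>outFile.csv]

The last, optional parameter directs the output to a file.

Runs of whitespace in the Additional Text field are replaced with a single space.

Hope this helps!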
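
As a small illustration of that whitespace handling, the sketch below flattens a multi-line value with the same `s/\s+/ /g` substitution and then quotes it for CSV. The helper name `csv_field` and the sample text are made up for this example. Doubling embedded quotes is the usual CSV convention and is an addition here, not something the script above does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): collapse runs of
# whitespace, then quote the field if it contains a comma or a quote.
sub csv_field {
    my ($value) = @_;
    $value =~ s/\s+/ /g;       # newlines and padding become one space
    $value =~ s/^ | $//g;      # trim a leading or trailing space
    if ( $value =~ /[",]/ ) {
        $value =~ s/"/""/g;    # CSV convention: double embedded quotes
        $value = qq{"$value"};
    }
    return $value;
}

# Made-up sample resembling a multi-line field value
my $raw = "ALARMA CRITICA\n      ZONA TECNICA,   BSC VLP7";
print csv_field($raw), "\n";
```

Note that values which already carry their own surrounding quotes, as Additional Text does in the input above, would come out with those quotes doubled, so this helper would have to strip them first if it were dropped into the script.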
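
For completeness, here is one way the slurp-based approach from the question might be adapted. That script most likely prints only the last entry because `%columns` is a single hash keyed by field name, so each entry overwrites the values of the previous one. The sketch below splits the text into records at the `OPERATION_CONTEXT` lines, which also skips the header lines, and builds one hash per record. The sub name `parse_records`, the simplified regex (used in place of the `(?(DEFINE)...)` grammar) and the shortened sample data are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the slurped text into a list of hash
# references, one per entry.
sub parse_records {
    my ($text) = @_;
    my @records;

    # Each entry starts with an OPERATION_CONTEXT line; splitting there
    # keeps the fields of different entries apart.
    for my $chunk ( split /^(?=OPERATION_CONTEXT)/m, $text ) {
        my %fields;

        # Indented "Name = value" lines; a value is either a quoted string
        # (which may span several lines) or the rest of the line.
        while ( $chunk =~ /^[ \t]+([^=\n]+?)[ \t]*=[ \t]*("[^"]*"|[^\n]+)/mg ) {
            my ( $name, $value ) = ( $1, $2 );
            $value =~ s/\s+/ /g;    # flatten multi-line values
            $value =~ s/ $//;       # drop a trailing space
            $fields{$name} = $value;
        }
        push @records, \%fields if %fields;
    }
    return @records;
}

# Shortened, made-up sample in the same layout as the input file
my $sample = <<'END';
OPERATION_CONTEXT server:.oc_name alarm_object 1
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes

          Identifier = 1
               State = Outstanding
     Additional Text = "LINE ONE
                        LINE TWO;"

OPERATION_CONTEXT server:.oc_name alarm_object 2
On director: server:.temip.prd1149_director
AT Thu, Jan 16, 2014 10:33:44 PM All Attributes

          Identifier = 2
               State = Outstanding
END

my @records = parse_records($sample);
printf "%d records, identifiers: %s\n",
    scalar @records, join( ', ', map { $_->{Identifier} } @records );
```

Because fields such as Clearance Time Stamp are optional, a CSV writer built on top of this would need to take the union of all field names as its header and print an empty value wherever a record lacks a field.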

Regarding "perl - Need help iterating over a file with a specific format", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21176428/
