perl - Handling nested delimiters in Perl

Tags: perl delimiter

use strict;   
use warnings; 

my %result_hash = (); 
my %final_hash  = (); 
Compare_results(); 

foreach my $key (sort keys %result_hash ){ 
  print "$key \n"; 
  print "$result_hash{$key} \n"; 
} 

sub Compare_results 
{ 

  while ( <DATA> ) 
  { 
   my($instance,$values) = split /\:/, $_; 
   $result_hash{$instance} = $values; 

   } 
} 
__DATA__ 
1:7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\ d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2:7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d

Output

1 
7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d 

2 
7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d

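I am trying to take the value of each key and split the comma-separated values from the result hash again. If I find a semicolon in any value, I want to store the values on its left and right under separate hash keys.

Like this: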

1. Split the value of $result_hash{$key} again by , and check whether any chunk is separated by ;
2. Every chunk without ;, and the value on the left of ;, should be stored in
 $final_hash{"eto"} = ['7802315095\d\d','7802315098\d\d','7802025002\d\d','7802025003\d\d','7802025004\d\d','7802025005\d\d','7802025006\d\d','7802025007\d\d'];
3. Anything found on the right side of ; has to be stored in
 $final_hash{"pro"} = ['7802025001\d\d'];
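A note on the target structure: in Perl, [ ... ] builds an array reference. Assigning it through @{ ... } produces an array with a single element (the reference itself), while assigning it directly to the hash value stores the list. Here is a minimal sketch using two hypothetical hashes, %bad and %good:

```perl
use strict;
use warnings;

my (%bad, %good);

# Dereferencing on the left and assigning [...] stores ONE element: the array reference
@{ $bad{eto} } = [ '7802315095\d\d', '7802315098\d\d' ];

# Storing the reference as the hash value gives a two-element array
$good{eto} = [ '7802315095\d\d', '7802315098\d\d' ];

# Equivalent alternative: assign a plain list through the dereference
@{ $good{pro} } = ( '7802025001\d\d' );

printf "bad:  %d element(s)\n", scalar @{ $bad{eto} };    # bad:  1 element(s)
printf "good: %d element(s)\n", scalar @{ $good{eto} };   # good: 2 element(s)
```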

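Is there a way to handle all of this inside the subroutine? Can I make the code simpler?

Update:

I tried splitting the string in one go, but it only picks out the values around the semicolon and ignores everything else.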

foreach my $key (sort keys %result_hash) {
#   print "$key \n";
#   print "$result_hash{$key} \n";
    # split returns every field, but ($o, $t) only keeps the first two
    my ($o, $t) = split(/,|;/, $result_hash{$key});
    print "Left : $o \n";
    print "Right: $t \n";
    #push @{$final_hash{"eto"}}, $o;
    #push @{$final_hash{"pro"}}, $t;
}
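Only two values come out because split returns every field, while the list assignment my ($o, $t) = ... keeps just the first two and discards the rest. If a single pass is still wanted, one option is to wrap the delimiter pattern in capturing parentheses. This is a sketch, not the accepted answer's method. With the capture, split returns the delimiters interleaved with the fields, so each field can be routed according to the delimiter in front of it. split_fields is a hypothetical helper name:

```perl
use strict;
use warnings;

# Route each field by the delimiter in front of it:
# a field preceded by ';' goes to "pro", every other field to "eto".
sub split_fields {
    my ($val) = @_;
    my (@eto, @pro);
    # With a capture group, split returns: field, delim, field, delim, field ...
    my @tokens = split /([,;])/, $val;
    my $prev = ',';                       # the first field behaves as if preceded by ','
    for my $tok (@tokens) {
        if ($tok eq ',' || $tok eq ';') { $prev = $tok; next }
        next if $tok eq '';               # skip empty fields such as in "a,,b"
        if ($prev eq ';') { push @pro, $tok } else { push @eto, $tok }
    }
    return (\@eto, \@pro);
}

my ($eto, $pro) = split_fields('7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d');
print "eto: @$eto\n";    # eto: 7802315095\d\d 7802315098\d\d 7802025002\d\d
print "pro: @$pro\n";    # pro: 7802025001\d\d
```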

My updated code after the help:

my %file_data;    # must be declared, since the script runs under 'use strict'

sub Compare_results
{
  open my $fh, '<', 'Data_File.txt' or die "Cannot open Data_File.txt: $!";
  # split by colon here; the further split by , and ; (if any) is done in insert_array
  my %result_hash = map { chomp; split ':', $_, 2 } <$fh>;
  close $fh;
  foreach my $id (sort { $a <=> $b } keys %result_hash)
  {
     my $region = ($id < 21) ? "west" : "east";
     insert_array($result_hash{$id}, $region);
  }
}


sub insert_array    # no () prototype: the sub takes two arguments
{
  my ($val, $key) = @_;
  foreach my $field (split ',', $val)
  {
    $field =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    if ($field !~ /;/) {
      push @{ $file_data{"pto"}{$key} }, $field;
    }
    else {
      # split drops a trailing empty field, so $right is undef for "abc;"
      my ($left, $right) = split ';', $field;
      push @{ $file_data{"pto"}{$key} }, $left  if defined $left  && $left  ne '';
      push @{ $file_data{"ero"}{$key} }, $right if defined $right && $right ne '';
    }
  }
}
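To try this approach without creating Data_File.txt, Perl can open a filehandle on an in-memory string with open my $fh, '<', \$string. The sketch below assumes two short, made-up sample lines and keeps the west/east cutoff of 21. It also guards $left and $right with defined, because split drops a trailing empty field:

```perl
use strict;
use warnings;

my %file_data;

# Made-up sample input (assumption): id:fields, with ';' separating a right-hand value
my $input = "1:111,222;333,444\n25:555;666\n";

sub insert_array {
    my ($val, $key) = @_;
    foreach my $field (split /,/, $val) {
        $field =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
        if ($field !~ /;/) {
            push @{ $file_data{pto}{$key} }, $field;
        }
        else {
            my ($left, $right) = split /;/, $field;
            push @{ $file_data{pto}{$key} }, $left  if defined $left  && $left  ne '';
            push @{ $file_data{ero}{$key} }, $right if defined $right && $right ne '';
        }
    }
}

# Read "lines" from the string exactly as from a real file
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my %result_hash = map { chomp; split /:/, $_, 2 } <$fh>;
close $fh;

for my $id (sort { $a <=> $b } keys %result_hash) {
    insert_array($result_hash{$id}, $id < 21 ? 'west' : 'east');
}

for my $kind (sort keys %file_data) {
    for my $region (sort keys %{ $file_data{$kind} }) {
        print "$kind/$region: @{ $file_data{$kind}{$region} }\n";
    }
}
```

With this input, the program prints ero/east: 666, ero/west: 333, pto/east: 555 and pto/west: 111 222 444.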

Thanks

Best answer

Update: added a two-pass regex at the end.


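Just do it systematically and take the string apart step by step. You need successive splits with specific separation rules, and that makes doing it all in one go awkward. It is better to have a clear approach than one monster statement.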

use warnings 'all';
use strict;   
use feature 'say';

my (%result_hash, %final_hash); 

Compare_results(); 

say "$_ => $result_hash{$_}" for sort keys %result_hash;
say '---';
say "$_ => [ @{$final_hash{$_}} ]" for sort keys %final_hash;

sub Compare_results 
{   
    %result_hash = map { chomp; split ':', $_ } <DATA>;

    my (@eto, @pro);
    foreach my $val (values %result_hash)
    {   
        foreach my $field (split ',', $val)
        {   
            if ($field !~ /;/) { push @eto, $field }
            else { 
                my ($left, $right) = split ';', $field;
                push @eto, $left;
                push @pro, $right;
            }
        }    
    }        
    $final_hash{eto} = \@eto;
    $final_hash{pro} = \@pro;
    return 1;                  # but add checks above
}

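There are inefficiencies here and no error checking, but the approach is simple. If your input is not small, change the above to process the data line by line, which you clearly know how to do. It prints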

1 => ...  (what you have in the question)
---
eto => [ 7802315095\d\d 7802315098\d\d 7802025002\d\d 7802025003\d\ d ...
pro => [ 7802025001\d\d ]

Note that your data does have one loose \d\ d.
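To keep stray spaces such as the one in \d\ d out of the results, one option is to delete all whitespace from each field before pushing it. This assumes whitespace is never meaningful inside a field. That holds for the sample data, but it is an assumption about the real input. clean_field is a hypothetical helper:

```perl
use strict;
use warnings;

# Remove every whitespace character from a field (assumes spaces are never significant)
sub clean_field {
    my ($field) = @_;
    $field =~ s/\s+//g;
    return $field;
}

my @raw   = ('7802025003\d\ d', ' 7802025004\d\d ');
my @clean = map { clean_field($_) } @raw;
print "$_\n" for @clean;    # 7802025003\d\d, then 7802025004\d\d
```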


We don't need to build the whole hash %result_hash for this. We only need the part of each line after the :. I left the hash in because it is declared globally, so you may want to keep it around. If it isn't actually needed on its own, this simplifies to

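sub Compare_results {
    my (@eto, @pro);
    while (<DATA>) {
        my ($val) = /:(.*)/;            # the part of the line after the first ':'
        next unless defined $val;       # skip lines without a ':'
        foreach my $field (split ',', $val) {
            if ($field !~ /;/) { push @eto, $field }
            else {
                my ($left, $right) = split ';', $field;
                push @eto, $left;
                push @pro, $right;
            }
        }
    }
    $final_hash{eto} = \@eto;
    $final_hash{pro} = \@pro;
    return 1;
}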

Thanks to ikegami for the comment.


Just out of curiosity, here it is with two regex passes
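One side effect of the first version is that values %result_hash returns values in no particular order, so the order of entries in @eto can change between runs. The line-by-line version keeps the input order. If the hash is kept, iterating over numerically sorted keys gives a stable order. A minimal sketch with made-up sample values:

```perl
use strict;
use warnings;

# Made-up sample data (assumption), keyed by line number as in %result_hash
my %result_hash = (
    2  => 'c,d',
    1  => 'a;x,b',
    10 => 'e',
);

my (@eto, @pro);
# Numeric sort gives 1, 2, 10 (a plain string sort would give 1, 10, 2)
for my $key (sort { $a <=> $b } keys %result_hash) {
    for my $field (split /,/, $result_hash{$key}) {
        # a field without ';' yields only $left; $right stays undef
        my ($left, $right) = split /;/, $field;
        push @eto, $left  if defined $left  && $left  ne '';
        push @pro, $right if defined $right && $right ne '';
    }
}
print "eto: @eto\n";    # eto: a b c d e
print "pro: @pro\n";    # pro: x
```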

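sub compare_rx {
    # chomp first so that the last field on each line does not keep the newline
    my @data = map { chomp; (split ':', $_)[1] } <DATA>;
    # fields at the start of the string or right after a comma (i.e. not after a ;)
    $final_hash{eto} = [ map { /(?:^|,)([^,;]+)/g } @data ];
    # fields right after a ;
    $final_hash{pro} = [ map { /;([^,;]+)/g } @data ];
    return 1;
}

This uses the negated character class [^,;] to match a run of characters that are neither , nor ;, which is one field. In the first pass each field has to start at the beginning of the string or right after a comma. With /g the match keeps going through the string, so it collects every field that is not preceded by ;. Those are the fields without ; plus the ones on the left of ;. The second pass then picks the [^,;] run that comes right after a ;. map runs this over all data lines.

If %result_hash is needed, build it instead of @data. Then pull the values out with my @values = values %result_hash and feed @values to map.

Or, break it up line by line (again, you could build %result_hash instead of taking $data directly):

my (@eto, @pro);
while (<DATA>) {
    my ($data) = /:(.*)/;          # '.' does not match "\n", so no chomp is needed
    next unless defined $data;     # skip lines without a ':'
    push @eto, $data =~ /(?:^|,)([^,;]+)/g;
    push @pro, $data =~ /;([^,;]+)/g;
}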
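The need for an anchored first pass is easy to see on one value from the question. A plain /([^,;]+)/g in list context returns every field, including the one after ;. Anchoring each field at the start of the string or right after a comma leaves that one to the second pass. A minimal sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

my $data = '7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d';

# Unanchored: every run of non-delimiters matches, including the field after ';'
my @all = $data =~ /([^,;]+)/g;
# Anchored at the start of the string or after ',': skips the field after ';'
my @eto = $data =~ /(?:^|,)([^,;]+)/g;
# Only the fields that follow ';'
my @pro = $data =~ /;([^,;]+)/g;

print scalar(@all), " fields in total\n";   # 4 fields in total
print "eto: @eto\n";    # eto: 7802315095\d\d 7802315098\d\d 7802025002\d\d
print "pro: @pro\n";    # pro: 7802025001\d\d
```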

For a similar question about perl - Handling nested delimiters in Perl, see Stack Overflow: https://stackoverflow.com/questions/39608704/
