xml - Lib:XML for Perl: parsing huge XML files through XPath causes a core segmentation fault

I have a huge XML file in the following format:

<XML>
<Application id="1" attr1="some value" attr2="some val" ...>
  <!-- ...and many more attributes, plus nested tags inside Application that may carry more attributes -->
</Application>

<Application id="2" attr1="some value" attr2="some val" ...>
  <!-- ...and many more attributes, plus nested tags inside Application that may carry more attributes -->
</Application>

<Application id="3" attr1="some value" attr2="some val" ...>
  <!-- ...and many more attributes, plus nested tags inside Application that may carry more attributes -->
</Application>

<!-- ... probably 10,000 more Application entries -->
</XML>

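Each Application tag has only attributes and no text content, but it also contains nested tags that can have attributes of their own. I need to parse the file and extract some of these attributes. I am using the script below. It works fine on a small subset of the Application tags but becomes very slow as the number of records grows. Unfortunately, when I run it on the whole file, or even on half of it, it crashes with a segmentation fault and dumps core.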

Here is my script. I would be very grateful for any suggestions on how to do this better.

Best answer

I believe you could do this with XML::LibXML::Reader, but I am not familiar with it. Here is how to do it with XML::Twig.

I have only given you examples of how to get at the data inside an Application element.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $filename1 = "exam.xml";

# process_application is called each time a complete <Application> element
# has been parsed, so the whole document is never built in memory at once.
my $parser = XML::Twig->new( twig_handlers => { Application => \&process_application } )
                      ->parsefile($filename1);

sub process_application
  { my( $t, $sample)= @_;
    my $hncid    = $sample->att('id');                      # get an attribute (XML names are case-sensitive)
    my @persons  = $sample->children('Person');
    my @aplnamt  = map { $_->att('APLN') } @persons;        # that's how you get all attribute values
    my @students = $sample->findnodes('./Person/Student');
    my @nsschl   = map { $_->att('NS') } @students;
    my @d81      = $sample->descendants('*[@D8CHRG]');      # native navigation with a condition...
    my @d82      = $sample->findnodes('.//*[@D8CHRG]');     # ...or the same using a subset of XPath

    $t->purge;                                              # this is where you free the memory
  }
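The sketch below shows the handler-plus-purge pattern end to end without the real file. It assumes a hypothetical two-record input held in a string, so it calls `parse()` instead of `parsefile()`, and it extracts only the `id` and `APLN` attributes. The count of Application elements still attached to the root is included to show that `purge()` discards each record once its handler has run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical two-record sample; the real script would call parsefile() instead.
my $xml = <<'XML';
<XML>
  <Application id="1" attr1="a"><Person APLN="100"/><Person APLN="200"/></Application>
  <Application id="2" attr1="b"><Person APLN="300"/></Application>
</XML>
XML

my %apln_by_id;         # application id => list of APLN values
my $max_in_memory = 0;  # most <Application> elements attached to the root at once

my $twig = XML::Twig->new(
    twig_handlers => {
        Application => sub {
            my ( $t, $app ) = @_;

            # Earlier records should already be gone, leaving only the current one.
            my @in_memory = $t->root->children('Application');
            $max_in_memory = @in_memory if @in_memory > $max_in_memory;

            $apln_by_id{ $app->att('id') } =
                [ map { $_->att('APLN') } $app->children('Person') ];

            $t->purge;   # discard everything parsed so far, this <Application> included
        },
    },
);
$twig->parse($xml);

print "$_: @{ $apln_by_id{$_} }\n" for sort keys %apln_by_id;
```

Running it prints `1: 100 200` and `2: 300`. On the real file you would replace `$twig->parse($xml)` with `$twig->parsefile($filename1)`. Memory use should then stay bounded by roughly one Application element at a time, however large the file is.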

Now that I think about it, you could actually use XML::Twig::XPath to get the full power of XPath. I am just more used to XML::Twig's native navigation methods.
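Here is a sketch of the XML::Twig::XPath variant. It assumes XML::XPathEngine is installed, because XML::Twig::XPath depends on it. The sample data is hypothetical and reuses the Person/Student/APLN/NS/D8CHRG names from the script above. The two queries use a predicate containing a path and a `count()` call. As far as I know, XML::Twig's native XPath subset cannot express either of these. The handler and `purge()` work exactly as before.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig::XPath;   # full XPath support; requires XML::XPathEngine

# Hypothetical sample data, reusing the element/attribute names from the answer.
my $xml = <<'XML';
<XML>
  <Application id="1"><Person APLN="100"><Student NS="x"/></Person><Person APLN="200"/></Application>
  <Application id="2"><Person APLN="300"><Student NS="y"/></Person><Fee D8CHRG="9"/></Application>
</XML>
XML

my %result;   # application id => { applicants => [APLN...], charged => count }

my $twig = XML::Twig::XPath->new(
    twig_handlers => {
        Application => sub {
            my ( $t, $app ) = @_;

            # Persons that have a Student child whose NS attribute is "x".
            my @applicants = map { $_->att('APLN') }
                             $app->findnodes('Person[Student/@NS = "x"]');

            # XPath count(); stringify the returned XPath number object.
            my $charged = $app->findvalue('count(.//*[@D8CHRG])');

            $result{ $app->att('id') } = { applicants => \@applicants, charged => "$charged" };

            $t->purge;   # free memory exactly as with plain XML::Twig
        },
    },
);
$twig->parse($xml);

for my $id ( sort keys %result ) {
    print "$id: NS=x applicants [@{ $result{$id}{applicants} }], D8CHRG elements: $result{$id}{charged}\n";
}
```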

A similar question on Stack Overflow: https://stackoverflow.com/questions/17376775/
