perl - Show errors in a log and continue scraping other URLs

Tags: perl xpath error-handling

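Hi, I want to process an array of URLs. If there is a problem with a URL (either the URL fails to load or the XPath lookup fails), the error must be recorded in errorfile.html and processing must continue with the other URLs. I'm getting the error message: Can't call method "isa".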

use LWP::Simple;
use File::Compare;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;

open( my $in, '<', "C:/Users/jeyakuma/Desktop/shipping project/input/input.txt" )
    or die "Cannot open input.txt: $!";

while (<$in>) {
    chomp;
    $url = $_;
    ($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;

    do 'C:/Users/jeyakuma/Desktop/perl/mainsub.pl';
    domain_check();

    my $ua  = LWP::UserAgent->new( agent => "Mozilla/5.0" );
    my $req = HTTP::Request->new( GET => $url );
    my $res = $ua->request($req);
    if ( $res->is_success ) {
        print "working on $competitor\n";

        binmode STDOUT, ':utf8';
        my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
        print "Extracting the $competitor xpath\n";
        my $nodes = $xp->findnodes_as_string($xpath) or print "couldn't find the node\n";

        # write the extracted HTML once, then close the file
        open( my $html, '>:encoding(cp1252)', "C:/Users/jeyakuma/Desktop/die/$competitor.html" )
            or die "Cannot write $competitor.html: $!";
        print {$html} $nodes;
        close $html;
    }
    else {
        print "Invalid url\n";
    }
}
close $in;

Best answer

I wish to process array of urls

Then modify the script to loop over the array.

Something like this:
open my $error_log, '>>', 'errorfile.html' or die "Cannot open errorfile.html: $!";
foreach my $url (@URLS) {
    # work on $url here
    my $xp   = HTML::TreeBuilder::XPath->new_from_url($url);
    my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
    # don't use die; instead, record the error message in the file
    print {$error_log} "$url: node doesn't exist\n" unless @node;
    # do other tasks for this URL
}
close $error_log;
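One caveat with the loop above: new_from_url (inherited from HTML::TreeBuilder) croaks if the fetch fails, so a single unreachable URL would still stop the whole loop. A common Perl idiom for continuing is to wrap the per-URL work in eval { ... }, write $@ to the error log, and move on. The sketch below shows only that control flow and runs offline: process_url is a hypothetical stand-in for the new_from_url/findnodes_as_strings calls, and it dies either for an unreachable host or for a missing XPath expression, the two failure cases from the question. The URLs and XPath values are made up; errorfile.html is the log name from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real fetch + XPath work. It dies the same way
# a failed fetch or an unusable XPath would, so the loop can run offline.
sub process_url {
    my ( $url, $xpath ) = @_;
    die "no XPath expression defined for $url\n" unless defined $xpath && $xpath =~ /\S/;
    die "GET failed on $url\n" if $url =~ /\.invalid\b/;
    return "nodes from $url";
}

# Escape characters that would break the HTML error log.
sub html_escape {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical URL -> XPath pairs; undef simulates a missing $xpath.
my @jobs = (
    [ 'http://www.example.com',     '//div[@class="mainbox-body"]' ],
    [ 'http://bad.example.invalid', '//div' ],
    [ 'http://www.example.org',     undef ],
    [ 'http://www.example.net',     '//title' ],
);

open my $error_log, '>>', 'errorfile.html' or die "Cannot open errorfile.html: $!";

my @done;
foreach my $job (@jobs) {
    my ( $url, $xpath ) = @$job;

    # eval traps any die/croak raised while handling this URL
    my $ok = eval {
        process_url( $url, $xpath );
        push @done, $url;
        1;    # reached only if nothing above died
    };
    next if $ok;

    # log the failure and continue with the next URL
    my $err = $@ || 'unknown error';
    chomp $err;
    print {$error_log} '<p>', html_escape($err), "</p>\n";
}
close $error_log;

print "processed: @done\n";
```

Because the error file is HTML, the sketch escapes &, <, > and " before writing; html_escape can be dropped if the log is switched to plain text.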

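Edit: I used the code below and it works fine for me. Also, what is $xpath in your script? That is the part giving you the isa error (as you mentioned in the comments).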
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
#You can read URLS from file and create array, I'm doing directly for simplicity
my @urls = ("http://www.google.com", "http://www.yahoo.com");
foreach my $url (@urls){
        print "working on $url\n";
        my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
        my $req = HTTP::Request->new( GET => "$url" );
        my $res = $ua->request($req);
        if ( $res->is_success ) {
                print "In if block, success\n";
                my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
                my $node = $xp->findnodes_as_string('//div[@class="mainbox-body"]') or print "couldn't find the node\n";
        }
        else{  
                print "In else block\n";
        }
}
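The comment in the code above mentions reading the URLs from a file instead of hard-coding them. A minimal sketch of that step, assuming one URL per line as in the question's input.txt: it trims surrounding whitespace (including a stray \r from Windows line endings), skips blank lines, and also skips lines starting with # (an added convention, not something from the question). read_urls is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one URL per line, trimming whitespace and skipping blank/comment lines.
sub read_urls {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my @urls;
    while ( my $line = <$in> ) {
        $line =~ s/^\s+|\s+$//g;    # also strips the newline and any trailing \r
        next if $line eq '' or $line =~ /^#/;
        push @urls, $line;
    }
    close $in;
    return @urls;
}

# Demonstration input; in the question this would be .../input/input.txt
my ( $fh, $tmp ) = tempfile( UNLINK => 1 );
print {$fh} "http://www.google.com\n\n  http://www.yahoo.com \r\n# skipped\n";
close $fh;

my @urls = read_urls($tmp);
print "$_\n" for @urls;
```

The returned list can replace the hard-coded array directly, e.g. my @urls = read_urls('C:/Users/jeyakuma/Desktop/shipping project/input/input.txt');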

For "perl - Show errors in a log and continue scraping other URLs", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24384072/
