perl - Log errors and continue scraping the remaining URLs

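Hi, I want to process an array of URLs. If one URL has a problem, it must be recorded in errorfile.html and processing must continue with the other URLs. Both kinds of failure (the URL fails to load, or the XPath lookup fails) must be recorded in the error log. I am getting an error message about an unrecognized method call "isa".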

use LWP::Simple;
use File::Compare;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;

{
    open(FILE, "C:/Users/jeyakuma/Desktop/shipping project/input/input.txt");

    {
        while (<FILE>)
        {
            chomp;
            $url = $_;
            foreach ($url)
            {
                ($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;
            }

            do 'C:/Users/jeyakuma/Desktop/perl/mainsub.pl';
            &domain_check();

            my $ua  = LWP::UserAgent->new( agent => "Mozilla/5.0" );
            my $req = HTTP::Request->new( GET => "$url" );
            my $res = $ua->request($req);
            if ( $res->is_success )
            {
                print "working on $competitor\n";

                binmode ":utf8";
                my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
                print "Extracting the $competitor xpath\n";
                my @node = $xp->findnodes_as_string("$xpath") or print "couldn't find the node\n";

                open HTML, '>:encoding(cp1252)', "C:/Users/jeyakuma/Desktop/die/$competitor.html";

                foreach (<@node>)
                {
                    print HTML @node;
                    close HTML;
                }
            }
            else {
                print "In valid url";
            }
        }
    }
}

Best answer

"I wish to process array of urls"

Then modify the script to loop over an array.

Something like:

foreach my $url (@URLS) {
    # work on $url here
    my $xp   = HTML::TreeBuilder::XPath->new_from_url($url);
    my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
    # don't use die; record the error message in the log file instead
    # ($error_log is assumed to be a filehandle already opened for writing)
    print $error_log "$url: node doesn't exist\n" unless @node;
    # do other tasks for the URL
}
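
To make "record the error and carry on" concrete, here is a minimal sketch that wraps the per-URL work in eval, so a die raised for one URL is written to the log and the loop moves on to the next one. The helper name process_urls, the stand-in worker, the example URLs and the in-memory log are all assumptions made for illustration; none of them come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): call $worker on every URL,
# write any failure to $log_fh, and keep going. Returns the number of failures.
sub process_urls {
    my ($urls, $worker, $log_fh) = @_;
    my $failed = 0;
    foreach my $url (@$urls) {
        # eval traps a die from the worker, so one bad URL cannot end the loop
        my $ok = eval { $worker->($url); 1 };
        next if $ok;
        my $err = $@ || "unknown error\n";
        chomp $err;
        print {$log_fh} "<p>$url: $err</p>\n";
        $failed++;
    }
    return $failed;
}

# Demo with a stand-in worker that fails for one URL; a real worker would
# fetch the page and run the XPath query, calling die on either failure.
my @urls = ('http://www.example.com', 'http://www.example.org');
my $log  = '';
open my $log_fh, '>', \$log or die "cannot open in-memory log: $!";
my $failures = process_urls(
    \@urls,
    sub { my ($url) = @_; die "node doesn't exist\n" if $url =~ /\.org$/ },
    $log_fh,
);
close $log_fh;
print "failures: $failures\n", $log;
```

In the real script the log handle would more likely come from something like open my $log_fh, '>>', 'errorfile.html', and the worker would die with a message for both the failed download and the missing node, so that both cases end up in the same file.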

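Edit: The code below works fine for me. Also, what is $xpath in your script? That is the part that gives you the isa error (which you mentioned in the comments).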

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
#You can read URLS from file and create array, I'm doing directly for simplicity
my @urls = ("http://www.google.com", "http://www.yahoo.com");
foreach my $url (@urls){
        print "working on $url\n";
        my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
        my $req = HTTP::Request->new( GET => "$url" );
        my $res = $ua->request($req);
        if ( $res->is_success ) {
                print "In if block, success\n";
                my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
                my $node = $xp->findnodes_as_string('//div[@class="mainbox-body"]') or print "couldn't find the node\n";
        }
        else{  
                print "In else block\n";
        }
}
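
The script above downloads each page twice: once through $ua->request and again through new_from_url. A possible refinement, sketched here under assumptions, is to parse the content that was already fetched and to reject an empty XPath expression up front, since an unset $xpath is the suspected cause of the error in the question. The helper name extract_nodes and the sample HTML are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not from the original post): run an XPath query against
# HTML that has already been downloaded and return the matches as a list.
sub extract_nodes {
    my ($html, $xpath) = @_;
    # Fail early with a readable message if no XPath was supplied
    die "no XPath expression given\n" unless defined $xpath && length $xpath;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @nodes = $tree->findnodes_as_strings($xpath);
    $tree->delete;    # free the parse tree
    return @nodes;
}

# Sample document standing in for a fetched page
my $html  = '<html><body><div class="mainbox-body">Free shipping</div></body></html>';
my @found = extract_nodes($html, '//div[@class="mainbox-body"]');
print "found: $_\n" for @found;
```

Inside the if ( $res->is_success ) branch this could be called as extract_nodes($res->decoded_content, $xpath), which avoids the second request; an empty result list or a die from the helper can then be written to the error log.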

Regarding "perl - Log errors and continue scraping the remaining URLs", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24384072/
