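Hi, I want to process an array of URLs. If there is a problem with one URL, it must be recorded in errorfile.html and processing must continue with the other URLs. Either kind of error (the URL failing to load or the XPath failing) must be recorded in the error log. I get an error message saying the method "isa" is not recognized.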
use LWP::Simple;
use File::Compare;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
{
    open(FILE, "C:/Users/jeyakuma/Desktop/shipping project/input/input.txt");
    {
        while(<FILE>)
        {
            chomp;
            $url=$_;
            foreach ($url)
            {
                ($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;
            }
            do 'C:/Users/jeyakuma/Desktop/perl/mainsub.pl';
            &domain_check();
            my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
            my $req = HTTP::Request->new( GET => "$url" );
            my $res = $ua->request($req);
            if ( $res->is_success )
            {
                print "working on $competitor\n";
                binmode ":utf8";
                my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
                print "Extracting the $competitor xpath\n";
                my @node = $xp->findnodes_as_string("$xpath") or print "couldn't find the node\n";
                open HTML, '>:encoding(cp1252)',"C:/Users/jeyakuma/Desktop/die/$competitor.html";
                foreach(<@node>)
                {
                    print HTML @node;
                    close HTML ;
                }
            }
            else{
                print "In valid url";
            }
        }
    }
}
Best answer
"I wish to process array of urls"
Then modify your script to loop over the array.
Something like:
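# Assumes @URLS holds the URLs and $error_log is a filehandle already opened for writing
foreach my $url (@URLS) {
    # work on $url here; eval keeps a failed fetch from ending the whole loop
    my $xp = eval { HTML::TreeBuilder::XPath->new_from_url($url) };
    unless ($xp) {
        print {$error_log} "$url: could not load page: $@\n";
        next;
    }
    my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
    # don't use die; record the error message in the file instead
    print {$error_log} "$url: node doesn't exist\n" unless @node;
    # do other tasks for the URL
}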
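Since the question reads its URLs from input.txt, the array for the loop above could be built first. This is only a minimal sketch: read_urls is a hypothetical helper name, and it assumes the file holds one URL per line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read one URL per line
# from $path, skipping blank lines and trimming surrounding whitespace.
sub read_urls {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @urls;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # also removes a trailing \r from Windows files
        next if $line eq '';
        push @urls, $line;
    }
    close $fh;
    return @urls;
}
```

With that in place, something like my @URLS = read_urls('input.txt'); would feed the foreach loop, keeping the file handling separate from the scraping.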
Edit: the code below works fine for me. Also, what is $xpath in your script? That is the part giving you the isa error (the one you mentioned in the comments).
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
# You can read the URLs from a file and build the array; I'm listing them directly for simplicity
my @urls = ("http://www.google.com", "http://www.yahoo.com");
foreach my $url (@urls) {
    print "working on $url\n";
    my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
    my $req = HTTP::Request->new( GET => "$url" );
    my $res = $ua->request($req);
    if ( $res->is_success ) {
        print "In if block, success\n";
        my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
        my $node = $xp->findnodes_as_string('//div[@class="mainbox-body"]') or print "couldn't find the node\n";
    }
    else {
        print "In else block\n";
    }
}
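The script above prints its status to the terminal, while the question asks for failures to go into an error file and for the remaining URLs to still be processed. One way this might look is sketched below. process_urls and its $worker callback are hypothetical names, not part of the original answer, and the worker is assumed to die on any problem (page not loading, XPath matching nothing).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run $worker on every URL,
# append failures to the log file at $log_path, and keep going.
# Returns the number of URLs that failed.
sub process_urls {
    my ($urls, $worker, $log_path) = @_;
    open(my $log, '>>', $log_path) or die "Cannot open $log_path: $!";
    my $failed = 0;
    for my $url (@$urls) {
        # eval traps the die, so one bad URL does not end the loop
        my $ok = eval { $worker->($url); 1 };
        next if $ok;
        my $err = $@ || 'unknown error';
        $err =~ s/\s+\z//;          # drop the trailing newline from the die message
        print {$log} "$url: $err\n";
        $failed++;
    }
    close $log;
    return $failed;
}
```

The worker passed in could do the fetch and XPath lookup from the script above and call die "node not found\n" when findnodes_as_string returns nothing. The log path could be the errorfile.html named in the question.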
Regarding perl - showing the error in a log and continuing to scrape the other URLs, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24384072/