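I have a problem. I created a page to scrape some data from a public entity's website. Their terms of use do not prohibit this, and it is all public data anyway. I know the page I wrote for this is a dirty hack, but I can't work out why it keeps looping. The template page I created to run the actual scraping code keeps running: it starts over again and again. Here is the code: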
<?php
/*
Template Name: Scraping template
*/
// Which scrape to run; empty when the query parameter is missing.
$strFile = isset($_GET['scrape']) ? $_GET['scrape'] : '';
$intNumOfRec = 0;
$intNumOfErr = 0;
$intHeaderLine = '';

// Append one line to the scraping log file.
function fnLogger ($strLine) {
    $hdlLogFile = fopen("ScrapingLogFile","a") or die("Unable to open file!");
    fwrite($hdlLogFile,$strLine."\r\n");
    fclose($hdlLogFile);
    return;
}

// Fetch up to $intRecChunk rows that have no authority yet, scrape each one,
// and write the type and jurisdiction back to the table.
function fnProcessMcr() {
    global $wpdb,$intNumOfRec,$intNumOfErr;
    $intRecChunk = 50;
    $strQuery = 'SELECT * FROM frg_subdivision_index WHERE authority is null limit '.$intRecChunk.';';
    $objQuery = $wpdb->get_results($strQuery);
    echo $strQuery.'<br>';
    fnLogger($strQuery);
    foreach($objQuery as $index=>$row)
    {
        fnLogger($row->id.' ');
        if(strlen($row->book) !== 0 && strlen($row->map) !== 0 && strlen($row->begin) !== 0)
        {
            $url = '[url withheld]?q='.$row->book.'-'.$row->map.'-'.$row->begin;
            $ch = curl_init();
            $timeout = 5;
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
            curl_setopt($ch, CURLOPT_ENCODING, "gzip");
            $data = curl_exec($ch);
            curl_close($ch);
            // stripos() returns false when the marker is missing; compare strictly
            // so that a match at offset 0 is not mistaken for "not found".
            $intCheckIndex = stripos($data,'<td class="right aligned"><h3 class="ui huge basic header">');
            if($intCheckIndex === false)
            {
                //echo $row->id.': Could not find type prefix.';
                $strType = 'Unknown';
                $strJurisdiction = 'Unknown';
                $intNumOfErr++;
            }
            else
            {
                // 59 is the length of the marker string searched for above.
                $data = substr($data,$intCheckIndex+59);
                $intCheckIndex = stripos($data,'<strong>[CANCELLED]</strong>');
                $strType = "";
                if($intCheckIndex !== false) {
                    // 28 is the length of the [CANCELLED] marker.
                    $data = substr($data,$intCheckIndex+28);
                    $strType = 'Cancelled ';
                }
                $strType .= trim(substr($data,0,stripos($data,'Parcel')-1));
                // Skip past the "Local Jurisdiction" label cell (23 characters),
                // then past the opening <td> of the value cell (4 characters).
                $data = substr($data,stripos($data,'Local Jurisdiction</td>')+23);
                $data = substr($data,stripos($data,'<td>')+4);
                $strJurisdiction = ucwords(strtolower(trim(substr($data,0,stripos($data,'<')))));
                //echo ($index+$intBegRec).': Type is: '.$strType.' in '.$strJurisdiction;
                $intNumOfRec++;
            }
        }
        else
        {
            echo $row->id.': Missing book, map or begin.<br>';
            $strType = 'Unknown';
            $strJurisdiction = 'Unknown';
            $intNumOfErr++;
        }
        // $wpdb->update() returns false on error.
        $strUpdateResults = $wpdb->update('frg_subdivision_index',array(
            'type' => $strType,
            'authority' => $strJurisdiction),
            array(
            'id' => $row->id));
        echo $row->id.': Type: '.$strType.' Authority:'.$strJurisdiction;
        if($strUpdateResults === false)
        {
            echo ': ERROR update database.<br>';
            $intNumOfErr++;
        }
        else
        {
            echo '<br>';
        }
    }
    echo "<br><br>Number of records updated was: ".$intNumOfRec.'<br>';
    echo "Number of errors was: ".$intNumOfErr.'<br>';
    return;
}

// fnProcessMcrUnknown() and fnChangeTo404() are not shown in this post.
switch ($strFile) {
    case 'mcr':
        fnLogger('Entered Switch Case mcr');
        fnProcessMcr();
        break;
    case 'mcrunknown':
        fnProcessMcrUnknown();
        break;
    default:
        fnChangeTo404();
}
?>
Here is the output of the log file, so you can see what it is doing.
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30729
30730
30731
30732
30733
30734
30735
30736
30737
30738
30739
30740
30741
30742
30743
30744
30745
30746
30747
30748
30749
30750
30751
30752
30753
30754
30755
30756
30757
30758
30759
30760
30761
30762
30763
30764
30765
30766
30767
30768
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30768
30769
30769
30770
30770
30771
30771
30772
30772
30773
30773
30774
30774
30775
30775
30776
30776
30777
30777
30778
30778
30779
30780
30781
30782
30783
30784
30785
30786
30787
30788
30789
30790
30791
30792
30793
30794
30795
30796
30797
30798
30799
30800
30801
30802
30803
30804
30805
30806
30807
30808
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30808
30809
30809
30810
30810
30811
30811
30812
30812
30813
30813
30814
30814
30815
30815
30816
30816
30817
30817
30818
30819
30820
30821
30822
30823
30824
30825
30826
30827
30828
30829
30830
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30830
30831
30831
30832
30832
30833
30833
30834
30834
30835
30835
30836
30836
30837
30837
30838
30838
30839
30839
30840
30840
30841
30841
Entered Switch Case mcr
SELECT * FROM frg_subdivision_index WHERE authority is null limit 50;
30841
30842
Does anyone know why it keeps looping back?
Best answer
OK, so there was nothing wrong with the code. It was running into timeouts in WordPress.
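The log in the question shows "Entered Switch Case mcr" again before a batch of 50 has finished, and the same IDs logged twice inside one batch. That would be consistent with the request being re-issued after a timeout while the earlier run was still going, although the answer does not say where the timeout came from. The following is a minimal sketch, not the fix the asker used, of two common guards: an exclusive lock so that an overlapping request exits at once, and a time budget so that one request stops well before any timeout. The helpers `fnAcquireRunLock`, `fnReleaseRunLock` and `fnProcessWithBudget`, the lock-file name and the 20-second budget are all hypothetical.

```php
<?php
// Sketch only: these helpers and the lock-file name are hypothetical,
// they are not part of the original template.

// Try to take an exclusive, non-blocking lock. Returns the open file
// handle on success, or false if the file cannot be opened or another
// run already holds the lock.
function fnAcquireRunLock(string $strLockFile)
{
    $hdlLock = fopen($strLockFile, 'c');
    if ($hdlLock === false) {
        return false;
    }
    if (!flock($hdlLock, LOCK_EX | LOCK_NB)) {
        fclose($hdlLock);
        return false;
    }
    return $hdlLock; // keep the handle open for as long as the run lasts
}

// Release the lock taken by fnAcquireRunLock().
function fnReleaseRunLock($hdlLock): void
{
    flock($hdlLock, LOCK_UN);
    fclose($hdlLock);
}

// Call $fnHandleRow for each row until the time budget is used up.
// Returns how many rows were handled.
function fnProcessWithBudget(array $arrRows, callable $fnHandleRow, float $fltBudgetSeconds): int
{
    $fltStart = microtime(true);
    $intDone = 0;
    foreach ($arrRows as $row) {
        if (microtime(true) - $fltStart >= $fltBudgetSeconds) {
            break; // stop early; the remaining rows wait for the next request
        }
        $fnHandleRow($row);
        $intDone++;
    }
    return $intDone;
}

// Usage: the array stands in for the rows returned by the SELECT.
$hdlLock = fnAcquireRunLock(sys_get_temp_dir() . '/scrape-mcr.lock');
if ($hdlLock === false) {
    echo "Another run is still in progress.\n";
} else {
    $intDone = fnProcessWithBudget([30729, 30730, 30731], function ($row) {
        // scrape and update one row here
    }, 20.0);
    fnReleaseRunLock($hdlLock);
    echo "Handled " . $intDone . " rows\n";
}
```

Because the query only selects rows where authority is null, rows skipped when the budget runs out are picked up by the next request. Note also that the original code sets only CURLOPT_CONNECTTIMEOUT, which limits the connection phase; CURLOPT_TIMEOUT limits the whole transfer and may be worth setting too, so that a single slow response cannot stall the batch.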
Regarding "php - Wordpress PHP template page keeps looping", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39301946/