I am trying to capture home_impact and away_impact, but when I extract the text it is full of blank lines, spaces, line breaks, and so on, like this:
David Luiz
35'
36'
De Gea
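I also tried extracting only the div with id match_info, but that produces an array with a single element, and it contains many line breaks. I tried preserveWhiteSpace and preg_replace, but neither worked. Any idea how to avoid this? Thanks.
HTML at the URL: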
<div id="match_info">
<div class="direct_line">
<div class="home_impact"><div class='player_name'>David Luiz </div></div>
<div class="minute">35'</div>
<div class="away_impact">
</div>
</div>
<div class="direct_line">
<div class="home_impact"></div>
<div class="minute">36'</div>
<div class="away_impact">
<div class='player_name'>De Gea</div>
</div>
</div>
<div class="direct_line">
<div class="home_impact"></div>
<div class="minute">38'</div>
<div class="away_impact">
<div class='player_name'>Ashley Cole</div>
</div>
</div>
<div class="direct_line">
<div class="home_impact"><div class='player_name'>Juan Mata</div></div>
<div class="minute">35'</div>
<div class="away_impact">
</div>
</div>
</div>
PHP:
$html = file_get_contents($url);
$doc = new DOMDocument();
//$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$expresionHome = "//div[@class='home_impact']";
$expresionAway = "//div[@class='away_impact']";
$nodesHome = $xpath->evaluate($expresionHome);
$nodesAway = $xpath->evaluate($expresionAway);
for ($i = 0; $i < $nodesHome->length; $i++) {
    echo $nodesHome->item($i)->nodeValue;
    echo $nodesAway->item($i)->nodeValue;
}
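Best answer
You can do this with DOMDocument alone, without trimming node contents or using regular expressions. Consider the following example, and note the DOMDocument properties preserveWhiteSpace and formatOutput (the latter if you want to pretty-print the result):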
// DOMDocument with unformatted content
$unformatteddocument = new DOMDocument("1.0", "utf-8");
$unformatteddocument->load(PATH_OF_UNFORMATTED_XML);

// preserveWhiteSpace must be set before the document is loaded
$document = new DOMDocument("1.0", "utf-8");
$document->preserveWhiteSpace = false; // drop whitespace-only text nodes while parsing
$document->formatOutput = true;        // indent the output when saving
$document->loadXML($unformatteddocument->saveXML());
$document->save(PATH_FOR_FORMATTED_XML);
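The example above works on XML files, while the question loads HTML with loadHTML, where the asker reports that preserveWhiteSpace did not help. As a complementary sketch (not part of the original answer), the XPath 1.0 function normalize-space() can clean each value at query time. The inline markup below is a shortened copy of the question's HTML, and the $events array and its keys are names chosen here for illustration only.

```php
<?php
// Shortened sample markup adapted from the question (an assumption for this demo).
$html = <<<'HTML'
<div id="match_info">
  <div class="direct_line">
    <div class="home_impact"><div class='player_name'>David Luiz </div></div>
    <div class="minute">35'</div>
    <div class="away_impact">
    </div>
  </div>
  <div class="direct_line">
    <div class="home_impact"></div>
    <div class="minute">36'</div>
    <div class="away_impact">
      <div class='player_name'>De Gea</div>
    </div>
  </div>
</div>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$events = [];

// Iterate row by row so home and away values stay paired with their minute.
foreach ($xpath->query("//div[@class='direct_line']") as $line) {
    $events[] = [
        // normalize-space() trims both ends and collapses inner whitespace runs;
        // it yields an empty string when the div has no text.
        'home'   => $xpath->evaluate("normalize-space(div[@class='home_impact'])", $line),
        'minute' => $xpath->evaluate("normalize-space(div[@class='minute'])", $line),
        'away'   => $xpath->evaluate("normalize-space(div[@class='away_impact'])", $line),
    ];
}

foreach ($events as $event) {
    echo $event['minute'], ' | ', $event['home'], ' | ', $event['away'], PHP_EOL;
}
```

Querying relative to each direct_line row, rather than running two separate document-wide queries, avoids relying on the home and away node lists having the same length and order. Note that normalize-space() only handles spaces, tabs, and line breaks; non-breaking spaces in the source would need separate handling.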
Regarding "php - Remove spaces and line breaks from captured data with Php Dom Document", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25486687/