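Given some HTML, I load it with the DOMDocument class (http://php.net/manual/en/class.domdocument.php), save it, and Â symbols are occasionally inserted. It seems to happen in tags that contain a single space (as opposed to &nbsp;), but not consistently (only the first <span> element shows it). I tried adding an encoding when displaying the resulting HTML, as suggested in "PHP DOMDocument->getElementByID adding Â in place of empty <span>", but the problem remains. What causes this, and how can it be prevented?
In case you are interested in why I am doing this: I have an application that replaces HTML images with text. I ran into this behavior when HTML was copied from an Outlook email, pasted into a TinyMCE editor, and then parsed.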
<?php
$message = <<<EOT
<p>Start</p>
<p> </p>
<p> </p>
<p></p>
<p class="MsoNormal">
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
<span style="font-size:10pt;font-family:Arial, 'sans-serif';color:#000080;">Phone: (444) 777-7777</span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
</p>
<p>End</p>
EOT;
echo('<p>Initial HTML:</p> '.$message);
$message_encoded = utf8_encode($message);
$doc = new DOMDocument();
$doc->loadHTML($message);
$body = $doc->getElementsByTagName('body')->item(0);
$message=$doc->saveHTML($body);
echo('<p>Final HTML:</p> '.$message);
echo('<p>Initial HTML encoded:</p> '.$message_encoded);
$doc->loadHTML($message_encoded);
$body = $doc->getElementsByTagName('body')->item(0);
$message_encoded=$doc->saveHTML($body);
echo('<p>Final HTML:</p> '.$message_encoded);
?>
Output:
<p>Initial HTML:</p> <p>Start</p>
<p> </p>
<p> </p>
<p></p>
<p class="MsoNormal">
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
<span style="font-size:10pt;font-family:Arial, 'sans-serif';color:#000080;">Phone: (444) 777-7777</span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
</p>
<p>End</p><p>Final HTML:</p> <body>
<p>Start</p>
<p>Â </p>
<p>Â </p>
<p></p>
<p class="MsoNormal">
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;">Â <br></span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br></span>
<span style="font-size:10pt;font-family:Arial, 'sans-serif';color:#000080;">Phone: (444)Â 777-7777</span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br></span>
</p>
<p>End</p>
</body><p>Initial HTML encoded:</p> <p>Start</p>
<p>Â </p>
<p>Â </p>
<p></p>
<p class="MsoNormal">
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;">Â <br /></span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
<span style="font-size:10pt;font-family:Arial, 'sans-serif';color:#000080;">Phone: (444)Â 777-7777</span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br /></span>
</p>
<p>End</p><p>Final HTML:</p> <body>
<p>Start</p>
<p>ÃÂ </p>
<p>ÃÂ </p>
<p></p>
<p class="MsoNormal">
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;">ÃÂ <br></span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br></span>
<span style="font-size:10pt;font-family:Arial, 'sans-serif';color:#000080;">Phone: (444)ÃÂ 777-7777</span>
<span style="font-size:10pt;font-family:Calibri, 'sans-serif';color:#000080;"> <br></span>
</p>
<p>End</p>
</body>
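The pattern in the output above is consistent with UTF-8 bytes being decoded as a single-byte encoding. The sketch below assumes the affected characters in the sample are no-break spaces (U+00A0), which the post does not state explicitly. It uses mb_convert_encoding() from the mbstring extension to mimic that misreading once (compare the Â in the first "Final HTML") and twice (compare the ÃÂ after utf8_encode() has already been applied).

```php
<?php
// Assumption: the problem character is U+00A0 (no-break space),
// which is the two bytes 0xC2 0xA0 in UTF-8.
$nbsp = "\u{00A0}";
echo bin2hex($nbsp), "\n"; // c2a0

// Misreading those bytes as ISO-8859-1 yields two characters,
// U+00C2 (Â) and U+00A0, which take four bytes once re-encoded as UTF-8.
$once = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
echo bin2hex($once), "\n"; // c382c2a0

// Repeating the misreading doubles the damage again (Ã, U+0082, Â, U+00A0).
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');
echo bin2hex($twice), "\n"; // c383c282c382c2a0
```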
Best answer
This solved the problem for me:
$doc->loadHTML('<?xml encoding="utf-8"?>' . $message);
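Prepending this string to the HTML tells the parser to treat the input as UTF-8. Without an encoding declaration, loadHTML() falls back to a single-byte encoding (ISO-8859-1), so each multi-byte UTF-8 character is read as several separate characters.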
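To compare the two behaviors side by side, here is a minimal sketch. The helper name roundTripBody() and its $declareUtf8 flag are hypothetical and not part of the original post, and the sample input assumes the problem character is a no-break space (U+00A0). Because only the <body> element is serialized, the prepended declaration does not appear in the result.

```php
<?php
// Hypothetical helper (name not from the original post): parse an HTML
// fragment and return the serialized <body> element, optionally
// prepending the UTF-8 declaration from the answer.
function roundTripBody(string $html, bool $declareUtf8): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $prefix = $declareUtf8 ? '<?xml encoding="utf-8"?>' : '';
    $doc->loadHTML($prefix . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $doc->saveHTML($body);
}

// A stray U+00C2 may be serialized raw or as the &Acirc; entity,
// so check for both forms.
function hasStrayCircumflex(string $html): bool
{
    return strpos($html, "\u{00C2}") !== false
        || strpos($html, '&Acirc;') !== false;
}

// Assumption: the "space" in the sample is U+00A0 (bytes 0xC2 0xA0).
$html = "<p>Start</p><p>\u{00A0}</p><p>End</p>";

$withoutDeclaration = roundTripBody($html, false);
$withDeclaration = roundTripBody($html, true);

var_dump(hasStrayCircumflex($withoutDeclaration));
var_dump(hasStrayCircumflex($withDeclaration));
```

On a typical setup the first check should report the stray character and the second should not, though the exact serialization can vary with the libxml2 version PHP is built against.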
For more on PHP DOMDocument inserting Â symbols, see the similar question on Stack Overflow: https://stackoverflow.com/questions/28647909/