Hi all.
I'm building software that does some text parsing on a given piece of HTML, and when I save all the paragraphs in the HTML I get an extra node.
I have created
<p id="original_content_js"> Original content via JS:<br> </p>
to store the parsed data I receive and compare it with the text being parsed (the original text).
Here is the HTML code:
<p id="original_content_js">
Original content via JS:<br>
</p>
<div id="original_text">
<h3>Molly's Sheep</h3>
<p>
Molly had a little sheep. <br>
Molly didn't like her sheep. Ir was too hairy.<br>
So Molly took a big knife, and cut all of her sheep's fur.<br>
Now Molly's sheep is cold.<br>
</p>
<p>
But what Molly did not know, was that her sheep is a magical sheep;<br>
Molly's sheep grows hair instantly, magically!<br>
Oh, how wonderful, Molly's sheep,<br>
Making hair, each and each<br>
Hair grows quickly after cut,<br>
That's what the story's all about.
<p>
</div>
Here is the parsing code:
// Collect every <p> element inside the #original_text container
var html_text_name = "original_text";
var html_text = document.getElementById(html_text_name);
var text_paragaphs = html_text.getElementsByTagName("p");
// Append each paragraph's HTML, wrapped in "ABC" ... "CBA", to #original_content_js
for (var x=0; x<text_paragaphs.length; x++){
document.getElementById("original_content_js").innerHTML += "ABC" +
text_paragaphs[x].innerHTML + "CBA <br>";
}
The result I get in the original_content_js paragraph is:
Original content via JS:
ABC Molly had a little sheep.
Molly didn't like her sheep. Ir was too hairy.
So Molly took a big knife, and cut all of her sheep's fur.
Now Molly's sheep is cold.
CBA
ABC But what Molly did not know, was that her sheep is a magical sheep;
Molly's sheep grows hair instantly, magically!
Oh, how wonderful, Molly's sheep,
Making hair, each and each
Hair grows quickly after cut,
That's what the story's all about. CBA
ABC CBA
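So as you can see, I get the expected result: 2 paragraphs wrapped in "ABC" and "CBA", except that there is another, empty node at the end. Why is there an extra node?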
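One way to confirm that the extra "ABC CBA" entry comes from a paragraph with no text, and to skip such paragraphs defensively, is to check each element's text before wrapping it. The sketch below is a hypothetical helper (`wrapNonEmptyParagraphs` is not from the original post); it assumes that "empty" means "its textContent is only whitespace", and it accepts any array-like of objects with `textContent` and `innerHTML`, such as the HTMLCollection returned by getElementsByTagName("p").

```javascript
// Hypothetical helper, not from the original post.
// Wraps each paragraph's innerHTML in "ABC" ... "CBA <br>", skipping
// paragraphs whose textContent is only whitespace.
// `paragraphs`: any array-like of objects exposing `textContent` and
// `innerHTML` (e.g. an HTMLCollection from getElementsByTagName("p")).
function wrapNonEmptyParagraphs(paragraphs) {
  let result = "";
  for (const p of Array.from(paragraphs)) {
    // An element created from a stray opening <p> right before </div>
    // contains nothing but a line break, so its trimmed text is "".
    if (p.textContent.trim() === "") {
      continue;
    }
    result += "ABC" + p.innerHTML + "CBA <br>";
  }
  return result;
}

// Browser usage: build the whole string first, then update the DOM once.
// var paragraphs = document.getElementById("original_text").getElementsByTagName("p");
// document.getElementById("original_content_js").innerHTML += wrapNonEmptyParagraphs(paragraphs);
```

This only hides the symptom: a paragraph that legitimately contains just an image or other non-text content would also be skipped, so fixing the markup itself is still the real solution.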
Best answer
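You are not checking whether your paragraphs are closed properly. The last p tag in your HTML (just before </div>) should be a closing </p> tag, but it is an opening <p>. So three opening p tags are seen, and three paragraphs are assumed: the browser creates a third, empty p element, which makes text_paragaphs.length 3 instead of 2 and produces the extra "ABC CBA". You need to write a regular expression to check for this... but be careful... writing regular expressions to parse HTML is a nightmare... and it is usually impossible to get them 100% accurate.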
EDIT: I'm not saying you shouldn't write a regex to check whether your tags are closed properly in your situation... I'm just saying, be careful.
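A rough sketch of the kind of regex check the answer describes might simply count opening <p> tags and closing </p> tags in a raw HTML string. The function name `countParagraphTags` is hypothetical, and the approach is deliberately crude: it does not understand comments, <script> contents, or attribute values containing "<p", and a balanced count does not prove the tags are nested correctly.

```javascript
// Hypothetical, regex-based sanity check (not a real HTML parser).
// Counts opening <p ...> tags and closing </p> tags in a raw HTML string.
function countParagraphTags(html) {
  // "<p" must be followed by whitespace, ">" or "/", so <pre>, <param>, etc. are not counted.
  const opens = (html.match(/<p(?=[\s>\/])/gi) || []).length;
  // Closing tag, allowing optional whitespace before ">"; </pre> is not matched.
  const closes = (html.match(/<\/p\s*>/gi) || []).length;
  return { opens: opens, closes: closes, balanced: opens === closes };
}
```

Run it on the HTML source string itself (for example, text you received before inserting it into the page), not on element.innerHTML: innerHTML is serialized from the DOM the browser has already built, and that serialization always emits a closing </p> for every p element, so the mismatch would no longer be visible. Also note that HTML allows the </p> end tag to be omitted in some cases, so an imbalance is a hint to inspect the markup rather than proof of an error.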
Regarding "javascript - Parsing HTML text with JS - extra node?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33091773/