php - Remove all HTML tags from a string except one kind. Advanced strip_tags that keeps tags with an identifying attribute

I need to remove the HTML tags from a string and keep only one kind of tag. I have a string with the following content:

 <!-- comment -->   <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 <span>

I need this:

text  <span name=annotation value=125> 2 text </span> text 3text4

So I need to remove all HTML tags except the ones matching this form:

"/(<span[^>]*annotation[^>]*value=.?(\w*).?[^>]*>)(.*?)<\/span>/"

I'm using this as part of a larger expression, but it shows the idea.

How can I do this?

I know this can be done with preg_replace(), but I don't know what pattern I need.
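For comparison, PHP's built-in strip_tags() accepts an allow-list of tags as its second argument, but the list works on tag names only, so it cannot tell an annotation <span> apart from a plain one. A minimal sketch using the short example string (with its last <span> closed, as in the answer below):

```php
<?php
// strip_tags() with an allow-list keeps *every* <span>, whatever its attributes.
$str = '<!-- comment -->   <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 </span>';

// The comment and all non-<span> tags are removed; both <span> elements survive,
// including the plain <span>text 4 </span> that should have been stripped.
$out = strip_tags($str, '<span>');

echo $out, "\n";
```

This is why a filter on the attribute (name=annotation) needs regular expressions or an HTML parser on top of, or instead of, strip_tags().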

An example:

$str='<!-- comment --><p><b>Deoxyribonucleic acid</b> (<b>DNA</b>) is 
   a molecule encoding the <a href="/wiki/Genetics" title="Genetics">genetic</a> instructions
   used in the development and functioning of all known living <a href="/wiki/Organism" title="Organism">organi
   sms</a> and many <a href="/wiki/Virus" title="Virus">viruses</a>. Along with <a href="/wiki/RNA" title="RNA">RNA</a> and <a href="/wiki/Proteins" title="Proteins" class="mw-redirect">proteins</a>, DNA is one of the three major 
   <a href="/wiki/Macromolecules" title="Macromolecules" class="mw-redirect">macromolecules</a> 
   that are essential for all known forms of <a href="/wiki/Life" title="Life">life</a>.
   Genetic information is encoded<span id="200120131815150" 
   class="mymetastasis" value="247" name="annotation"> as a sequence of nucleotides (</span><a href="/wiki/Guanine" title="Guanine"><span id="200120131815151" class="mymetastasis" value="247" name="annotation">
   guanine</span></a><span id="200120131815152" class="mymetastasis" value="247" name="annotation">, </span><a href="/wiki/Adenine" title="Adenine"><span id="200120131815153" class="mymetastasis" value="247"
   name="annotation">adenine</span></a><span id="200120131815154" class="mymetastasis" value="247" name="annotation">,
   </span><a href="/wiki/Thymine" title="Thymine"><span id="200120131815155" class="mymetastasis" value="247"
   name="annotation">thymine</span></a><span id="200120131815156" class="mymetastasis" value="247" name="annotation">, 
   and </span><a href="/wiki/Cytosine" title="Cytosine">
   <span id="200120131815157" class="mymetastasis" value="247" name="annotation">cytosine</span></a><span id="200120131815158" class="mymetastasis" value="247" name="annotation">) 
   recorded using the letters G, A, T, and C. Most DNA molecules are double-strande</span>d helices, consisting of two long <a href="/wiki/Polymers" title="Polymers" class="mw-redirect">polymers</a> of simple units called <a href="/wiki/Nucleotide" 
   title="Nucleotide">nucleotides</a>, molecules with <a href="/wiki/Backbone_chain" title="Backbone chain">backbones</a>
   made of alternating <a href="/wiki/Monosaccharide" title="Monosaccharide">sugars<
   /a> (<a href="/wiki/Deoxyribose" title="Deoxyribose">deoxyribose</a>) and <a href="/wiki/Phosphate"
   title="Phosphate">phosphate</a> groups (related to phosphoric acid), with the <a href="/wiki/Nucleobases" title="Nucleobases" class="mw-redirect">nucleobases</a> (G, A, T, C) attached to the sugars. DNA is well-suited for biological information storage, since the DNA backbone is resistant to cleavage and the double-stranded structure provides the molecule with a 
   built-in duplicate of the encoded information.</p>';

P.S.: The line breaks, tabs, etc. are unintentional; this is part of the source text.

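Best answer

You'll need more than one regular expression for this: first protect the annotation spans, then strip every other tag, then restore the protected spans.

Working code: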

<?php
    header("Content-Type: text/plain");

    $str = '<!-- comment -->   <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 </span>';

    // 1. Protect the annotation spans: rewrite each one as a
    //    !!!opening-tag!!!content!!!closing-tag!!! placeholder so step 2 skips it
    $str = preg_replace("/<(span[^>]*?annotation.*?)>(.*?)<\/(.*?)>/", "!!!$1!!!$2!!!$3!!!", $str); 

    // 2. Remove every remaining tag, including <!-- comments -->
    $re = "/(<[^>]*?>)/";
    $str = preg_replace($re, "", $str);

    // 3. Turn the placeholders back into real <span> tags
    $str = preg_replace("/\!\!\!(span[^>]*?annotation.*?)\!\!\!(.*?)\!\!\!(.*?)\!\!\!/", "<$1>$2</$3>", $str); 

    echo $str;
?>
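The passes above use . without the s modifier, and . does not match a line break, so an annotation span whose text contains a newline is not protected in step 1 and is then stripped in step 2. Below is a minimal sketch of the same three passes with the s modifier, which also accepts only </span> as the closing tag. It assumes the input never contains the !!! marker itself and that annotation spans are not nested; the function name stripTagsExceptAnnotations() is hypothetical.

```php
<?php
// Hypothetical helper: the answer's three passes, with /s so "." also matches newlines.
function stripTagsExceptAnnotations(string $str): string
{
    // 1. Protect annotation spans as !!!open-tag!!!content!!!span!!! placeholders.
    //    Assumes "!!!" never appears in the real text.
    $str = preg_replace('/<(span[^>]*?annotation[^>]*)>(.*?)<\/span>/s', '!!!$1!!!$2!!!span!!!', $str);

    // 2. Remove every remaining tag or comment (this also strips tags nested inside
    //    the protected span content, because only the outer tags were rewritten).
    $str = preg_replace('/<[^>]*>/', '', $str);

    // 3. Restore the protected spans.
    return preg_replace('/!!!(span[^>]*?annotation.*?)!!!(.*?)!!!span!!!/s', '<$1>$2</span>', $str);
}

$str = '<!-- comment -->   <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 </span>';
echo trim(stripTagsExceptAnnotations($str)), "\n";
```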

Output:

text  <span name=annotation value=125> 2 text </span>  text 3text 4

Input: the longer $str example from the question (identical, so not repeated here).

Output:

Deoxyribonucleic acid (DNA) is 
   a molecule encoding the genetic instructions
   used in the development and functioning of all known living organi
   sms and many viruses. Along with RNA and proteins, DNA is one of the three major 
   macromolecules 
   that are essential for all known forms of life.
   Genetic information is encoded<span id="200120131815150" 
   class="mymetastasis" value="247" name="annotation"> as a sequence of nucleotides (</span>
   guanine<span id="200120131815152" class="mymetastasis" value="247" name="annotation">, </span><span id="200120131815153" class="mymetastasis" value="247"
   name="annotation">adenine</span>,
   <span id="200120131815155" class="mymetastasis" value="247"
   name="annotation">thymine</span>, 
   and 
   <span id="200120131815157" class="mymetastasis" value="247" name="annotation">cytosine</span>) 
   recorded using the letters G, A, T, and C. Most DNA molecules are double-stranded helices, consisting of two long polymers of simple units called nucleotides, molecules with backbones
   made of alternating sugars (deoxyribose) and phosphate groups (related to phosphoric acid), with the nucleobases (G, A, T, C) attached to the sugars. DNA is well-suited for biological information storage, since the DNA backbone is resistant to cleavage and the double-stranded structure provides the molecule with a 
   built-in duplicate of the encoded information.
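Note that in this output the annotation spans with ids 200120131815151, 200120131815154, 200120131815156 and 200120131815158 were removed: their text contains a line break, and without the s modifier the . in (.*?) cannot match it. As an alternative to the placeholder passes, a single preg_replace_callback() pass can match either an annotation span or any other tag and decide per match what to keep. This is a sketch under assumptions: keepAnnotationSpans() is a hypothetical name, the attribute is assumed to be written as name=annotation (quoted or unquoted), and a <span> nested inside an annotation span is not handled.

```php
<?php
// Hypothetical helper: one pass, no "!!!" placeholders.
function keepAnnotationSpans(string $html): string
{
    // Alternative 1: an annotation <span ...>content</span> (groups 1 and 2).
    // Alternative 2: any other tag or comment, which is dropped.
    // The s modifier lets the span content run across line breaks.
    $pattern = '/<(span\b[^>]*\bname=["\']?annotation\b[^>]*)>(.*?)<\/span>|<[^>]*>/s';

    return preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[1]) && $m[1] !== '') {
            // Keep the annotation span, but strip any tags nested in its content.
            return '<' . $m[1] . '>' . strip_tags($m[2]) . '</span>';
        }
        return ''; // any other tag: remove it
    }, $html);
}

$str = '<!-- comment -->   <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 </span>';
echo trim(keepAnnotationSpans($str)), "\n";
```

Because the regex engine tries the annotation alternative first at every <, a matching span is consumed whole and kept, while every other tag falls through to the second alternative and is deleted.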

Similar question on Stack Overflow: https://stackoverflow.com/questions/14418234/