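grails - How can I stop groovy/XMLSlurper from stripping HTML tags from a node?

Tags: grails groovy html-parsing xmlslurper

I am reading an HTML file from a POST response and parsing it with XMLSlurper. A textarea node on the page contains some HTML code (not URL-encoded, which was not my choice), and when I read that value, Groovy strips all the tags.

Example: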

<html>
    <body>
        <textarea><html><body>This has html code for some reason</body></html></textarea>
    </body>
</html>

When I parse the above and then find(...) the "textarea" node, it returns:

This has html code for some reason

with none of the tags. How can I preserve the tags?

Best answer

I think you are getting the correct data but printing it out the wrong way... Can you try using StreamingMarkupBuilder to convert the node back to XML?

def xml = '''<html>
            |  <body>
            |    <textarea><html><body>This has html code for some reason</body></html></textarea>
            |  </body>
            |</html>'''.stripMargin()

def ta = new XmlSlurper().parseText( xml ).body.textarea

String content = new groovy.xml.StreamingMarkupBuilder().bind {
  mkp.yield ta.children()
}

assert content == '<html><body>This has html code for some reason</body></html>'
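
To illustrate why the tags appear to be stripped, here is a minimal sketch. It assumes the value was read with text() or by printing the node, which the question does not state explicitly. In that case only the concatenated text comes back, while the nested elements are still present in the parsed tree. The import assumes Groovy 3 or later; on Groovy 2.x the class lives in groovy.util and the import line should be dropped.

```groovy
// Sketch only: assumes Groovy 3+ (on 2.x, remove this import; XmlSlurper is in groovy.util)
import groovy.xml.XmlSlurper

def xml = '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'

def ta = new XmlSlurper().parseText(xml).body.textarea

// text() returns only the concatenated text of the node and its descendants,
// so the markup seems to have vanished
assert ta.text() == 'This has html code for some reason'

// The nested elements were parsed as child nodes and can still be navigated
assert ta.html.size() == 1
assert ta.html.body.text() == 'This has html code for some reason'
```

Because the inner markup was parsed into nodes rather than kept as a string, getting the tags back means serializing those child nodes again, which is what the StreamingMarkupBuilder call above does.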

Regarding "grails - How can I stop groovy/XMLSlurper from stripping HTML tags from a node?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9710164/
