html - How to safely inject HTML code (avoiding HTML injection)

Tags: html dom xss code-injection html-injections

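In our enterprise software, we let customers supply their own HTML to customize the application's "Contact" and "Legal" pages. It's a nice feature, but since our application knows nothing about the HTML it is actually serving, I'm wondering how to approach this safely. I've read several blog posts and SO posts and watched some videos, but they only explain the dangers of HTML injection or how to use createElement, innerHTML or other direct methods.
I'm looking for the safest way to display HTML that I don't directly control. Any articles or libraries would be greatly appreciated.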

Best answer

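You should sanitize the HTML code that users submit.
One way to do this is to define a whitelist: a list of tags (and, for each tag, optionally a list of allowed attributes) that may appear in the output HTML. Every tag that is not explicitly allowed is removed by the sanitizer.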
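As a rough illustration of what "a list of tags, each with its own allowed attributes" looks like as a data structure, here is a minimal Java sketch. The class `Whitelist`, its methods and the entries in it are hypothetical, loosely modeled on the TinyMCE policy below. The lookup alone is not a sanitizer: it assumes a real HTML parser in front of it, and attribute values still need their own validation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical whitelist: tag name -> attributes allowed on that tag.
// The entries are a small illustrative subset, not a complete or recommended list.
class Whitelist {
    // Attributes allowed on every whitelisted tag (compare <global-tag-attributes>).
    static final Set<String> GLOBAL_ATTRIBUTES = Set.of("title", "lang", "style");

    static final Map<String, Set<String>> ALLOWED = Map.of(
            "p", Set.of("align"),
            "a", Set.of("href", "nohref", "rel"),
            "b", Set.of(),
            "i", Set.of(),
            "ul", Set.of(),
            "li", Set.of());

    // A tag is kept only if it is explicitly listed; everything else is dropped.
    static boolean isTagAllowed(String tag) {
        return ALLOWED.containsKey(tag.toLowerCase(Locale.ROOT));
    }

    // An attribute is kept only if its tag is allowed and the attribute is listed
    // for that tag or globally. The attribute's value still needs a separate check.
    static boolean isAttributeAllowed(String tag, String attribute) {
        Set<String> perTag = ALLOWED.get(tag.toLowerCase(Locale.ROOT));
        if (perTag == null) {
            return false; // the tag itself is not whitelisted
        }
        String name = attribute.toLowerCase(Locale.ROOT);
        return perTag.contains(name) || GLOBAL_ATTRIBUTES.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isTagAllowed("script"));            // false: not listed
        System.out.println(isAttributeAllowed("a", "href"));    // true
        System.out.println(isAttributeAllowed("a", "onclick")); // false: not listed
    }
}
```

Anything not named is rejected by default, which is the point of a whitelist: a new or obscure event-handler attribute is blocked without anyone having to know it exists.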
Many ready-made whitelists already exist. For example, you could use the one written for the TinyMCE HTML editor (see here):

<?xml version="1.0" encoding="UTF-8" ?>

<!--
    TinyMCE
-->

<anti-samy-rules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="antisamy.xsd">

    <directives>
        <directive name="omitXmlDeclaration" value="true" />
        <directive name="omitDoctypeDeclaration" value="false" />
        <directive name="maxInputSize" value="100000" />
        <directive name="embedStyleSheets" value="false" />
        <directive name="useXHTML" value="true" />
        <directive name="formatOutput" value="true" />
    </directives>

    <common-regexps>

        <!--
            From W3C:
            This attribute assigns a class name or set of class names to an
            element. Any number of elements may be assigned the same class
            name or names. Multiple class names must be separated by white
            space characters.
        -->
        <regexp name="htmlTitle" value="[a-zA-Z0-9\s\-_',:\[\]!\./\\\(\)&amp;]*" />

        <!--  force non-empty with a '+' at the end instead of '*'
        -->
        <regexp name="onsiteURL" value="([\p{L}\p{N}\p{Zs}/\.\?=&amp;\-~])+" />

        <!--  ([\w\\/\.\?=&amp;;\#-~]+|\#(\w)+)
        -->

        <!--  ([\p{L}/ 0-9&amp;\#-.?=])*
        -->
        <regexp name="offsiteURL" value="(\s)*((ht|f)tp(s?)://|mailto:)[A-Za-z0-9]+[~a-zA-Z0-9-_\.@\#\$%&amp;;:,\?=/\+!\(\)]*(\s)*" />
    </common-regexps>

    <!--
        Tag.name = a, b, div, body, etc.
        Tag.action = filter: remove tags, but keep content, validate: keep content as long as it passes rules, remove: remove tag and contents
        Attribute.name = id, class, href, align, width, etc.
        Attribute.onInvalid = what to do when the attribute is invalid, e.g., remove the tag (removeTag), remove the attribute (removeAttribute), filter the tag (filterTag)
        Attribute.description = What rules in English you want to tell the users they can have for this attribute. Include helpful things so they'll be able to tune their HTML
    -->

    <!--
        Some attributes are common to all (or most) HTML tags. There aren't many that qualify for this. You have to make sure there's no
        collisions between any of these attribute names with attribute names of other tags that are for different purposes.
    -->

    <common-attributes>

        <attribute name="lang"
            description="The 'lang' attribute tells the browser what language the element's attribute values and content are written in">

            <regexp-list>
                <regexp value="[a-zA-Z]{2,20}" />
            </regexp-list>
        </attribute>

        <attribute name="title"
            description="The 'title' attribute provides text that shows up in a 'tooltip' when a user hovers their mouse over the element">

            <regexp-list>
                <regexp name="htmlTitle" />
            </regexp-list>
        </attribute>

        <attribute name="href" onInvalid="filterTag">

            <regexp-list>
                <regexp name="onsiteURL" />
                <regexp name="offsiteURL" />

                <!--
                -->
            </regexp-list>
        </attribute>

        <attribute name="align"
            description="The 'align' attribute of an HTML element is a direction word, like 'left', 'right' or 'center'">

            <literal-list>
                <literal value="center" />
                <literal value="left" />
                <literal value="right" />
                <literal value="justify" />
                <literal value="char" />
            </literal-list>
        </attribute>
        <attribute name="style"
            description="The 'style' attribute provides the ability for users to change many attributes of the tag's contents using a strict syntax" />
    </common-attributes>

    <!--
        This requires normal updates as browsers continue to diverge from the W3C and each other. As long as the browser wars continue
        this is going to continue. I'm not sure war is the right word for what's going on. Doesn't somebody have to win a war after
        a while?


    -->

    <global-tag-attributes>
        <attribute name="title" />
        <attribute name="lang" />
        <attribute name="style" />
    </global-tag-attributes>

    <tags-to-encode>
        <tag>g</tag>
        <tag>grin</tag>
    </tags-to-encode>

    <tag-rules>

        <!--  Remove  -->

        <tag name="script" action="remove" />
        <tag name="noscript" action="remove" />
        <tag name="iframe" action="remove" />
        <tag name="frameset" action="remove" />
        <tag name="frame" action="remove" />
        <tag name="noframes" action="remove" />
        <tag name="head" action="remove" />
        <tag name="title" action="remove" />
        <tag name="base" action="remove" />
        <tag name="style" action="remove" />
        <tag name="link" action="remove" />
        <tag name="input" action="remove" />
        <tag name="textarea" action="remove" />

        <!--  Truncate  -->
        <tag name="br" action="truncate" />

        <!--  Validate -->

        <tag name="p" action="validate">
            <attribute name="align" />
        </tag>
        <tag name="div" action="validate" />
        <tag name="span" action="validate" />
        <tag name="i" action="validate" />
        <tag name="b" action="validate" />
        <tag name="strong" action="validate" />
        <tag name="s" action="validate" />
        <tag name="strike" action="validate" />
        <tag name="u" action="validate" />
        <tag name="em" action="validate" />
        <tag name="blockquote" action="validate" />
        <tag name="tt" action="truncate" />

        <tag name="a" action="validate">
            <attribute name="href" onInvalid="filterTag" />

            <attribute name="nohref">

                <literal-list>
                    <literal value="nohref" />
                    <literal value="" />
                </literal-list>
            </attribute>

            <attribute name="rel">

                <literal-list>
                    <literal value="nofollow" />
                </literal-list>
            </attribute>
        </tag>

        <!--  List tags
        -->
        <tag name="ul" action="validate" />
        <tag name="ol" action="validate" />
        <tag name="li" action="validate" />
        <tag name="dl" action="validate" />
        <tag name="dt" action="validate" />
        <tag name="dd" action="validate" />
    </tag-rules>

    <css-rules>

        <property name="text-decoration" default="none"
            description="">

            <category-list>
                <category value="visual" />
            </category-list>

            <literal-list>
                <literal value="underline" />
                <literal value="overline" />
                <literal value="line-through" />
            </literal-list>
        </property>
    </css-rules>
</anti-samy-rules>
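The `href` rule in this policy only accepts values that match `onsiteURL` or `offsiteURL`, and that is what keeps `javascript:` links out. A quick way to see this is to run the two expressions through `java.util.regex`. This is a sketch under two assumptions: the XML entity `&amp;` is unescaped to `&`, and a value must match a pattern in full (`Matcher.matches()`), which is how we understand AntiSamy to apply these rules. `HrefCheck` and `isAllowedHref` are hypothetical names.

```java
import java.util.regex.Pattern;

// The two URL patterns from <common-regexps>, copied as-is except that
// &amp; is unescaped to '&'. Assumption: a value must match a pattern in full.
class HrefCheck {
    static final Pattern ONSITE_URL =
            Pattern.compile("([\\p{L}\\p{N}\\p{Zs}/\\.\\?=&\\-~])+");
    static final Pattern OFFSITE_URL = Pattern.compile(
            "(\\s)*((ht|f)tp(s?)://|mailto:)[A-Za-z0-9]+"
            + "[~a-zA-Z0-9-_\\.@\\#\\$%&;:,\\?=/\\+!\\(\\)]*(\\s)*");

    static boolean isAllowedHref(String href) {
        return ONSITE_URL.matcher(href).matches()
                || OFFSITE_URL.matcher(href).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/legal/terms",                        // true: onsite, no ':' needed
            "https://example.com/contact?lang=en", // true: offsite http(s)
            "mailto:support@example.com",          // true: offsite mailto
            "javascript:alert(1)",                 // false: scheme not allowed
            "data:text/html;base64,PHNjcmlwdD4="   // false: scheme not allowed
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAllowedHref(s));
        }
    }
}
```

Because `onsiteURL` has no `:` in its character class, a relative link can never smuggle in a scheme, and `offsiteURL` pins the scheme to `http`, `https`, `ftp`, `ftps` or `mailto`.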
This policy is meant for cleaning HTML typed into an HTML editor, so it may fit your needs.
You can also find more policies here.
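If you use Java, you can implement this policy with java-html-sanitizer (which defines policies in code rather than loading this XML file; the file itself is in the OWASP AntiSamy policy format), or you can define a custom one.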
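Porting the policy into code, whether for java-html-sanitizer or a custom sanitizer, starts with knowing which tags it removes, truncates or validates. Below is a minimal sketch that reads those actions out of the `<tag-rules>` section using only the JDK's built-in DOM parser. `TagRules`, `readTagActions` and the trimmed `SAMPLE_POLICY` are hypothetical; the code only inspects the policy and does not sanitize anything.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class TagRules {
    // A trimmed-down policy for demonstration; a real file would be read from disk.
    static final String SAMPLE_POLICY =
            "<anti-samy-rules><tag-rules>"
            + "<tag name=\"script\" action=\"remove\"/>"
            + "<tag name=\"br\" action=\"truncate\"/>"
            + "<tag name=\"p\" action=\"validate\"><attribute name=\"align\"/></tag>"
            + "</tag-rules>"
            + "<tags-to-encode><tag>g</tag></tags-to-encode>"
            + "</anti-samy-rules>";

    // Maps each <tag name="..."> that is a direct child of <tag-rules> to its action.
    // <tag> elements elsewhere (e.g. under <tags-to-encode>) are ignored.
    static Map<String, String> readTagActions(String policyXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs: a cheap guard against XXE if the file ever comes from outside.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(policyXml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> actions = new LinkedHashMap<>();
            NodeList ruleBlocks = doc.getElementsByTagName("tag-rules");
            for (int i = 0; i < ruleBlocks.getLength(); i++) {
                NodeList children = ruleBlocks.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "tag".equals(child.getNodeName())) {
                        Element tag = (Element) child;
                        actions.put(tag.getAttribute("name"), tag.getAttribute("action"));
                    }
                }
            }
            return actions;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse policy", e);
        }
    }

    public static void main(String[] args) {
        // Prints {script=remove, br=truncate, p=validate}
        System.out.println(readTagActions(SAMPLE_POLICY));
    }
}
```

Restricting the scan to direct children of `<tag-rules>` matters because `<tags-to-encode>` also uses `<tag>` elements, with a different meaning.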

A similar question about "html - How to safely inject HTML code (avoiding HTML injection)" can be found on Stack Overflow: https://stackoverflow.com/questions/64671769/
