javascript - Is this a safe way to convert HTML to text?

Tags: javascript xss

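I receive a response from a semi-untrusted API that is supposed to contain HTML. I want to convert it to plain text, basically stripping all formatting, so that I can easily search it and then display (parts of) it.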

I came up with this:

function convertHtmlToText(html) {
    const div = document.createElement("div");
    // assumption: because the div is not part of the document
    // - no scripts are executed
    // - no layout pass
    div.innerHTML = html; 
    // assumption: whitespace is still normalized
    // assumption: this returns the text a user would see, if the element was inserted into the DOM.
    //             Minus the stuff that would depend on stylesheets anyway.
    return div.innerText; 
}

const html = `
    Some random untrusted string that is supposed to contain html. 
    Presumably some 'rich text'. 
    A few <div> or <p>, a link or two, a bit of <strong> and some such. 
    In any case not a complete html document.
`;

const text = convertHtmlToText(html);

const p = document.createElement("p");
p.textContent = text;
document.body.append(p);

I believe this is safe, because scripts are not executed as long as the div used for the conversion is never inserted into the document.

Question: is this safe?

Best answer

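No, this is not safe at all. <script> elements inserted through innerHTML are not run, but event-handler attributes such as onerror still fire: the browser starts loading the image even though the div is never attached to the document.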

function convertHtmlToText(html) {
    const div = document.createElement("div");
    // assumption: because the div is not part of the document
    // - no scripts are executed
    // - no layout pass
    div.innerHTML = html; 
    // assumption: whitespace is still normalized
    // assumption: this returns the text a user would see, if the element was inserted into the DOM.
    //             Minus the stuff that would depend on stylesheets anyway.
    return div.innerText; 
}

const html = `<img onerror="alert('Gotcha!')" src="">Hi`;

const text = convertHtmlToText(html);

const p = document.createElement("p");
p.textContent = text;
document.body.append(p);

If you really only need the text content, prefer DOMParser, which does not execute any scripts:

function convertHtmlToText(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.innerText;
}

const html = `<img onerror="alert('Gotcha!')" src="">Hi`;

const text = convertHtmlToText(html);

const p = document.createElement("p");
p.textContent = text;
document.body.append(p);

Note, however, that these approaches also capture the text content of nodes the user normally would not see (e.g. <style> and <script>).
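One possible way to deal with that caveat is to remove the non-visible elements from the parsed document before reading its text. The following is only a minimal sketch, not part of the original answer: the function name convertHtmlToVisibleText, the selector list, and the whitespace collapsing are all assumptions made for illustration.

```javascript
// Hypothetical helper (not from the original answer).
// Assumption: only <script> and <style> need to be dropped; extend the
// selector if your content contains other non-visible elements.
const NON_VISIBLE_SELECTOR = "script, style";

function convertHtmlToVisibleText(html) {
  // Documents created by DOMParser do not execute scripts.
  const doc = new DOMParser().parseFromString(html, "text/html");

  // Remove elements whose text a user would not normally see.
  doc.querySelectorAll(NON_VISIBLE_SELECTOR).forEach((el) => el.remove());

  // Assumption: collapsing whitespace runs is good enough for searching.
  // textContent itself does not normalize whitespace.
  return doc.body.textContent.replace(/\s+/g, " ").trim();
}
```

Called as convertHtmlToVisibleText(html), this returns a single-line string. Because textContent does not insert separators between block elements, adjacent paragraphs may run together; whether that matters depends on how the text is searched and displayed.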

Source: "javascript - Is this a safe way to convert HTML to text?", from a similar question on Stack Overflow: https://stackoverflow.com/questions/63824980/
