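They can guess it based on heuristics.

I don't know how good browsers are at encoding detection nowadays, but MS Word does a very good job of it and even recognizes charsets I had never heard of before. You can open a *.txt file with a random encoding and see for yourself.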

This algorithm usually involves statistical analysis of byte patterns, like frequency distribution of trigraphs of various languages encoded in each code page that will be detected; such statistical analysis can also be used to perform language detection.

https://en.wikipedia.org/wiki/Charset_detection
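To make the statistical idea concrete, here is a toy sketch in Python. It is not how any browser does it: the candidate list, the tiny English trigram table, and the names `score` and `guess_encoding` are all made up for illustration, and real detectors use much larger per-language, per-encoding models.

```python
from typing import Optional

# Hypothetical, tiny "language model": a handful of common English trigrams.
COMMON_TRIGRAMS = {"the", "and", "ing", "ion", "ent", "her", "for", "tha"}

# Illustrative candidate list; a real detector considers many more encodings.
CANDIDATES = ["utf-8", "utf-16-le", "utf-16-be", "cp1252"]


def score(text: str) -> float:
    """Fraction of the text's trigrams that appear in COMMON_TRIGRAMS."""
    lowered = text.lower()
    trigrams = [lowered[i:i + 3] for i in range(len(lowered) - 2)]
    if not trigrams:
        return 0.0
    hits = sum(1 for t in trigrams if t in COMMON_TRIGRAMS)
    return hits / len(trigrams)


def guess_encoding(data: bytes) -> Optional[str]:
    """Decode with every candidate and keep the one whose text scores best.

    Ties go to the earlier candidate, so pure ASCII input is reported as
    utf-8 even though cp1252 would decode it identically.
    """
    best, best_score = None, -1.0
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # byte sequence is invalid in this encoding
        s = score(text)
        if s > best_score:
            best, best_score = enc, s
    return best
```

For example, `guess_encoding("the weather and the thing".encode("utf-16-le"))` picks `utf-16-le`, because only that decoding produces text containing familiar trigrams. With short or unusual input the scores get close to each other, which is exactly where real detectors start guessing wrong.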

Firefox uses the Mozilla Charset Detectors. How they work is explained here, and you can also change its heuristic preferences. The Mozilla Charset Detectors were even forked to uchardet, which works better and detects more languages.

[Update: as commented below, it moved to chardetng since Firefox 73]

Chrome previously used the ICU detector but switched to CED almost 2 years ago.

No detection algorithm is perfect. They can guess wrong like this, because it is only guessing!

This process is not foolproof because it depends on statistical data.

That's how the famous "Bush hid the facts" bug happened. A wrong guess can also introduce vulnerabilities into the system.
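A small Python sketch of what the "Bush hid the facts" misreading looks like. It does not reproduce Notepad's detection logic; it only shows what you get once those 18 ASCII bytes are interpreted as UTF-16LE instead. The variable names are just for illustration.

```python
# 18 plain ASCII bytes, saved without a byte-order mark.
data = b"Bush hid the facts"

as_ascii = data.decode("ascii")      # what the author meant
as_utf16 = data.decode("utf-16-le")  # what a wrong UTF-16LE guess produces

print(as_ascii)
print(as_utf16)

# Every pair of ASCII bytes collapses into a single UTF-16 code unit,
# and all of them happen to land in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in as_utf16)
print(len(as_ascii), len(as_utf16), all_cjk)
```

The byte sequence is perfectly valid in both encodings, so validity alone cannot settle the question; the detector has to pick one, and here the plausible-looking CJK reading won.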

For all those skeptics out there, there is a very good reason why the character encoding should be explicitly stated. When the browser isn't told what the character encoding of a text is, it has to guess: and sometimes the guess is wrong. Hackers can manipulate this guess in order to slip XSS past filters and then fool the browser into executing it as active code. A great example of this is the Google UTF-7 exploit.

http://htmlpurifier.org/docs/enduser-utf8.html#fixcharset-none

As a result, the encoding should always be stated explicitly.
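As a minimal sketch of what "stated explicitly" can look like on the server side, the Python below declares the encoding twice: in the HTTP `Content-Type` header and in a `<meta charset>` tag. The helper names `build_response` and `declared_charset` are hypothetical and not part of any framework; only `email.message.Message.get_content_charset()` is a real standard-library call.

```python
from email.message import Message

CHARSET = "utf-8"


def build_response(body_html: str):
    """Return (headers, payload) with the encoding declared in both places."""
    page = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{CHARSET}"><title>demo</title></head>'
        f"<body>{body_html}</body></html>"
    )
    headers = {"Content-Type": f"text/html; charset={CHARSET}"}
    # Encode with the same charset that was declared above.
    return headers, page.encode(CHARSET)


def declared_charset(content_type: str):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


headers, payload = build_response("<p>café</p>")
print(declared_charset(headers["Content-Type"]))  # charset found in the header
print(declared_charset("text/html"))              # nothing declared: client must guess
```

When `declared_charset` comes back empty and the markup carries no declaration either, the receiving side is back to the heuristics described above.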

Regarding "html - How do browsers determine which encoding to use?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43148464/
