ruby-on-rails - Ruby: Check for Byte Order Mark

Tags: ruby-on-rails ruby encoding byte-order-mark

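In Rails, we read some text files as ISO-8859-1. Sometimes a file arrives as UTF-8 with BOM instead. I am trying to detect that the file is UTF-8 with BOM and then re-read it as bom|UTF-8.

I tried the following, but the comparison does not seem to work: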

# file is saved as UTF-8 with BOM using Sublime Text 2

> string = File.read(file, encoding: 'ISO-8859-1')

# this doesn't work, although it is supposed to
> string.start_with?("\xef\xbb\xbf".force_encoding("UTF-8"))
=> false

# it works if I try this
> string.start_with?('ï»¿')
=> true

The goal is to read the file as UTF-8 with BOM when it starts with a byte order mark, and I want to avoid the string.start_with?('ï»¿') approach.

Best answer

string.start_with?("\u00ef\u00bb\u00bf")

From the official Ruby documentation:

\xnn hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])

\unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
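The difference between the two escapes can be checked directly. The following is a small sketch, assuming a UTF-8 source file (Ruby's default); the constant names are made up for illustration only:

```ruby
# \x escapes write raw bytes. In a UTF-8 string these three bytes
# decode to ONE character, U+FEFF (the byte order mark).
RAW_BOM = "\xEF\xBB\xBF".dup.force_encoding(Encoding::UTF_8)

# \u escapes write code points: THREE characters, U+00EF U+00BB U+00BF.
LATIN1_VIEW = "\u00ef\u00bb\u00bf"

# The same three bytes decoded as ISO-8859-1, then transcoded to UTF-8.
BOM_READ_AS_LATIN1 =
  "\xEF\xBB\xBF".b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts RAW_BOM.length                          # 1
puts RAW_BOM == "\uFEFF"                     # true
puts LATIN1_VIEW.length                      # 3
puts BOM_READ_AS_LATIN1 == LATIN1_VIEW       # true
puts BOM_READ_AS_LATIN1.start_with?(RAW_BOM) # false
```

The last line mirrors the failing comparison from the question: once the bytes have been decoded as ISO-8859-1, the string no longer begins with U+FEFF.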

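That is, to insert a Unicode character, use the \uXXXX notation. Read as ISO-8859-1, the BOM bytes EF BB BF become the three characters U+00EF, U+00BB and U+00BF, which is exactly what "\u00ef\u00bb\u00bf" spells out, whereas "\xef\xbb\xbf" forced to UTF-8 is the single character U+FEFF. The \u form is safe, and we can rely on it.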
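If the only goal is to decide how to open the file, an alternative to comparing decoded strings is to inspect the first three raw bytes. This is a different technique from the answer above, sketched under the assumption that files without a BOM are ISO-8859-1; `utf8_bom?` and `read_text` are hypothetical helper names, not part of Ruby or Rails:

```ruby
# The three UTF-8 BOM bytes as a binary (ASCII-8BIT) string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file starts with the UTF-8 BOM.
def utf8_bom?(path)
  # Binary mode skips decoding and transcoding, so the result does not
  # depend on Encoding.default_internal (which Rails sets to UTF-8).
  # read(3) returns nil for an empty file, which compares as false.
  File.open(path, "rb") { |f| f.read(3) } == UTF8_BOM
end

# Hypothetical helper: returns the file content as a UTF-8 string.
def read_text(path)
  if utf8_bom?(path)
    # "BOM|UTF-8" tells Ruby to consume the BOM before decoding.
    File.read(path, encoding: "BOM|UTF-8")
  else
    # Assumption: anything without a BOM is ISO-8859-1.
    File.read(path, encoding: "ISO-8859-1").encode("UTF-8")
  end
end
```

A call such as `read_text(file)` would then replace the read, check, and re-read sequence from the question, and no mojibake literal appears in the code.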

Source: a similar question on Stack Overflow, "Ruby: Check for Byte Order Mark": https://stackoverflow.com/questions/44171895/
