ruby - I always get an UndefinedConversionError in Ruby 2.0 while scraping with Mechanize

Tags: ruby encoding utf-8 mechanize iconv

When I try to submit a textarea using Mechanize and Ruby 2.0, I always get:

Encoding::UndefinedConversionError: U+0151 from UTF-8 to ISO-8859-1

Then I tried converting the text with Iconv and got a similar result:

Iconv.iconv("LATIN1", "UTF-8", text)

I get this error message:

Iconv::IllegalSequence: "őzködik, melyet "...

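This happens because the text contains Eastern European characters. How can I avoid this problem, or how do I convert correctly between encodings?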
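The failure can be reproduced without Mechanize. The following is a minimal sketch, assuming the text is a UTF-8 string containing ő (U+0151). ISO-8859-1 has no code point for that character, while ISO-8859-2 (Latin-2) does. It uses the built-in `String#encode` rather than Iconv, which is no longer part of the standard library as of Ruby 2.0. The sample string and variable names are illustrative, not taken from the question.

```ruby
# Minimal reproduction, independent of Mechanize (names are illustrative).
text = "f\u0151z" # UTF-8 string containing ő (U+0151)

error_message = nil
begin
  text.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  # ISO-8859-1 has no code point for U+0151, so the conversion raises.
  error_message = e.message
end

# ISO-8859-2 (Latin-2) covers Central European letters such as ő.
latin2 = text.encode("ISO-8859-2")

# Lossy alternative: substitute "?" for anything the target cannot represent.
lossy = text.encode("ISO-8859-1", undef: :replace, replace: "?")
```

Whether ISO-8859-2 is an option depends on the encoding the target page actually expects. The lossy variant always succeeds but discards the original character.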

Best answer

I found an elegant solution:

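replacements = [
  ["À", "&Agrave;"], ["Á", "&Aacute;"], ["Â", "&Acirc;"], ["Ã", "&Atilde;"],
  ["Ä", "&Auml;"], ["Å", "&Aring;"], ["Æ", "&AElig;"], ["Ç", "&Ccedil;"],
  ["È", "&Egrave;"], ["É", "&Eacute;"], ["Ê", "&Ecirc;"], ["Ë", "&Euml;"],
  ["Ì", "&Igrave;"], ["Í", "&Iacute;"], ["Î", "&Icirc;"], ["Ï", "&Iuml;"],
  ["Ð", "&ETH;"], ["Ñ", "&Ntilde;"], ["Ò", "&Ograve;"], ["Ó", "&Oacute;"],
  ["Ô", "&Ocirc;"], ["Õ", "&Otilde;"], ["Ö", "&Ouml;"], ["Ø", "&Oslash;"],
  ["Ù", "&Ugrave;"], ["Ú", "&Uacute;"], ["Û", "&Ucirc;"], ["Ü", "&Uuml;"],
  ["Ý", "&Yacute;"], ["Þ", "&THORN;"], ["ß", "&szlig;"],
  ["à", "&agrave;"], ["á", "&aacute;"], ["â", "&acirc;"], ["ã", "&atilde;"],
  ["ä", "&auml;"], ["å", "&aring;"], ["æ", "&aelig;"], ["ç", "&ccedil;"],
  ["è", "&egrave;"], ["é", "&eacute;"], ["ê", "&ecirc;"], ["ë", "&euml;"],
  ["ì", "&igrave;"], ["í", "&iacute;"], ["î", "&icirc;"], ["ï", "&iuml;"],
  ["ð", "&eth;"], ["ñ", "&ntilde;"], ["ò", "&ograve;"], ["ó", "&oacute;"],
  ["ô", "&ocirc;"], ["õ", "&otilde;"], ["ö", "&ouml;"], ["ø", "&oslash;"],
  ["ù", "&ugrave;"], ["ú", "&uacute;"], ["û", "&ucirc;"], ["ü", "&uuml;"],
  ["ý", "&yacute;"], ["þ", "&thorn;"], ["ÿ", "&yuml;"]
]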

# Apply every [from, to] pair to str in place, then return str.
def replace(str, replacements)
  replacements.each { |from, to| str.gsub!(from, to) }
  str
end

my_string = replace(my_string, replacements)
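
A similar substitution can be done without a hand-written table. The following is a sketch, assuming the text is submitted to an HTML form where numeric character references are acceptable. `String#encode` accepts a `fallback:` option that is called for each character the target encoding cannot represent. The helper name `to_latin1_with_entities` and the sample string are hypothetical.

```ruby
# Hypothetical helper: convert to ISO-8859-1, turning every character that
# ISO-8859-1 cannot represent into an HTML numeric character reference.
def to_latin1_with_entities(str)
  str.encode("ISO-8859-1", fallback: ->(char) { format("&#%d;", char.ord) })
end

# ő (U+0151) is not in ISO-8859-1 and goes through the fallback;
# ö (U+00F6) is in ISO-8859-1 and is converted directly.
converted = to_latin1_with_entities("\u0151zk\u00F6dik")
```

Unlike the table above, this leaves characters that ISO-8859-1 can represent untouched and only rewrites the ones that would otherwise raise `Encoding::UndefinedConversionError`.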

Regarding "ruby - I always get an UndefinedConversionError in Ruby 2.0 while scraping with Mechanize", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18559293/
