I'm finding CSV parsing in Ruby 1.9.3 to be very fragile, so much so that I wonder whether I'm doing something wrong.
If I do the following in irb, I get an error:
1.9.3-p125 :011 > require 'csv'
=> true
1.9.3-p125 :012 > a = 'one,two,three, "four, five",six'
=> "one,two,three, \"four, five\",six"
1.9.3-p125 :013 > arr = CSV.parse(a)
CSV::MalformedCSVError: Illegal quoting in line 1.
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `each'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `read'
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1379:in `parse'
from (irb):13
from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
I found that the problem is the extra space before the "four, five" value. If I remove the space, it works.
1.9.3-p125 :010 > a = 'one,two,three,"four, five",six'
=> "one,two,three,\"four, five\",six"
1.9.3-p125 :011 > arr = CSV.parse(a)
=> [["one", "two", "three", "four, five", "six"]]
Spaces before the other values don't cause a problem. The following parses fine:
one, two, three,"four, five", six
Am I missing some parsing option, given how fragile the handling of quoted values seems to be?
Best answer
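This is correct behaviour. It's not fragile.
The comma after three ends that field, and the next field starts immediately with the space, so the space is part of the field's value.
You can't validly put a quote in the middle of a field without escaping it. A quote only opens a quoted field when it is the first character after the separator.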
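If you can't control the input, one workaround is to normalize the line before parsing it. The following is only a minimal sketch: `strip_space_before_quotes` is a hypothetical helper, not part of the CSV library, and it assumes that a comma followed by whitespace and a quote never occurs inside a quoted value.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): remove spaces or tabs
# that sit between a comma and an opening quote, so that the quote becomes
# the first character of the field.
# Caveat: this naive pattern also rewrites the same character sequence if it
# happens to occur inside a quoted field.
def strip_space_before_quotes(line)
  line.gsub(/,[ \t]+"/, ',"')
end

raw = 'one,two,three, "four, five",six'

# Leading spaces in unquoted fields are legal and are kept in the value.
p CSV.parse('one, two')
# => [["one", " two"]]

# After normalization the quote starts the field, so the line parses.
rows = CSV.parse(strip_space_before_quotes(raw))
p rows
# => [["one", "two", "three", "four, five", "six"]]
```

Cleaning the data at the source is the more robust fix; the regex is just a stopgap for input you know to be shaped like the example.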
A similar question about Ruby's CSV.parse being very picky about quotes can be found on Stack Overflow: https://stackoverflow.com/questions/9965838/