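I have a bunch of pipe-delimited files that were generated without properly escaping carriage returns, so I can't use CR or newline to delimit the rows. I do know, however, that every record must have exactly 7 fields.
Splitting the fields is easy with the CSV library in Ruby 1.9 by setting the "col_sep" parameter, but I can't set the "row_sep" parameter because the fields contain newlines.
Is there a way to parse a pipe-delimited file using a fixed number of fields as the row delimiter?
Thanks!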
Best answer
Here's one way to do it:
Build a sample string of three seven-word records, each with a newline embedded in the middle of the record:
text = (["now is the\ntime for all good"] * 3).join(' ').gsub(' ', '|')
puts text
# >> now|is|the
# >> time|for|all|good|now|is|the
# >> time|for|all|good|now|is|the
# >> time|for|all|good
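Splitting this string on '|' alone would not be enough: each embedded newline glues the last word of one line to the first word of the next, so the token count stops being a multiple of seven. A quick check on the same sample string (a sketch added for illustration; the variable names are arbitrary):

```ruby
text = (["now is the\ntime for all good"] * 3).join(' ').gsub(' ', '|')

# Splitting on '|' alone leaves "the\ntime" as a single token in each record.
pipe_only = text.split('|')
# Turning each newline into a '|' first gives one token per word.
per_word = text.gsub("\n", '|').split('|')

puts pipe_only.length  # prints 18
puts per_word.length   # prints 21, i.e. 3 records x 7 fields
```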
Here's the processing step:
lines = []
chunks = text.gsub("\n", '|').split('|')  # treat newlines as field separators too
while chunks.any?
  lines << chunks.slice!(0, 7).join(' ')  # remove the next 7 fields and join them
end
puts lines
# >> now is the time for all good
# >> now is the time for all good
# >> now is the time for all good
So this shows that we can reconstruct the rows.
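The `while`/`slice!` loop empties `chunks` as it goes. A non-destructive alternative, sketched here on the same sample data and not part of the original answer, uses Ruby's built-in `Enumerable#each_slice`, which yields consecutive groups of the given size; the regex split treats '|' and newline as separators in one step:

```ruby
text = (["now is the\ntime for all good"] * 3).join(' ').gsub(' ', '|')

# Treat both '|' and newline as field separators in a single split.
fields = text.split(/[|\n]/)
# Regroup the flat field list 7 at a time without mutating it.
lines = fields.each_slice(7).map { |record| record.join(' ') }

puts lines
```

`fields` is left intact afterwards, so it can be reused, for example to build the array-of-arrays form instead of joined strings.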
If we pretend these words are really the columns of a pipe-delimited file, we can drop the .join(' ') and let the code do the real work:
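require 'awesome_print'  # ap is provided by the awesome_print gem
lines = []
chunks = text.gsub("\n", '|').split('|')  # rebuild: the first loop emptied chunks
while chunks.any?
  lines << chunks.slice!(0, 7)  # keep each record as an array of 7 fields
end
ap lines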
# >> [
# >> [0] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ],
# >> [1] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ],
# >> [2] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ]
# >> ]
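The same idea can be wrapped into a reusable helper that reads from any IO and refuses input whose field count is not a multiple of the record width. This is a sketch under stated assumptions: `read_fixed_field_records` is a hypothetical name, the whole input is read into memory, "\r\n" is treated like "\n", and, as in the answer above, newlines are assumed to occur only at field boundaries. A newline *inside* a field value would still be split into two fields. `StringIO` stands in for a real file here; in practice you would pass something like `File.open(path)`.

```ruby
require 'stringio'

# Hypothetical helper: rebuilds records from pipe-delimited text whose rows
# cannot be found by newlines, using a fixed number of fields per record.
# Assumes newlines only ever separate fields, never appear inside one.
def read_fixed_field_records(io, fields_per_record = 7)
  # chomp drops a final "\n" or "\r\n"; limit -1 keeps trailing empty fields.
  fields = io.read.chomp.split(/\||\r?\n/, -1)
  unless (fields.length % fields_per_record).zero?
    raise ArgumentError,
          "#{fields.length} fields is not a multiple of #{fields_per_record}"
  end
  fields.each_slice(fields_per_record).to_a
end

sample = StringIO.new("a|b|c\nd|e|f|g\nh|i|j|k\nl|m|n")
records = read_fixed_field_records(sample)
p records
# prints [["a", "b", "c", "d", "e", "f", "g"], ["h", "i", "j", "k", "l", "m", "n"]]
```

The count check is a cheap sanity test: if a stray newline ever lands inside a field, the total usually stops being a multiple of 7 and the error surfaces instead of silently shifting every later record.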
We found a similar question on Stack Overflow, "ruby - Read a fixed number of pipe-delimited fields per row?": https://stackoverflow.com/questions/4083690/