bash - sed statement to change/modify CSV separators and delimiters

Tags: bash shell csv sed

I have some CSV files with comma-separated values, and some column values can contain characters such as ,.<>!/\;&

I am trying to convert them into comma-separated CSV in which every field is enclosed in double quotes.

Sample data:

DateCreated,DateModified,SKU,Name,Category,Description,Url,OriginalUrl,Image,Image50,Image100,Image120,Image200,Image300,Image400,Price,Brand,ModelNumber
2012-10-19 10:52:50,2013-06-11 02:07:16,34,Austral Foldaway 45 Rotary Clothesline,Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers,"Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/Products/7228/1718695.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg,309.9000 AUD,Austral,FA45GR

The output I want is:

"DateCreated","DateModified","SKU","Name","Category","Description","Url","OriginalUrl","Image","Image50","Image100","Image120","Image200","Image300","Image400","Price","Brand","ModelNumber"
"2012-10-19 10:52:50","2013-06-11 02:07:16","34","Austral Foldaway 45 Rotary Clothesline","Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers","Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/Products/7228/1718695.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg","309.9000 AUD","Austral","FA45GR"

Any help is much appreciated.

Best answer

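First, let's try the simple (and "not good enough") solution, which just wraps every field in double quotes, including fields that are already quoted, which is not what you want: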

sed -r 's/([^,]*)/"\1"/g'

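The first part matches a run of characters that contains no comma, the second part wraps that run in double quotes, and the trailing 'g' applies the substitution to every match on the line instead of only the first one.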
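To see what the trailing 'g' changes, the same substitution can be run with and without it on a short line. This is only an illustration: the input `a,b,c` is made up, and `-E` is used here as the spelling of `-r` that GNU sed also accepts.

```shell
#!/usr/bin/env bash
# Illustrative input only; it is not taken from the question's data.
line='a,b,c'

# Without g: only the first run of non-comma characters is wrapped.
first_only=$(printf '%s\n' "$line" | sed -E 's/([^,]*)/"\1"/')

# With g: every run of non-comma characters on the line is wrapped.
all_fields=$(printf '%s\n' "$line" | sed -E 's/([^,]*)/"\1"/g')

printf '%s\n' "$first_only"   # "a",b,c
printf '%s\n' "$all_fields"   # "a","b","c"
```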

This turns

abc,345, some words ,"some text","text,with,commas"

into

"abc","345"," some words ",""some text"",""text","with","commas""

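Some notes:

  • The field " some words ", which contains spaces, is quoted correctly, but its leading and trailing spaces end up inside the quotes. I think that is fine, but it can be fixed if not.

  • If a field is already quoted, it gets quoted a second time, which is bad. This needs fixing.

  • If a field is already quoted and its text contains commas (which should not be treated as field separators), the pieces between those commas are quoted separately too. This also needs fixing.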

So we want to match either of two patterns: an already quoted string, or an unquoted field that contains neither commas nor double quotes:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g'

The result is now

"abc","345"," some words ",""some text"",""text,with,commas""

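As you can see, the fields that were quoted to begin with now carry doubled quotes. We have to remove the extra ones with a second sed command: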

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g'

Result

"abc","345"," some words ","some text","text,with,commas"

Yay!
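One caveat with the two-step pipeline: an empty field (as in `a,,b`) is quoted as `""`, and the cleanup `s/""/"/g` then collapses that pair as well, which damages the line. A possible single-pass variant is sketched below. It is not part of the original answer; `quote_fields` and `trim_quoted` are hypothetical helper names, and the sketch assumes GNU sed and fields that never contain an embedded double quote. The second helper shows one way to address the earlier note about spaces ending up inside the quotes.

```shell
#!/usr/bin/env bash
# Sketch only: quote_fields and trim_quoted are hypothetical helper names.
# Assumes GNU sed and fields that never contain an embedded double quote.

# Quote every field in one pass. Each alternative captures the field text
# without its quotes, so "\1\2" adds back exactly one pair. Empty fields
# become "" and nothing has to be collapsed afterwards.
quote_fields() {
  sed -E 's/"([^"]*)"|([^,"]*)/"\1\2"/g'
}

# Optional: remove whitespace just inside the quotes of each field.
# Once every field is quoted, a quote next to a comma or a line end can
# only be a field boundary, so commas inside the text are left alone.
trim_quoted() {
  sed -E 's/(^|,)"[[:space:]]+/\1"/g; s/[[:space:]]+"(,|$)/"\1/g'
}

# Made-up sample line with an empty field and a comma inside quotes.
printf '%s\n' 'abc,, some words ,"some text","text, with,commas"' |
  quote_fields | trim_quoted
```

To convert a whole file, something like `quote_fields < input.csv > output.csv` should do, where both file names are placeholders. Fields that span several lines or contain escaped quotes are outside what this sketch handles.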

For more on "bash - sed statement to change/modify CSV separators and delimiters", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/17825739/
