mysql - Fastest way to reformat millions of lines to CSV

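I have a text file with millions of lines that needs to be imported into a MySQL table as quickly as possible. As I understand it, LOAD DATA is the best fit for this.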

The data is formatted as follows, where each capital letter in parentheses is a string:

(A)(1-3 tabs)(B)
(3 tabs)(C)
(3 tabs)(D)
(3 tabs)(E)

(F)(1-3 tabs)(G)
(3 tabs)(H)
...

So the data needs to be reformatted as CSV, where the first string of each section is repeated on every following line until the next section starts:

(A)(tab)(B)
(A)(tab)(C)
(A)(tab)(D)
(A)(tab)(E)
(F)(tab)(G)
(F)(tab)(H)
...

I'm considering writing a C program, but could Bash be just as fast (and simpler)? This is probably a classic problem. Is there a very efficient and compact solution?

Best answer

Try this small awk script:

awk -F\\t+ -v OFS=\\t '$2==""{next}$1!=""{a=$1}{$1=a}1'
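A quick way to check the one-liner is to feed it a few lines in the question's layout. The sample data and the file names sample.txt and sample.tsv are made up for this sketch. The output is tab-separated rather than comma-separated, which matches the default field terminator of MySQL's LOAD DATA, so the result should load without an extra FIELDS TERMINATED BY clause.

```shell
#!/bin/sh
# Hypothetical sample in the question's layout: a section line "(A)(1-3 tabs)(B)",
# continuation lines "(3 tabs)(C)", and a blank line between sections.
printf 'A\t\tB\n\t\t\tC\n\t\t\tD\n\t\t\tE\n\nF\tG\n\t\t\tH\n' > sample.txt

# The answer's one-liner, unchanged, writing tab-separated output.
awk -F\\t+ -v OFS=\\t '$2==""{next}$1!=""{a=$1}{$1=a}1' sample.txt > sample.tsv

cat sample.tsv
```

This prints A, A, A, A, F, F in the first column next to B, C, D, E, G, H, one tab apart, and the blank separator line is dropped.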

This assumes the second field contains no tabs.

Piece by piece:

-F\\t+        Set the column separator to a sequence of one or more tabs
-v OFS=\\t    Use a tab to separate columns on output
$2==""{next}  Skip this line if it has fewer than two fields (e.g. a blank line).
$1!=""{a=$1}  Save the first field if it is specified
{$1=a}        Replace the first field with the saved one.
              The assignment forces the line to be recomputed using OFS
              to separate columns, so it's needed even if we just did a=$1.
1             awk idiom, equivalent to `{print}` (or `{print $0}`).
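If the script is kept around, the same program can be written with one rule per line and the explanations above as comments. This sketch only reformats the answer's one-liner. The shell function name reformat_sections is hypothetical. The quoted forms -F '\t+' and -v OFS='\t' pass awk the same strings as the unquoted \\t forms.

```shell
#!/bin/sh
# reformat_sections is a hypothetical name; the rules are the one-liner's.
reformat_sections() {
  awk -F '\t+' -v OFS='\t' '
    $2 == "" { next }     # fewer than two fields (e.g. a blank line): skip it
    $1 != "" { a = $1 }   # non-empty first field: a new section, save its key
    { $1 = a }            # put the saved key in field 1; this rebuilds $0 with OFS
    1                     # always-true pattern with no action: print the line
  ' "$@"
}

# Example use (placeholder file names):
# reformat_sections input.txt > output.tsv
```

With no file arguments the function reads standard input, so it can also sit in a pipeline.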

A similar question about the fastest way to reformat millions of lines to CSV can be found on Stack Overflow: https://stackoverflow.com/questions/39438489/
