r - How to partition a df into multiple .csv files based on blank rows?

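I am working with a database that contains a timestamp, 3 numeric vectors, and one character vector.

Basically, each "dataset" is delimited by a new row. Whenever a row reads as empty in every column (x = \t\r\n), I need to save the preceding series of rows as a .csv. There are about 370 of these in my dataset.

For example,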


library(dplyr)

data <- data.frame(x1 = 1:4,
                   x2 = 4:1,
                   x3 = 3,
                   x4 = c("text", "no text", "example", "hello"))

new_row <- c("\t\r\n", "\t\r\n", "\t\r\n", "\t\r\n")

data1 <- rbind(data, new_row)

data2 <- data.frame(x1 = 1:4,
                    x2 = 4:1,
                    x3 = 4,
                    x4 = c("text", "no text", "example", "hello"))

data2 <- rbind(data2, new_row)


data3 <- rbind(data1, data2)

View(data3)  # capital V: utils::View opens the data viewer in an interactive session

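This is what my dataset looks like (without the timestamps). I need to export each group of consecutive rows that ends with a row full of \t\r\n as a separate .csv.

I am doing text analysis. The groups vary widely in size, and each one is a thread on a different topic. I need to analyze these threads individually.

What is the best way to do this? I have not run into this problem before.

Best answer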

ind <- grepl("\t", data3$x4)
ind <- replace(cumsum(ind), ind, -1)
ind
#  [1]  0  0  0  0 -1  1  1  1  1 -1

data4 <- split(data3, ind)
data4
# $`-1`
#       x1    x2    x3    x4
# 5  \t\r\n \t\r\n \t\r\n \t\r\n
# 10 \t\r\n \t\r\n \t\r\n \t\r\n
# $`0`
#   x1 x2 x3      x4
# 1  1  4  3    text
# 2  2  3  3 no text
# 3  3  2  3 example
# 4  4  1  3   hello
# $`1`
#   x1 x2 x3      x4
# 6  1  4  4    text
# 7  2  3  4 no text
# 8  3  2  4 example
# 9  4  1  4   hello

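The -1 is used only to keep the "\t\r\n" rows out of their respective groups, and we know that cumsum(ind) starts at 0. You can obviously drop the first frame :-)

From here, you can export with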
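As a variation on the same idea, the separator rows can be dropped before splitting, so that no -1 group has to be removed afterwards. The following is only a sketch under a few assumptions: it rebuilds the example data from the question, and the names sep_row, make_block, is_sep and groups are hypothetical helpers introduced for illustration. It also treats a row as a separator only when every column is whitespace-only, which is a slightly stricter test than looking for "\t" in x4.

```r
# Rebuild the example data from the question.
# Separator rows hold "\t\r\n" in every column.
sep_row <- rep("\t\r\n", 4)
make_block <- function(x3) {
  rbind(data.frame(x1 = 1:4, x2 = 4:1, x3 = x3,
                   x4 = c("text", "no text", "example", "hello")),
        sep_row)
}
data3 <- rbind(make_block(3), make_block(4))

# A row counts as a separator only if every cell is empty after trimming whitespace.
is_sep <- apply(data3, 1, function(r) all(!nzchar(trimws(r))))

# cumsum() numbers the groups; filtering out the separators first
# means there is no extra group to discard later.
groups <- split(data3[!is_sep, ], cumsum(is_sep)[!is_sep])
```

With the example data, groups should hold one data frame per thread, named by the running count of separators seen so far.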

data4 <- data4[-1]
ign <- Map(write.csv, data4, sprintf("file_%03d.csv", seq_along(data4)))
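Note that write.csv writes the row names as an extra first column by default. If that is not wanted, a variant along these lines might help. It is a sketch, not the answer's own code: threads is a hypothetical stand-in for data4, and out_dir is an assumed output location under tempdir().

```r
# Hypothetical stand-in for data4: a named list of data frames, one per thread.
threads <- list(
  `0` = data.frame(x1 = 1:4, x2 = 4:1, x3 = 3,
                   x4 = c("text", "no text", "example", "hello")),
  `1` = data.frame(x1 = 1:4, x2 = 4:1, x3 = 4,
                   x4 = c("text", "no text", "example", "hello"))
)

# Assumed output folder; replace with a real path as needed.
out_dir <- file.path(tempdir(), "threads")
dir.create(out_dir, showWarnings = FALSE)
files <- file.path(out_dir, sprintf("file_%03d.csv", seq_along(threads)))

# row.names = FALSE leaves the original row numbers out of the files;
# invisible() hides the list of NULLs that Map() returns.
invisible(Map(function(df, f) write.csv(df, f, row.names = FALSE),
              threads, files))

# Read one file back to check the round trip.
first <- read.csv(files[1])
```

Reading a file back with read.csv is a quick way to confirm that the columns survived the export as expected.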

Regarding "r - How to partition a df into multiple .csv files based on blank rows?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66356953/