r - Split data frame rows with concatenated values

Tags: r split dataframe

I have a data.frame that looks like this:

df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))

#   col1      col2      col3
# 1    a         1         0
# 2    b     1;2;3     1;1;0
# 3    c         5         0
# 4    d 3;2;5;5;3 0;0;1;1;0

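In other words, in some rows the values in certain columns are joined by ";". Before reading the data.frame I do not know which columns will contain concatenated values, but I do know that they will be the same columns for every row that has them. I also know that, within a row that has concatenated values, the number of concatenated values is the same in all of those columns (row 2 has 3 values in both col2 and col3, and row 4 has 5 values in those columns).

I would like to create a new data.frame in which these concatenated values are split into separate rows. For those rows, the values in the columns without concatenated values should be repeated once per concatenated value.

The resulting data.frame would be: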

df <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))

#    col1 col2 col3
# 1     a    1    0
# 2     b    1    1
# 3     b    2    1
# 4     b    3    0
# 5     c    5    0
# 6     d    3    0
# 7     d    2    0
# 8     d    5    1
# 9     d    5    1
# 10    d    3    0

Best answer

Here is one option:

df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))

df2 <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))


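## reshape `col1` to make it look like the others:
## replace every single-digit value in `col2` with the matching `col1` value
## (note: `\\b\\d\\b` only matches values that are exactly one digit long)
v <- Vectorize(gsub)
df$col1 <- v('\\b\\d\\b', df$col1, df$col2)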

#        col1      col2      col3
# 1         a         1         0
# 2     b;b;b     1;2;3     1;1;0
# 3         c         5         0
# 4 d;d;d;d;d 3;2;5;5;3 0;0;1;1;0
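
The `Vectorize(gsub)` step above depends on every value being a single digit. As a rough sketch of an alternative that does not depend on what the values look like, `col1` can be repeated by the number of `;`-separated pieces in `col2`. The variable names `n` and `col1_expanded` are illustrative and not part of the original answer, and the sketch assumes the original, unmodified `df` from the question.

```r
# Original data from the question (stringsAsFactors = FALSE keeps columns as character)
df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("1", "1;2;3", "5", "3;2;5;5;3"),
                 col3 = c("0", "1;1;0", "0", "0;0;1;1;0"),
                 stringsAsFactors = FALSE)

# Number of ";"-separated pieces in each row of col2
n <- lengths(strsplit(df$col2, ";", fixed = TRUE))

# Repeat each col1 value that many times and join the copies with ";"
col1_expanded <- mapply(
  function(value, times) paste(rep(value, times), collapse = ";"),
  df$col1, n, USE.NAMES = FALSE
)

print(col1_expanded)
```

This counts pieces instead of pattern-matching digits, so it should also cope with multi-digit or non-numeric values, provided `col2` is one of the columns that carries the concatenated values.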


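## split on white space or `;` and coerce back into a data frame
## (`[[:space:]]` is used because `\\s` inside brackets is not read as whitespace by R's default regex engine)
data.frame(do.call('cbind', lapply(df, function(x)
  unlist(strsplit(as.character(x), '[[:space:];]')))))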

#    col1 col2 col3
# 1     a    1    0
# 2     b    1    1
# 3     b    2    1
# 4     b    3    0
# 5     c    5    0
# 6     d    3    0
# 7     d    2    0
# 8     d    5    1
# 9     d    5    1
# 10    d    3    0
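
Since the question says the columns holding concatenated values are not known in advance, the same idea can also be sketched as a small general helper in base R. This is only a sketch under stated assumptions: `split_rows` is a hypothetical name, not something from the original answer or from any package, and it assumes there are no empty-string cells and that, within a row, every column holds either one value or the same number of `;`-separated values.

```r
# Hypothetical helper (not from the original answer): expand rows whose
# cells hold several values joined by `sep`.
split_rows <- function(data, sep = ";") {
  # Split every column; each column becomes a list with one element per row
  pieces <- lapply(data, function(column) {
    strsplit(as.character(column), sep, fixed = TRUE)
  })

  # For each row, the largest number of pieces found in any column
  n <- do.call(pmax, unname(lapply(pieces, lengths)))

  # Single values are recycled n times; already-split values are kept as they are
  expanded <- lapply(pieces, function(column_pieces) {
    unlist(Map(function(p, times) rep_len(p, times), column_pieces, n))
  })

  data.frame(expanded, stringsAsFactors = FALSE)
}

df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("1", "1;2;3", "5", "3;2;5;5;3"),
                 col3 = c("0", "1;1;0", "0", "0;0;1;1;0"))

result <- split_rows(df)
print(result)
```

Because each row is expanded to its own maximum piece count, no column has to be singled out beforehand, and no regular expression on the values themselves is needed.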

A similar question about r - split data frame rows with concatenated values can be found on Stack Overflow: https://stackoverflow.com/questions/29480537/
