r - Create a data frame from a list of lists of data.frames

Tags: r list dataframe

I have a list of lists of data.frames that I would like to convert into a single data.frame. The structure is as follows:

l_of_lists <- list(
  year1 = list(
    one = data.frame(date = c("Jan-10", "Jan-22"), type = c("type 1", "type 2")),
    two = data.frame(date = c("Feb-1", "Feb-28"), type = c("type 2", "type 3")),
    three = data.frame(date = c("Mar-10", "Mar-15"), type = c("type 1", "type 4"))
    ),
  year2 = list( # dates is used here on purpose, as the names don't perfectly match
    one = data.frame(dates = c("Jan-22"), type = c("type 2"), another_col = c("entry 2")),
    two = data.frame(date = c("Feb-10", "Feb-18"), type = c("type 2", "type 3"), another_col = c("entry 2", "entry 3")),
    three = data.frame(date = c("Mar-10", "Mar-15"), type = c("type 1", "type 4"), another_col = c("entry 4", "entry 5"))
    ),
  year3 = list( # this deliberately only contains two data frames
    one = data.frame(date = c("Jan-10", "Jan-12"), type = c("type 1", "type 2")),
    two = data.frame(date = c("Feb-8", "Jan-28"), type = c("type 2", "type 3"))
  ))

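The data frames have two peculiarities that I have tried to mimic above:

  • column names differ by 1-2 characters (e.g. date vs. dates)
  • some columns appear only in some of the data frames (e.g. another_col)

I now want to convert this into a single data frame (I tried various calls to rbind and do.call, as shown here, without success). I would like to match column names leniently (if two column names differ by only 1-2 characters, they should be treated as the same column) and to fill columns that are missing from some data frames with NA.

I would like a data frame similar to the following: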

year  level       date        type  another_col                    
   1    one    "Jan-10"    "type 1"           NA
   1    one    "Jan-22"    "type 2"           NA
   1    two     "Feb-1"    "type 2"           NA
   1    two    "Feb-28"    "type 3"           NA
   1  three    "Mar-10"    "type 1"           NA
   1  three    "Mar-15"    "type 4"           NA
   2    one    "Jan-22"    "type 2"     "entry 2"
   2    two     "Feb-1"    "type 2"     "entry 2"
   2    two    "Feb-28"    "type 3"     "entry 3"
   2  three    "Mar-10"    "type 1"     "entry 4"
   2  three    "Mar-15"    "type 4"     "entry 5"
   3    one    "Jan-10"    "type 1"           NA
   3    one    "Jan-12"    "type 2"           NA
   3    two     "Feb-8"    "type 2"           NA
   3    two    "Feb-28"    "type 3"           NA

Can someone point out whether rbind is the right path, and what I am missing?

Best answer

You can do something like the following using purrr and dplyr:

l_of_lists <- list(
  year1 = list(
    one = data.frame(date = c("Jan-10", "Jan-22"), type = c("type 1", "type 2")),
    two = data.frame(date = c("Feb-1", "Feb-28"), type = c("type 2", "type 3")),
    three = data.frame(date = c("Mar-10", "Mar-15"), type = c("type 1", "type 4"))
  ),
  year2 = list( # dates is used here on purpose, as the names don't perfectly match
    one = data.frame(dates = c("Jan-22"), type = c("type 2"), another_col = c("entry 2")),
    two = data.frame(date = c("Feb-10", "Feb-18"), type = c("type 2", "type 3"), another_col = c("entry 2", "entry 3")),
    three = data.frame(date = c("Mar-10", "Mar-15"), type = c("type 1", "type 4"), another_col = c("entry 4", "entry 5"))
  ),
  year3 = list( # this deliberately only contains two data frames
    one = data.frame(date = c("Jan-10", "Jan-12"), type = c("type 1", "type 2")),
    two = data.frame(date = c("Feb-8", "Jan-28"), type = c("type 2", "type 3"))
  ))

# add libraries
library(dplyr)
library(purrr)

# Map bind_rows to each list within the list
l_of_lists %>% 
  map_dfr(~bind_rows(.x, .id = "level"), .id = "year")

This produces:

     year level   date   type  dates another_col
1  year1   one Jan-10 type 1   <NA>        <NA>
2  year1   one Jan-22 type 2   <NA>        <NA>
3  year1   two  Feb-1 type 2   <NA>        <NA>
4  year1   two Feb-28 type 3   <NA>        <NA>
5  year1 three Mar-10 type 1   <NA>        <NA>
6  year1 three Mar-15 type 4   <NA>        <NA>
7  year2   one   <NA> type 2 Jan-22     entry 2
8  year2   two Feb-10 type 2   <NA>     entry 2
9  year2   two Feb-18 type 3   <NA>     entry 3
10 year2 three Mar-10 type 1   <NA>     entry 4
11 year2 three Mar-15 type 4   <NA>     entry 5
12 year3   one Jan-10 type 1   <NA>        <NA>
13 year3   one Jan-12 type 2   <NA>        <NA>
14 year3   two  Feb-8 type 2   <NA>        <NA>
15 year3   two Jan-28 type 3   <NA>        <NA>
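Note that bind_rows matches column names exactly, which is why date and dates end up as separate columns in the output above. One possible way to get the lenient matching asked for in the question is to rename near-miss columns before binding. The following is only a sketch under stated assumptions: normalise_names, canonical and max_dist are hypothetical names introduced here, the list of canonical column names has to be supplied by hand, and the edit distance comes from adist() in base R's utils package. The input is a shortened copy of l_of_lists so that the block runs on its own.

```r
library(dplyr)
library(purrr)

# Shortened copy of the example data (two years, one mismatched column name)
l_of_lists <- list(
  year1 = list(
    one = data.frame(date = c("Jan-10", "Jan-22"), type = c("type 1", "type 2")),
    two = data.frame(date = c("Feb-1", "Feb-28"), type = c("type 2", "type 3"))
  ),
  year2 = list(
    one = data.frame(dates = c("Jan-22"), type = c("type 2"), another_col = c("entry 2")),
    two = data.frame(date = c("Feb-10", "Feb-18"), type = c("type 2", "type 3"),
                     another_col = c("entry 2", "entry 3"))
  )
)

# Hypothetical helper (not part of dplyr or purrr): rename every column whose
# name is within `max_dist` edits of a canonical name to that canonical name.
normalise_names <- function(df, canonical, max_dist = 2) {
  d <- adist(names(df), canonical)             # rows: current names, cols: canonical names
  nearest <- apply(d, 1, which.min)            # index of the closest canonical name
  close <- d[cbind(seq_along(nearest), nearest)] <= max_dist
  names(df)[close] <- canonical[nearest[close]]
  df
}

# Assumption: the intended column names are known in advance
canonical <- c("date", "type", "another_col")

result <- l_of_lists %>%
  map_dfr(
    ~ bind_rows(map(.x, normalise_names, canonical = canonical), .id = "level"),
    .id = "year"
  )

result
```

With this data, dates is one edit away from date and is renamed, while type is three edits away from date and is left alone. The threshold needs care on real data: if two columns of the same data frame are both close to one canonical name, the renaming would create duplicate column names.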

Of course, you can do some regex parsing to keep only the numeric part of the year:

l_of_lists %>% 
  map_dfr(~bind_rows(.x, .id = "level"), .id = "year") %>% 
  mutate(year = substring(year, regexpr("\\d", year)))
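To make the mutate step above easier to follow, here is a small standalone illustration of what regexpr and substring do on the year labels. The vector year is just example input, and the final as.integer conversion is an optional extra that is not part of the answer's code.

```r
year <- c("year1", "year2", "year3")

# Position of the first digit in each string
first_digit <- regexpr("\\d", year)

# Keep everything from that position to the end of the string
year_num <- substring(year, first_digit)
year_num
# [1] "1" "2" "3"

# Optional: convert to integer if a numeric column is wanted
as.integer(year_num)
```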

If you know that date and dates are the same thing, you can always use mutate to change them to whichever value is not missing (i.e. mutate(date = ifelse(!is.na(date), date, dates))).
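The merge of date and dates can also be written with dplyr's coalesce(), which returns the first non-missing value across its arguments. This is a minimal sketch on a made-up three-row data frame (combined is a hypothetical stand-in for the bound result) that also drops the now redundant dates column:

```r
library(dplyr)

# Hypothetical stand-in for the output of bind_rows, with split date columns
combined <- data.frame(
  date  = c("Jan-10", NA, "Feb-10"),
  dates = c(NA, "Jan-22", NA),
  type  = c("type 1", "type 2", "type 2"),
  stringsAsFactors = FALSE
)

fixed <- combined %>%
  mutate(date = coalesce(date, dates)) %>%  # take date, fall back to dates
  select(-dates)                            # drop the leftover column

fixed
```

Unlike ifelse, coalesce does not need the explicit is.na test, and it extends naturally if more than two spellings of the column have to be merged.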

This question and answer on creating a data frame from a list of lists of data.frames come from a similar question on Stack Overflow: https://stackoverflow.com/questions/58593659/
