r - Using multiple data frames and lookup tables to perform functions in R

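I'm new to R and have a complex set of data, so I hope my explanation is clear. I need to use multiple data frames to perform a series of operations. Here is an example. I have three data frames. One is a list of species names and their corresponding codes: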

>df.sp
    Species Code
    Picea   PI
    Pinus   CA

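Another is a list of sites with species abundance data at different locations (dir). Unfortunately, the species are not in the same order from row to row.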

>df.site
Site  dir  total  t01  t02  t03  t04
2          Total  PI   CA   AB   T
2     N    9      1    5    na   na
2                 AB   ZI   PI   CA
2     S    5      2    2    1    4
3                 DD   EE   AB   YT
3     N    6      1    1    5    3
3                 AB   YT   EE   DD
3     S    5      4    3    1    1

Then I also have a data frame of traits for each species:

>df.trait
Species  leaft  rootl
Picea     0.01    1.2
Pinus     0.02    3.5

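One example of what I want to do is get the mean of each trait (df.trait$leaft and df.trait$rootl) across all species for each site (df.site$Site) and each location within the site (df.site$dir: N, S). So the first row of the result would be: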

Site dir leaft rootl
2    N   0.015  2.35

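I hope this makes sense. It's very complicated for me to think through how to do it. I tried working from this post and this one (and many others) but got lost. Thank you for your help; I really appreciate it.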

Update: here is a sample of the actual (simplified) df.site, produced with dput:

> dput(head(df.site))
structure(list(Site = c(2L, 2L, 2L, 2L, 2L, 2L), dir = c("rep17316", 
"N", "", "S", "", "SE"), total = c("Total", "9", "", 
"10", "", "9"), t01 = c("PI", "4", "CA", "1", "SILLAC", 
"3"), t02 = c("CXBLAN", "3", "ZIZAUR", "4", "OENPIL", "2"), 
    t03 = c("ZIZAPT", "1", "ECHPUR", "2", "ASCSYR", "2")), .Names = c("site", "dir", "total", "t01", "t02", "t03"), row.names = 2:7, class = "data.frame")

Best answer

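You must first tidy the data into a cleaner form. I assume the structure you dput above is consistent throughout the df.site data frame; that is, rows are paired, with the first row of each pair specifying species codes and the second holding counts (or some other collected data?).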

Starting with df as the data frame from the dput() above, I'll first simulate some data for the other two data frames:

df.sp <- data.frame(Species = paste0("species",1:8),
                    Code = c("ECHPUR", "CXBLAN", "ZIZAPT",
                             "CAMROT", "SILLAC", "OENPIL",
                             "ASCSYR", "ZIZAUR"))
df.sp
#>    Species   Code
#> 1 species1 ECHPUR
#> 2 species2 CXBLAN
#> 3 species3 ZIZAPT
#> 4 species4 CAMROT
#> 5 species5 SILLAC
#> 6 species6 OENPIL
#> 7 species7 ASCSYR
#> 8 species8 ZIZAUR

df.trait <- data.frame(Species = paste0("species",1:8),
                       leaft = round(runif(8, max=.2), 2),
                       rootl = round(runif(8, min=1, max=4),1))

df.trait
#>    Species leaft rootl
#> 1 species1  0.12   2.5
#> 2 species2  0.04   2.6
#> 3 species3  0.12   2.1
#> 4 species4  0.05   1.1
#> 5 species5  0.15   2.5
#> 6 species6  0.15   3.3
#> 7 species7  0.05   3.9
#> 8 species8  0.13   2.1

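First, let's clean up df by taking the second row of each pair, which contains the collected data, and moving those values into a new set of columns: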

library(dplyr)

# across() needs dplyr >= 1.0.0; it replaces the deprecated mutate_at()/funs()
df.clean <- df %>% 
  #for each row, copy the direction and total from the following row
  mutate(across(matches("dir|total"), lead)) %>% 
  #create new columns (t01_n, t02_n, ...) for observed data and fill in
  #values from the following row
  mutate(across(matches("^t\\d+$"), lead, .names = "{.col}_n")) %>% 
  #filter to rows with species code in t01
  filter(t01 %in% df.sp$Code) %>% 
  #drop "total" column (doesn't make sense after reshape)
  select(-total)

df.clean
#>   site dir    t01    t02    t03 t01_n t02_n t03_n
#> 1    2   N ECHPUR CXBLAN ZIZAPT     4     3     1
#> 2    2   S CAMROT ZIZAUR ECHPUR     1     4     2
#> 3    2  SE SILLAC OENPIL ASCSYR     3     2     2
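
To make the row-pairing step easier to follow, here is a minimal sketch on a made-up two-pair table. The data frame `toy` and the codes "AA" and "BB" are hypothetical and not part of your data; the sketch only shows how lead() pulls values up from the row below before the count rows are filtered out.

```r
library(dplyr)

# Hypothetical toy data: rows alternate between a species code and its count
toy <- data.frame(
  dir = c("", "N", "", "S"),
  t01 = c("AA", "4", "BB", "1"),
  stringsAsFactors = FALSE
)

paired <- toy %>%
  mutate(dir   = lead(dir),       # take the direction from the row below
         t01_n = lead(t01)) %>%   # take the count from the row below
  filter(t01 %in% c("AA", "BB"))  # keep only the rows that hold species codes

paired
```

Each remaining row now carries the species code, its direction, and its count side by side. Note that the counts are still character strings at this point, because they shared a column with the codes.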

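We now have two corresponding sets of columns, holding species codes and values respectively. To reshape the data frame into long format, we'll use melt() from the data.table package. See the responses to this question for other examples of how to do this.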

library(data.table)

df.clean <- df.clean %>% 
  setDT() %>% #convert to data.table to use data.table::melt
  melt(measure.vars = patterns("t\\d+$", "_n$"),
       value.name = c("Code", "Count") ) %>% 
  #drop "variable" column, which isn't needed
  select(-variable)
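
As a small illustration of what melt() does with two patterns, here is a sketch on hypothetical data (the table `wide` and the codes "AA", "BB", "CC" are invented for the example). Each pattern selects one group of columns, and the groups are stacked in parallel so that every code stays next to its count. The conversion to numeric at the end is my assumption about what you will want before summarising; entries such as "na" would become NA with a warning.

```r
library(data.table)

# Hypothetical wide table: code columns (t01, t02) and count columns (t01_n, t02_n)
wide <- data.table(
  site  = c(2L, 2L),
  dir   = c("N", "S"),
  t01   = c("AA", "BB"),
  t02   = c("CC", "AA"),
  t01_n = c("4", "1"),
  t02_n = c("3", "2")
)

# The first pattern picks the code columns, the second picks the count columns
long <- melt(wide,
             measure.vars = patterns("^t\\d+$", "_n$"),
             value.name = c("Code", "Count"))

long[, variable := NULL]            # index of the column pair, not needed here
long[, Count := as.numeric(Count)]  # counts were stored as character strings

long
```

The result has one row per site, direction and species, which is the shape the joins and summaries need.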

Finally, join your three data frames:

#merge tables together, naming the key columns explicitly
df.summaries <- df.clean %>% 
  left_join(df.sp, by = "Code") %>% 
  left_join(df.trait, by = "Species")

At this point, you should be able to summarise the data using whatever grouping you're interested in with group_by and summarise.
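
For instance, the per-site, per-direction trait means from the question could be computed roughly like this. This is only a sketch: `joined` is a hypothetical stand-in for the joined table, built from the Picea/Pinus trait values in the question, and the column names assume the reshaped layout described above.

```r
library(dplyr)

# Hypothetical stand-in for the joined table (one row per site/dir/species)
joined <- data.frame(
  site  = c(2, 2, 2, 2),
  dir   = c("N", "N", "S", "S"),
  Code  = c("PI", "CA", "PI", "CA"),
  Count = c(1, 5, 2, 2),
  leaft = c(0.01, 0.02, 0.01, 0.02),
  rootl = c(1.2, 3.5, 1.2, 3.5)
)

trait_means <- joined %>%
  group_by(site, dir) %>%
  summarise(leaft = mean(leaft, na.rm = TRUE),  # unweighted mean across species
            rootl = mean(rootl, na.rm = TRUE),
            .groups = "drop")                   # .groups needs dplyr >= 1.0.0

trait_means
```

For site 2, direction N, this gives leaft = 0.015 and rootl = 2.35, matching the first row you expected. If you would rather weight each species by its abundance, weighted.mean(leaft, Count) could replace mean(leaft) once Count is numeric.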

This Q&A is adapted from a similar question on Stack Overflow: https://stackoverflow.com/questions/54642358/
