r - Perform a function on two lists of data frames using a similar column

Tags: r list dataframe function

I have two separate lists of data frames:

Example data

#Example columns
Label <- c("Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009","Red_001_Series008", "Red_001_Series008","Red_001_Series008","Red_001_Series008","Blue_002_Series009", "Blue_002_Series009","Blue_002_Series009","Blue_002_Series009")
Pred <- c("Pear", "Orange", "Apple", "Peach", "Pear", "Orange", "Apple", "Peach", "Pear", "Orange", "Apple", "Peach")
n <- c(10, 223, 890, 34, 78, 902, 34, 211, 1007,209, 330, 446)

#make example data frame
data <- data.frame(Label, Pred, n)

#Split dataframe into a list of dataframes
df <- split(data, f = data$Label)  


#Second dataframe example columns

Label1 <- c("Red_001_Series008","Blue_001_Series009", "Blue_002_Series009")
TotalArea <- c(1904, 578, 7092)

#Make dataframe
data1 <- data.frame(Label1, TotalArea)
#Split dataframe into a list of dataframes
df1 <- split(data1, f = data1$Label1)   

Problem

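Each list of data frames contains the same labels, but in a different order. I would like to:

  1. Match df with df1 by label

  2. Divide the n column in df by the TotalArea column in df1, matched by label.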

For example, a snippet of df:

               Label   Pred   n
1 Blue_001_Series009   Pear  10
2 Blue_001_Series009 Orange 223
3 Blue_001_Series009  Apple 890
4 Blue_001_Series009  Peach  34

A snippet of df1:

              Label1 TotalArea
2 Blue_001_Series009       578

I want to get:

Blue_001_Series009 Pear / Blue_001_Series009 TotalArea
10 / 578 = 0.0173

Blue_001_Series009 Orange / Blue_001_Series009 TotalArea
223 / 578 = 0.3858

and so on...
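The expected values above are simply each n divided by the single TotalArea of the matching label. A quick check in base R for Blue_001_Series009, with the numbers copied from the snippets above (the variable names n, total_area and ratio are only for illustration):

```r
# Worked check of the expected values for one label (Blue_001_Series009)
n <- c(10, 223, 890, 34)   # n for Pear, Orange, Apple, Peach
total_area <- 578          # TotalArea for Blue_001_Series009

# Vectorised division: every n is divided by the same TotalArea
ratio <- n / total_area
print(round(ratio, 4))
#> [1] 0.0173 0.3858 1.5398 0.0588
```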

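This has to happen for every matching pair of data frames across the two lists. In reality my lists contain hundreds of data frames, so the solution needs to scale to a large amount of data.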
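Since there are hundreds of data frames, one alternative worth considering (it is not the approach used in the answer below) is to skip the lists entirely and do the label lookup once on the un-split data frames with base R's match(). A minimal sketch, assuming data and data1 are the data frames from the example data before split() is called:

```r
# Example data from the question, before splitting into lists
data <- data.frame(
  Label = rep(c("Blue_001_Series009", "Red_001_Series008", "Blue_002_Series009"), each = 4),
  Pred  = rep(c("Pear", "Orange", "Apple", "Peach"), times = 3),
  n     = c(10, 223, 890, 34, 78, 902, 34, 211, 1007, 209, 330, 446)
)
data1 <- data.frame(
  Label1    = c("Red_001_Series008", "Blue_001_Series009", "Blue_002_Series009"),
  TotalArea = c(1904, 578, 7092)
)

# For each row of data, find the row of data1 that has the same label
idx <- match(data$Label, data1$Label1)
stopifnot(!anyNA(idx))  # every label in data must exist in data1

# One vectorised division over all rows at once
data$n2 <- data$n / data1$TotalArea[idx]
head(data, 4)
```

If the per-label list is still needed afterwards, split(data, data$Label) returns a list of data frames that already carry the n2 column.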

I couldn't find anything similar online, and I find working with lists of data frames difficult.

Best answer

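You can use Map, which is a multivariate version of lapply. (The \(x, y) shorthand for function(x, y) requires R 4.1.0 or later.)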

Map(\(x, y) x$n/y$TotalArea, df, df1[names(df)])
# $Blue_001_Series009
# [1] 0.01730104 0.38581315 1.53979239 0.05882353
# 
# $Blue_002_Series009
# [1] 0.14199098 0.02946983 0.04653130 0.06288776
# 
# $Red_001_Series008
# [1] 0.04096639 0.47373950 0.01785714 0.11081933
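Map is a thin wrapper around mapply(..., SIMPLIFY = FALSE): it walks several lists in parallel, calling the function on the first elements, then on the second elements, and so on, and always returns a list named after the first argument. A small illustration with made-up toy lists (f, a and b exist only for this example):

```r
f <- function(x, y) x / y
a <- list(p = c(1, 2), q = c(3, 4))
b <- list(p = 2, q = 4)

res_map    <- Map(f, a, b)                       # f(a$p, b$p), then f(a$q, b$q)
res_mapply <- mapply(f, a, b, SIMPLIFY = FALSE)  # the same call written with mapply
identical(res_map, res_mapply)
#> [1] TRUE
res_map$q
#> [1] 0.75 1.00
```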

In case you want to add it as a new column to df:

Map(\(x, y) {x$n2 <- x$n/y$TotalArea; x}, df, df1[names(df)])
# $Blue_001_Series009
#                Label   Pred   n         n2
# 1 Blue_001_Series009   Pear  10 0.01730104
# 2 Blue_001_Series009 Orange 223 0.38581315
# 3 Blue_001_Series009  Apple 890 1.53979239
# 4 Blue_001_Series009  Peach  34 0.05882353
# 
# $Blue_002_Series009
#                 Label   Pred    n         n2
# 9  Blue_002_Series009   Pear 1007 0.14199098
# 10 Blue_002_Series009 Orange  209 0.02946983
# 11 Blue_002_Series009  Apple  330 0.04653130
# 12 Blue_002_Series009  Peach  446 0.06288776
# 
# $Red_001_Series008
#               Label   Pred   n         n2
# 5 Red_001_Series008   Pear  78 0.04096639
# 6 Red_001_Series008 Orange 902 0.47373950
# 7 Red_001_Series008  Apple  34 0.01785714
# 8 Red_001_Series008  Peach 211 0.11081933

Note that names is used to order df1 by df; if you prefer, you can also flip it, i.e. Map(..., df[names(df1)], df1)
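Map pairs elements by position, not by name, which is why the answer indexes df1[names(df)]. A toy sketch of the difference, with hypothetical lists counts and areas standing in for df and df1. The setequal() guard is an addition not in the answer: if a label were missing from areas, areas[names(counts)] would contain a NULL element, and x$n / NULL silently returns numeric(0) instead of raising an error.

```r
# Hypothetical stand-ins for df and df1: same labels, different order
counts <- list(A = data.frame(n = c(2, 4)), B = data.frame(n = c(9, 3)))
areas  <- list(B = data.frame(TotalArea = 3), A = data.frame(TotalArea = 2))

ratio <- function(x, y) x$n / y$TotalArea

# Added guard: stop early if the two lists do not share exactly the same labels
stopifnot(setequal(names(counts), names(areas)))

by_position <- Map(ratio, counts, areas)                 # A is paired with B's area: wrong
by_name     <- Map(ratio, counts, areas[names(counts)])  # A is paired with A's area
flipped     <- Map(ratio, counts[names(areas)], areas)   # same pairs, in the order of areas

by_name$A
#> [1] 1 2
by_name$B
#> [1] 3 1
names(flipped)
#> [1] "B" "A"
```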


Data:

df <- list(Blue_001_Series009 = structure(list(Label = c("Blue_001_Series009", 
"Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009"
), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(10, 223, 
890, 34)), row.names = c(NA, 4L), class = "data.frame"), Blue_002_Series009 = structure(list(
    Label = c("Blue_002_Series009", "Blue_002_Series009", "Blue_002_Series009", 
    "Blue_002_Series009"), Pred = c("Pear", "Orange", "Apple", 
    "Peach"), n = c(1007, 209, 330, 446)), row.names = 9:12, class = "data.frame"), 
    Red_001_Series008 = structure(list(Label = c("Red_001_Series008", 
    "Red_001_Series008", "Red_001_Series008", "Red_001_Series008"
    ), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(78, 
    902, 34, 211)), row.names = 5:8, class = "data.frame"))


df1 <- list(Blue_001_Series009 = structure(list(Label1 = "Blue_001_Series009", 
    TotalArea = 578), row.names = 2L, class = "data.frame"), 
    Blue_002_Series009 = structure(list(Label1 = "Blue_002_Series009", 
        TotalArea = 7092), row.names = 3L, class = "data.frame"), 
    Red_001_Series008 = structure(list(Label1 = "Red_001_Series008", 
        TotalArea = 1904), row.names = 1L, class = "data.frame"))

Source: Stack Overflow question "r - Perform a function on two lists of data frames using a similar column": https://stackoverflow.com/questions/74871804/
