I have two separate lists of data frames:
Example data
#Example columns
Label <- c("Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009","Red_001_Series008", "Red_001_Series008","Red_001_Series008","Red_001_Series008","Blue_002_Series009", "Blue_002_Series009","Blue_002_Series009","Blue_002_Series009")
Pred <- c("Pear", "Orange", "Apple", "Peach", "Pear", "Orange", "Apple", "Peach", "Pear", "Orange", "Apple", "Peach")
n <- c(10, 223, 890, 34, 78, 902, 34, 211, 1007,209, 330, 446)
#make example data frame
data <- data.frame(Label, Pred, n)
#Split dataframe into a list of dataframes
df <- split(data, f = data$Label)
#Second dataframe example columns
Label1 <- c("Red_001_Series008","Blue_001_Series009", "Blue_002_Series009")
TotalArea <- c(1904, 578, 7092)
#Make dataframe
data1 <- data.frame(Label1, TotalArea)
#Split dataframe into a list of dataframes
df1 <- split(data1, f = data1$Label1)
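Problem
Each list of data frames contains the same labels, but in a different order. I want to:
- match df with df1 by label
- divide the n column in df by the TotalArea column in df1, matched by label.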
For example, a snippet of df:
Label Pred n
1 Blue_001_Series009 Pear 10
2 Blue_001_Series009 Orange 223
3 Blue_001_Series009 Apple 890
4 Blue_001_Series009 Peach 34
A snippet of df1:
Label1 TotalArea
2 Blue_001_Series009 578
I want to get:
Blue_001_Series009 Pear / Blue_001_Series009 TotalArea
10 / 578 = 0.0173
Blue_001_Series009 Orange / Blue_001_Series009 TotalArea
223 / 578 = 0.3858
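and so on...
This has to happen for every matching pair of data frames across the two lists. In reality my lists contain hundreds of data frames, so the approach must be able to handle a lot of data.
I couldn't find anything similar online, and I find lists of data frames hard to work with.
Best answer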
You can use Map, which is a multivariate version of lapply.
Map(\(x, y) x$n/y$TotalArea, df, df1[names(df)])
# $Blue_001_Series009
# [1] 0.01730104 0.38581315 1.53979239 0.05882353
#
# $Blue_002_Series009
# [1] 0.14199098 0.02946983 0.04653130 0.06288776
#
# $Red_001_Series008
# [1] 0.04096639 0.47373950 0.01785714 0.11081933
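The Map call above pairs elements by list name. If you would rather match on the label stored inside each data frame (for example, because the list names are unreliable), a minimal sketch is to build a named lookup vector of TotalArea values first. This assumes each element of df1 has exactly one row; `area` and `res` are hypothetical names used only for illustration, and the data are rebuilt from the question's example.

```r
# Rebuild the question's example lists
data <- data.frame(
  Label = rep(c("Blue_001_Series009", "Red_001_Series008", "Blue_002_Series009"), each = 4),
  Pred  = rep(c("Pear", "Orange", "Apple", "Peach"), times = 3),
  n     = c(10, 223, 890, 34, 78, 902, 34, 211, 1007, 209, 330, 446)
)
df <- split(data, f = data$Label)
data1 <- data.frame(
  Label1    = c("Red_001_Series008", "Blue_001_Series009", "Blue_002_Series009"),
  TotalArea = c(1904, 578, 7092)
)
df1 <- split(data1, f = data1$Label1)

# Hypothetical lookup: TotalArea keyed by the Label1 value inside each df1 element
# (assumes one row per element of df1)
area <- setNames(
  vapply(df1, function(d) d$TotalArea, numeric(1)),
  vapply(df1, function(d) d$Label1, character(1))
)

# For each data frame in df, divide n by the TotalArea whose Label1 equals its Label
res <- lapply(df, function(x) x$n / area[[x$Label[1]]])
res
```

Because the lookup is done by label value, the order of the two lists does not matter.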
In case you want to add the result as a new column to df:
Map(\(x, y) {x$n2 <- x$n/y$TotalArea; x}, df, df1[names(df)])
# $Blue_001_Series009
# Label Pred n n2
# 1 Blue_001_Series009 Pear 10 0.01730104
# 2 Blue_001_Series009 Orange 223 0.38581315
# 3 Blue_001_Series009 Apple 890 1.53979239
# 4 Blue_001_Series009 Peach 34 0.05882353
#
# $Blue_002_Series009
# Label Pred n n2
# 9 Blue_002_Series009 Pear 1007 0.14199098
# 10 Blue_002_Series009 Orange 209 0.02946983
# 11 Blue_002_Series009 Apple 330 0.04653130
# 12 Blue_002_Series009 Peach 446 0.06288776
#
# $Red_001_Series008
# Label Pred n n2
# 5 Red_001_Series008 Pear 78 0.04096639
# 6 Red_001_Series008 Orange 902 0.47373950
# 7 Red_001_Series008 Apple 34 0.01785714
# 8 Red_001_Series008 Peach 211 0.11081933
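A different technique, not the one used in this answer, is to stack both lists into single data frames, join them on the label with base R's merge, compute the ratio once, and split the result back into a list. The sketch below assumes the example data from the question; note that merge may reorder rows and reset row names, so do not rely on the original row order. `joined` and `res` are hypothetical names.

```r
# Rebuild the question's example lists
data <- data.frame(
  Label = rep(c("Blue_001_Series009", "Red_001_Series008", "Blue_002_Series009"), each = 4),
  Pred  = rep(c("Pear", "Orange", "Apple", "Peach"), times = 3),
  n     = c(10, 223, 890, 34, 78, 902, 34, 211, 1007, 209, 330, 446)
)
df <- split(data, f = data$Label)
data1 <- data.frame(
  Label1    = c("Red_001_Series008", "Blue_001_Series009", "Blue_002_Series009"),
  TotalArea = c(1904, 578, 7092)
)
df1 <- split(data1, f = data1$Label1)

# Stack each list into one data frame
all_n    <- do.call(rbind, df)
all_area <- do.call(rbind, df1)

# Join on the label columns, then divide in a single vectorised step
joined    <- merge(all_n, all_area, by.x = "Label", by.y = "Label1")
joined$n2 <- joined$n / joined$TotalArea

# Split back into a list of data frames keyed by label
res <- split(joined, f = joined$Label)
res
```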
Note that names is used to order df1 by df; you can also flip it if you like, i.e. Map( ..., df[names(df1)], df1).
Data:
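A minimal sketch of the flipped form, with a guard that both lists carry the same labels before they are paired, and an optional step that collapses the result into one data frame. The guard and the `out`/`combined` names are illustrative additions, and the example data from the question are assumed.

```r
# Rebuild the question's example lists
data <- data.frame(
  Label = rep(c("Blue_001_Series009", "Red_001_Series008", "Blue_002_Series009"), each = 4),
  Pred  = rep(c("Pear", "Orange", "Apple", "Peach"), times = 3),
  n     = c(10, 223, 890, 34, 78, 902, 34, 211, 1007, 209, 330, 446)
)
df <- split(data, f = data$Label)
data1 <- data.frame(
  Label1    = c("Red_001_Series008", "Blue_001_Series009", "Blue_002_Series009"),
  TotalArea = c(1904, 578, 7092)
)
df1 <- split(data1, f = data1$Label1)

# Stop early if the two lists do not contain the same labels
stopifnot(setequal(names(df), names(df1)))

# Flipped form: reorder df to follow df1, then add n2 = n / TotalArea
out <- Map(function(x, y) { x$n2 <- x$n / y$TotalArea; x }, df[names(df1)], df1)

# Optionally bind everything into a single data frame
combined <- do.call(rbind, out)
combined
```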
df <- list(Blue_001_Series009 = structure(list(Label = c("Blue_001_Series009",
"Blue_001_Series009", "Blue_001_Series009", "Blue_001_Series009"
), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(10, 223,
890, 34)), row.names = c(NA, 4L), class = "data.frame"), Blue_002_Series009 = structure(list(
Label = c("Blue_002_Series009", "Blue_002_Series009", "Blue_002_Series009",
"Blue_002_Series009"), Pred = c("Pear", "Orange", "Apple",
"Peach"), n = c(1007, 209, 330, 446)), row.names = 9:12, class = "data.frame"),
Red_001_Series008 = structure(list(Label = c("Red_001_Series008",
"Red_001_Series008", "Red_001_Series008", "Red_001_Series008"
), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(78,
902, 34, 211)), row.names = 5:8, class = "data.frame"))
df1 <- list(Blue_001_Series009 = structure(list(Label1 = "Blue_001_Series009",
TotalArea = 578), row.names = 2L, class = "data.frame"),
Blue_002_Series009 = structure(list(Label1 = "Blue_002_Series009",
TotalArea = 7092), row.names = 3L, class = "data.frame"),
Red_001_Series008 = structure(list(Label1 = "Red_001_Series008",
TotalArea = 1904), row.names = 1L, class = "data.frame"))
Regarding "r - Perform a function on two lists of data frames using one similar column", see the related question on Stack Overflow: https://stackoverflow.com/questions/74871804/