r - Join two data frames based on lower and upper limits of a target value in R

Tags: r dataframe join match

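I have two data frames, df1 and df2. I want to join them so that the target value from df2 is added to df1. df1 and df2 are related through the columns group and value. In df1 I have one specific value; in df2 I only have the lower and upper limits of the range each target applies to.

Looking at df1 and df2 should make the task clear.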

df1 <- data.frame(group = c("A","B","C","D"),
                  value = c(15, 0, 40, 70))

df2 <- data.frame(group = c("A","A","A","A",
                            "B","B","B","B",
                            "C","C","C","C",
                            "D","D","D","D"),
                  lower_limit = c(0, 25, 60, 91,
                                  0, 35, 70, 92,
                                  0, 45, 80, 93,
                                  0, 55, 90, 94),
                  upper_limit = c(25, 60, 91, 100, 
                                  35, 70, 92, 100, 
                                  45, 80, 93, 100, 
                                  55, 90, 94, 100),
                  target = c("AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4"))

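I can do this with nested for loops and an if statement. However, my real data is much larger, so I can't use this loop. I'm sure there is a simpler solution. Any suggestions?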

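for (i in seq_len(nrow(df1))) {
  subset_string <- df1[i, 1]   # group of the current df1 row
  target_value  <- df1[i, 2]   # value to look up

  # Rows of df2 that belong to the same group
  df2_subset <- df2[df2$group == subset_string, ]

  # Reset for every row so a match from a previous row is not carried over
  target_string <- NA_character_

  for (j in seq_len(nrow(df2_subset))) {
    # Integer values covered by this interval: lower_limit .. upper_limit - 1
    temp_sequence <- seq(from = df2_subset[j, 2], to = df2_subset[j, 3] - 1)
    if (target_value %in% temp_sequence) {
      target_string <- df2_subset[j, 4]
    }
  }

  # Write the result once, after all intervals of the group were checked
  df1[i, 3] <- target_string
}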

Best answer

I'm not sure what the desired result is. Maybe with sqldf:

df1 <- data.frame(group = c("A","B","C","D"),
                  value = c(15, 0, 40, 70))

df2 <- data.frame(group = c("A","A","A","A",
                            "B","B","B","B",
                            "C","C","C","C",
                            "D","D","D","D"),
                  lower_limit = c(0, 25, 60, 91,
                                  0, 35, 70, 92,
                                  0, 45, 80, 93,
                                  0, 55, 90, 94),
                  upper_limit = c(25, 60, 91, 100, 
                                  35, 70, 92, 100, 
                                  45, 80, 93, 100, 
                                  55, 90, 94, 100),
                  target = c("AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4",
                             "AGE0", "AGE1", "AGE3", "AGE4"))

library(sqldf)
sqldf("select a.*, b.target
         from df1 a
         left join df2 b
           on a.`group` = b.`group`
             AND a.value >= b.lower_limit 
             AND a.value <= b.upper_limit")

#  group value target
#1     A    15   AGE0
#2     B     0   AGE0
#3     C    40   AGE0
#4     D    70   AGE1
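
If sqldf is not available, the same range join can be sketched in base R. This is only one possible approach, not part of the original answer: it assumes each interval is half-open, [lower_limit, upper_limit), which is how the loop in the question treats it, and the helper column row_id is a hypothetical name introduced here to track the rows of df1.

```r
# Same example data as above, repeated so the block runs on its own.
df1 <- data.frame(group = c("A", "B", "C", "D"),
                  value = c(15, 0, 40, 70))

df2 <- data.frame(group = rep(c("A", "B", "C", "D"), each = 4),
                  lower_limit = c(0, 25, 60, 91,
                                  0, 35, 70, 92,
                                  0, 45, 80, 93,
                                  0, 55, 90, 94),
                  upper_limit = c(25, 60, 91, 100,
                                  35, 70, 92, 100,
                                  45, 80, 93, 100,
                                  55, 90, 94, 100),
                  target = rep(c("AGE0", "AGE1", "AGE3", "AGE4"), times = 4))

# row_id (hypothetical helper) identifies each df1 row, even with duplicates.
keyed <- cbind(row_id = seq_len(nrow(df1)), df1)

# Pair every df1 row with all df2 rows of the same group.
candidates <- merge(keyed, df2, by = "group")

# Keep the pairs whose value lies in [lower_limit, upper_limit).
in_range <- candidates$value >= candidates$lower_limit &
            candidates$value <  candidates$upper_limit
hits <- candidates[in_range, c("row_id", "target")]

# Left-join back so df1 rows without a matching interval get NA.
result <- merge(keyed, hits, by = "row_id", all.x = TRUE)
result <- result[, c("group", "value", "target")]
result
```

The choice of bounds matters at the edges. With the closed upper bound used in the SQL above (a.value <= b.upper_limit), a value of 25 in group A matches both the AGE0 and the AGE1 row, so the join returns two rows for it. With the half-open bound assumed here it matches only AGE1, but a value of 100 then matches no interval and gets NA. Note also that the intermediate merge creates one row per df1 row and interval of its group, so it can become large on big data.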

Regarding "r - Join two data frames based on lower and upper limits of a target value in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51514981/
