r - Sum a variable if the combination of values in two other columns is unique

This question already has answers here:

Aggregate a data frame based on unordered pairs of columns

(2 answers)

Sorting rows alphabetically

(4 answers)

Closed 3 years ago.

I have data on senders and receivers, along with the number of emails sent. A toy example:

senders <- c("Mable","Beth", "Beth","Susan","Susan")
receivers <- c("Beth", "Mable", "Susan", "Mable","Beth")
num_email <- c(1,1,2,1,1)

df <- data.frame(senders, receivers, num_email)

senders receivers num_email
Mable      Beth          1
Beth       Mable         1
Beth       Susan         2
Susan      Mable         1
Susan      Beth          1

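I would like to obtain a data.frame with the total number of messages for each unique pair. For example, the connection Mable | Beth has the value 2, because Mable sent one message to Beth and Beth sent one message to Mable. The resulting data.frame should have only one row for each unique combination of email senders (e.g. only Mable | Beth or Beth | Mable, not both).

I have tried various approaches with reshape and data.table, but I have not had any luck. I would like to avoid creating a unique string such as BethMable and merging that way. Many thanks.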

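Best answer

We can use a base R approach and first sort the first two columns row by row. We do this with apply and MARGIN=1, transpose the output, and convert it to a 'data.frame' to create 'df1'. We then use the formula method of aggregate to get the sum of 'num_email' grouped by the first two columns of the transformed dataset.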

df1 <- data.frame(t(apply(df[1:2], 1, sort)), df[3])
aggregate(num_email~., df1, FUN=sum)

#      X1    X2 num_email
# 1  Beth Mable         2
# 2  Beth Susan         3
# 3 Mable Susan         1
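As a possible variation on the base R approach, the row-wise apply/sort step can be swapped for the vectorized pmin and pmax, which pick the alphabetically smaller and larger name of each pair. This is only a sketch under the assumption that the toy data from the question is used; the names s, r, df2 and res are introduced here for illustration and do not appear in the original answer.

```r
# Toy data from the question
senders   <- c("Mable", "Beth", "Beth", "Susan", "Susan")
receivers <- c("Beth", "Mable", "Susan", "Mable", "Beth")
num_email <- c(1, 1, 2, 1, 1)
df <- data.frame(senders, receivers, num_email)

# as.character() guards against factor columns (the default before R 4.0)
s <- as.character(df$senders)
r <- as.character(df$receivers)

# Put each pair in a canonical order: smaller name first, larger name second
df2 <- data.frame(V1 = pmin(s, r), V2 = pmax(s, r), num_email = df$num_email)

# Sum the email counts per unordered pair
res <- aggregate(num_email ~ V1 + V2, df2, FUN = sum)
res
```

Because pmin and pmax work on whole vectors, this avoids the matrix conversion and the transpose that apply requires, at the cost of only handling exactly two key columns.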

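Alternatively, with data.table, we convert the first two columns to character class, use unname so that their column names become the default 'V1', 'V2', and convert to a 'data.table'. Relying on the lexicographic ordering of character columns, we create a logical index in i (V1 > V2), assign (:=) the columns in reversed order (.(V2, V1)) for the rows that meet the condition, and get the sum of 'num_email' grouped by 'V1', 'V2'.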

library(data.table)
dt = do.call(data.table, c(lapply(unname(df[1:2]), as.character), df[3]))
dt[V1 > V2, c("V1", "V2") := .(V2, V1)]
dt[, .(num_email = sum(num_email)), by= .(V1, V2)]

#       V1    V2 num_email
# 1:  Beth Mable         2
# 2:  Beth Susan         3
# 3: Mable Susan         1
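The conditional swap can be hard to read at first, so here is a rough base R sketch of the same idea using logical indexing, for readers who want to follow the steps without data.table. It assumes the toy data from the question; the names pairs, swap and res_swap are hypothetical and only used for this illustration.

```r
# Toy data from the question
senders   <- c("Mable", "Beth", "Beth", "Susan", "Susan")
receivers <- c("Beth", "Mable", "Susan", "Mable", "Beth")
num_email <- c(1, 1, 2, 1, 1)
df <- data.frame(senders, receivers, num_email)

# Character copies of the two name columns, renamed to V1 and V2
pairs <- data.frame(V1 = as.character(df$senders),
                    V2 = as.character(df$receivers),
                    num_email = df$num_email,
                    stringsAsFactors = FALSE)

# Rows where the pair is in the "wrong" order (V1 sorts after V2)
swap <- pairs$V1 > pairs$V2

# Swap the two names on those rows only; the right-hand side is assigned
# by position, so V1 receives the old V2 values and vice versa
pairs[swap, c("V1", "V2")] <- pairs[swap, c("V2", "V1")]

# Sum the email counts per pair
res_swap <- aggregate(num_email ~ V1 + V2, pairs, FUN = sum)
res_swap
```

After the swap, every row has its two names in alphabetical order, so an ordinary grouped sum treats Mable | Beth and Beth | Mable as the same pair.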

Or, using dplyr, we convert the columns to character class with mutate_each, then put each pair in a consistent order with pmin and pmax, group by 'V1', 'V2', and get the sum of 'num_email'.

library(dplyr)
df %>%
  mutate_each(funs(as.character), senders, receivers) %>%
  mutate( V1 = pmin(senders, receivers), 
          V2 = pmax(senders, receivers) ) %>%
  group_by(V1, V2) %>%
  summarise(num_email=sum(num_email))

#      V1    V2 num_email
#   (chr) (chr)     (dbl)
# 1  Beth Mable         2
# 2  Beth Susan         3
# 3 Mable Susan         1
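mutate_each and funs have since been deprecated in dplyr, so on a recent installation the same pipeline might be written with across instead. The following is a sketch that assumes dplyr 1.0.0 or later and the toy data from the question; res_dplyr is a name introduced here, and the printed tibble header will look different from the output above.

```r
library(dplyr)

# Toy data from the question
senders   <- c("Mable", "Beth", "Beth", "Susan", "Susan")
receivers <- c("Beth", "Mable", "Susan", "Mable", "Beth")
num_email <- c(1, 1, 2, 1, 1)
df <- data.frame(senders, receivers, num_email)

res_dplyr <- df %>%
  # across() takes over the role of mutate_each(funs(...))
  mutate(across(c(senders, receivers), as.character)) %>%
  # Canonical order for each pair: smaller name in V1, larger in V2
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  # .groups = "drop" returns an ungrouped result
  summarise(num_email = sum(num_email), .groups = "drop")

res_dplyr
```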

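Note: the data.table solution was updated by @Frank.

Regarding r - Sum a variable if the combination of values in two other columns is unique, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28360148/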
