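#Question# I have 2 data frames. Data frame A has multiple columns. Column 1 of A contains email addresses, and several rows share the same address. Data frame B has a list of unique email addresses in column 1 and, in column 2, the number of times each address appears in data frame A. I essentially want to do a vlookup, so that wherever an email address matches between the two tables, the count is pulled into a new column of data frame A. Can anyone help?
Data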
Table A
Column 1    Column 2    Column 3
a@a.com     home        123
a@a.com     house       456
b@b.com     tree        221
Table B
Column 1    Column 2 (Count)
a@a.com     2
b@b.com     1
Expected result should be Table A with an additional column:
Column 1    Column 2    Column 3    Column 4
a@a.com     home        123         2
a@a.com     house       456         2
b@b.com     tree        221         1
Best answer
You don't need df2 to get the counts. You can compute them from df1 alone:
# Solution using the data.table package
library(data.table)
# Convert df1 to a data.table by reference and add the per-email row count;
# the trailing [] prints the result
setDT(df1)[, count := .N, by = Column1][]
   Column1 Column2 Column3 count
1: a@a.com    home     123     2
2: a@a.com   house     456     2
3: b@b.com    tree     221     1
# Solution using the dplyr package
library(dplyr)
# Group by email, then attach each group's size to every row in that group
df1 %>%
  group_by(Column1) %>%
  mutate(count = n())
Source: local data frame [3 x 4]
Groups: Column1
  Column1 Column2 Column3 count
1 a@a.com    home     123     2
2 a@a.com   house     456     2
3 b@b.com    tree     221     1
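For readers without either package, the same per-email count can probably be obtained in base R. The following is a minimal sketch and not part of the original answer; it rebuilds the example data with plain character columns (an assumption, the answer's data uses factors) and relies on `ave()` to apply `length()` within each email group.

```r
# Example data rebuilt from the question (character columns are an assumption)
df1 <- data.frame(
  Column1 = c("a@a.com", "a@a.com", "b@b.com"),
  Column2 = c("home", "house", "tree"),
  Column3 = c(123L, 456L, 221L),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by email, applies length() to each group,
# and returns one value per original row, in the original row order
df1$count <- ave(seq_along(df1$Column1), df1$Column1, FUN = length)
```

Because `ave()` keeps the original row order, the new column lines up with the existing rows without any sorting or joining.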
# Data
df1 <- structure(list(Column1 = structure(c(1L, 1L, 2L), .Label = c("a@a.com",
"b@b.com"), class = "factor"), Column2 = structure(1:3, .Label = c("home",
"house", "tree"), class = "factor"), Column3 = c(123L, 456L,
221L)), .Names = c("Column1", "Column2", "Column3"), class = "data.frame", row.names = c(NA,
-3L))
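If the counts really do have to come from the second table, as the question's vlookup wording suggests, a lookup in base R might look like the sketch below. It is an illustration under stated assumptions rather than the answer's method: `df2`, its `Count` column, and the new `Column4` are hypothetical names modelled on Table B and the expected result, and the data is rebuilt here with character columns so the block runs on its own.

```r
# Table A from the question (character columns are an assumption)
df1 <- data.frame(
  Column1 = c("a@a.com", "a@a.com", "b@b.com"),
  Column2 = c("home", "house", "tree"),
  Column3 = c(123L, 456L, 221L),
  stringsAsFactors = FALSE
)
# Hypothetical lookup table standing in for Table B
df2 <- data.frame(
  Column1 = c("a@a.com", "b@b.com"),
  Count   = c(2L, 1L),
  stringsAsFactors = FALSE
)

# match() gives, for each email in df1, its row position in df2
# (NA when the email is absent); indexing df2$Count copies the value across
df1$Column4 <- df2$Count[match(df1$Column1, df2$Column1)]

# Roughly equivalent left join; note that merge() may reorder the rows
joined <- merge(df1[, c("Column1", "Column2", "Column3")], df2,
                by = "Column1", all.x = TRUE)
```

The `match()` form preserves the row order of `df1` and leaves `NA` for emails missing from `df2`, which is close to how a spreadsheet vlookup behaves; `merge()` is more convenient when several columns need to be brought over at once.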
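Regarding "R: if the value in column 1 of table A matches the value in column 1 of table B, copy the value from column 2 of table B into table A", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30329536/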