r - String matching across data.frames of different sizes

Tags: r string dataframe

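I have two data.frames of different sizes, and I'm looking for the most efficient way to match strings from one data.frame against the other and pull out some related information.

Here is an example:

Two initial data frames, a and b, and the desired result: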

a = data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
               age = c(30, 24, 52, 44, 73, 44, 33, 12),
               visits = c(5, 1, 3, 2, 8, 5, 19, 3))

b = data.frame(string = c("the red ball went over the fence",
                          "sorry to see that your tent fell down",
                          "the ball fell into the red salad",
                          "serious people eat peanuts on Sundays"))

desired_result = data.frame(string = b$string,
                            num_matches = c(2, 1, 3, 0),
                            avg_age = c(37, 73, 32.66667, NA),
                            avg_visits = c(3.5, 8, 2.66667, NA))

Here are the data.frames in a more readable format:

> a
   term age visits
1   red  30      5
2 salad  24      1
3  rope  52      3
4  ball  44      2
5  tent  73      8
6 plane  44      5
7  gift  33     19
8  meat  12      3

> b
                                 string
1      the red ball went over the fence
2 sorry to see that your tent fell down
3      the ball fell into the red salad
4 serious people eat peanuts on Sundays

> desired_result
                                 string num_matches  avg_age avg_visits
1      the red ball went over the fence           2 37.00000    3.50000
2 sorry to see that your tent fell down           1 73.00000    8.00000
3      the ball fell into the red salad           3 32.66667    2.66667
4 serious people eat peanuts on Sundays           0       NA         NA
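  • num_matches is the number of "term" values found in "string"
  • avg_age is the mean age of the "term" values found in "string"
  • avg_visits is the mean number of visits of the "term" values found in "string"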
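To make these definitions concrete, the third row of the desired result can be checked by hand. This is only a minimal sketch: it assumes whole-word matching on single spaces and counts each term at most once per string, and the variable names `words` and `hit` are illustrative.

```r
# Terms and their attributes, copied from data.frame `a` above
a <- data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
                age = c(30, 24, 52, 44, 73, 44, 33, 12),
                visits = c(5, 1, 3, 2, 8, 5, 19, 3),
                stringsAsFactors = FALSE)

# Third string from `b`, split into words on single spaces (assumption)
words <- strsplit("the ball fell into the red salad", " ", fixed = TRUE)[[1]]

# TRUE for every row of `a` whose term appears as a whole word in the string
hit <- a$term %in% words

num_matches <- sum(hit)             # red, salad and ball are found
avg_age     <- mean(a$age[hit])     # (30 + 24 + 44) / 3
avg_visits  <- mean(a$visits[hit])  # (5 + 1 + 2) / 3
```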

Any ideas on how to do this efficiently?

Thanks.

Best answer

You can try this with base R (no packages needed):

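# For each row of b: split the string into words, look each word up in a$term,
# then return the match count and the mean age/visits of the matched terms.
res <- t(apply(b, 1, function(x) {
    l <- strsplit(x, " ")
    r <- unlist(lapply(unlist(l), function(y) which(a$term == y)))
    c(length(r), mean(a$age[r]), mean(a$visits[r]))
}))

# Attach the three summary columns to the original strings
res <- cbind(b, res)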
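#                                  string 1        2        3
# 1      the red ball went over the fence 2 37.00000 3.500000
# 2 sorry to see that your tent fell down 1 73.00000 8.000000
# 3      the ball fell into the red salad 3 32.66667 2.666667
# 4 serious people eat peanuts on Sundays 0      NaN      NaN

Rows with no matching term show NaN rather than the NA in desired_result, because mean() of an empty vector returns NaN.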
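If named columns and NA for unmatched rows are wanted, the same idea can be written as a small helper. This is a hedged sketch rather than part of the original answer: `summarise_string` is a hypothetical name, the data are created with `stringsAsFactors = FALSE` so that `strsplit()` receives character input, and words are still matched whole after splitting on single spaces.

```r
a <- data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
                age = c(30, 24, 52, 44, 73, 44, 33, 12),
                visits = c(5, 1, 3, 2, 8, 5, 19, 3),
                stringsAsFactors = FALSE)

b <- data.frame(string = c("the red ball went over the fence",
                           "sorry to see that your tent fell down",
                           "the ball fell into the red salad",
                           "serious people eat peanuts on Sundays"),
                stringsAsFactors = FALSE)

# Hypothetical helper (not from the original answer): summarise one string
summarise_string <- function(s, terms) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  idx <- match(words, terms$term)  # row of `terms` for each word, NA if absent
  idx <- idx[!is.na(idx)]          # keep only the words that matched a term
  n <- length(idx)
  c(num_matches = n,
    avg_age     = if (n > 0) mean(terms$age[idx]) else NA_real_,
    avg_visits  = if (n > 0) mean(terms$visits[idx]) else NA_real_)
}

# vapply() enforces a length-3 numeric result per string; t() puts strings in rows
stats <- t(vapply(b$string, summarise_string,
                  c(num_matches = 0, avg_age = 0, avg_visits = 0),
                  terms = a, USE.NAMES = FALSE))

result <- cbind(b, stats)
```

Like the answer above, this counts a term once for every time it occurs in a string; wrapping `idx` in `unique()` would count each term at most once.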

This Q&A comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39677987/
