r - Count specific characters in a column associated with the dual categories of another column, iterating over frequency bins

I have a huge data frame, df1. An oversimplified version of it has 3 columns: "Words", "Frequency" and "Letters":

Words           Frequency   Letters
flower/tree     0.15        a(0.1)
tree            0.67        a(0.4)
planet          0.85        b(0.4)
tree/planet     0.42        c(0.5)
tree            0.89        a(0.6)
flower          0.21        b(0.4)
flower/planet   0.53        b
planet          0.07        a
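To make the example reproducible, the sample rows above can be entered as a plain data.frame. This is a minimal sketch that only copies the eight rows shown; the real df1 is described as much larger.

```r
# The eight sample rows from the question, stored as character (not factor) columns
df1 <- data.frame(
  Words     = c("flower/tree", "tree", "planet", "tree/planet",
                "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters   = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
                "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE
)
str(df1)
```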

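Using R (dplyr, apply-family functions, etc.), I want to count how many times each letter (a, b, c) in the "Letters" column is associated with each word (flower, tree, planet) in the "Words" column, ignoring any value in parentheses. The counting should be done separately for each bin of the "Frequency" column. There are 4 bins: [0, 0.25], [0.25, 0.5], [0.5, 0.75], [0.75, 1].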
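One way to express these four bins in R is base R's cut() with breaks = seq(0, 1, 0.25). By default cut() builds right-closed intervals (a, b], so each shared boundary (0.25, 0.5, 0.75) falls into exactly one bin. The include.lowest = TRUE argument is an assumption added here so that a frequency of exactly 0 lands in the first bin instead of becoming NA.

```r
# Frequency values from the question's sample data
freq <- c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07)

# Four equal-width bins; right-closed by default, and include.lowest = TRUE
# closes the first bin on the left as well, giving [0, 0.25]
bins <- cut(freq, breaks = seq(0, 1, 0.25), include.lowest = TRUE)

levels(bins)  # "[0,0.25]" "(0.25,0.5]" "(0.5,0.75]" "(0.75,1]"
table(bins)   # 3, 1, 2 and 2 values per bin
```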

I expect the output data frame df2 to look like this:

Bin       Word    Letters    count_letters
0-0.25    flower  a          1
0-0.25    flower  b          1
0-0.25    tree    a          1
0-0.25    planet  a          1
0.25-0.5  tree    c          1
0.25-0.5  planet  c          1
0.5-0.75  flower  b          1
0.5-0.75  tree    a          1
0.5-0.75  planet  b          1
0.75-1    tree    a          1
0.75-1    planet  b          1

Best answer

You can use cut to bin Frequency, substr to clean up Letters, and tidyr::separate_rows to unnest Words. Then aggregate with dplyr::count and you're done:

library(tidyverse)

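# df1 is the question's data frame
df1 %>% separate_rows(Words) %>%                  # split "flower/tree" into one row per word
    count(Words, 
          Letters = substr(Letters, 1, 1),        # keep the first character; use a regex if labels can be longer
          Frequency = cut(Frequency, breaks = seq(0, 1, .25)))  # right-closed bins (0,0.25], (0.25,0.5], ...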

## Source: local data frame [11 x 4]
## Groups: Frequency, Words [?]
## 
##     Frequency  Words Letters     n
##        <fctr>  <chr>   <chr> <int>
## 1    (0,0.25] flower       a     1
## 2    (0,0.25] flower       b     1
## 3    (0,0.25] planet       a     1
## 4    (0,0.25]   tree       a     1
## 5  (0.25,0.5] planet       c     1
## 6  (0.25,0.5]   tree       c     1
## 7  (0.5,0.75] flower       b     1
## 8  (0.5,0.75] planet       b     1
## 9  (0.5,0.75]   tree       a     1
## 10   (0.75,1] planet       b     1
## 11   (0.75,1]   tree       a     1
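Because the question also mentions apply-family functions, here is a base-R sketch of the same idea with no tidyverse dependency. It swaps separate_rows() for strsplit() plus rep(), uses a regular expression via sub() instead of substr() so that labels longer than one letter would still work, and counts with aggregate(). The column names Bin, Word and count_letters follow the question's desired df2. include.lowest = TRUE is an added assumption, so the first bin is labelled [0,0.25] rather than (0,0.25].

```r
# Sample data from the question
df1 <- data.frame(
  Words     = c("flower/tree", "tree", "planet", "tree/planet",
                "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters   = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
                "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE
)

# Split "flower/tree" style entries into one element per word
words <- strsplit(df1$Words, "/", fixed = TRUE)
n_words <- lengths(words)

# Long format: repeat each row's Frequency and Letters once per word
long <- data.frame(
  Word      = unlist(words),
  Frequency = rep(df1$Frequency, n_words),
  # drop any "(0.x)" suffix, keeping only the letter label
  Letters   = rep(sub("\\(.*$", "", df1$Letters), n_words),
  stringsAsFactors = FALSE
)
long$Bin <- cut(long$Frequency, breaks = seq(0, 1, 0.25), include.lowest = TRUE)

# Count each observed Bin/Word/Letters combination
df2 <- aggregate(count_letters ~ Bin + Word + Letters,
                 data = transform(long, count_letters = 1L), FUN = sum)
df2 <- df2[order(df2$Bin, df2$Word, df2$Letters), ]
rownames(df2) <- NULL
df2
```

On the sample data, df2 has the 11 bin/word/letter combinations from the expected output, each with a count of 1.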

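A similar question, "r - Count specific characters in a column associated with the dual categories of another column, iterating over frequency bins", can be found on Stack Overflow: https://stackoverflow.com/questions/42237800/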
