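r - How to use grep inside mutate in dplyr

I need some help understanding what is happening in my dplyr pipeline, and I would welcome alternative solutions to this problem.

Problem

I have a list of institutes (the formal term for the affiliations of authors of research journal articles) and I want to extract the main institute name. For a university that would be, for example, "University of XX", and that is the case I stick to here for simplicity.

Logic of the attempted solution

Split the institute name on commas

grep for the term "univ", or for a list of other university-related terms

Extract the index of the hit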
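The three steps above can be sketched on a single string with base R. This is only an illustration of the intended logic, not the pipeline from the question; the variable names `parts`, `hits`, and `inst_guess` are made up for this example.

```r
# One institute string taken from the example data
institute <- "school of computer science and engineering, university of new south wales, sydney, australia"

# Step 1: split on commas (strsplit returns a list with one element per input string)
parts <- strsplit(institute, ",")[[1]]

# Step 2: find the positions of the pieces that contain "univ"
hits <- grep("univ", parts)

# Step 3: keep the first matching piece (note the leading space left by the split)
inst_guess <- parts[hits][1]
inst_guess
```

For one string this gives the expected university name, which is why the same expression looks correct until it is applied to a whole column.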

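Edge cases / assumptions

The term I am searching for exists in only one of the splits

All institutions here are universities (to keep the question simple for Stack Overflow)

Code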

df %>%
  mutate(instGuess = unlist(strsplit(institute, ","))[grep("univ", unlist(strsplit(institute, ",")))][1]) %>%
  head()

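What I assumed would happen, but does not, is the logic I wrote out above. What I see instead is that inside mutate the first hit for institute is used for every row of df, so the exact same " university of new~" value is filled in everywhere. I have a rough idea of what the error is, but I do not know why it happens or how to fix it while staying within dplyr. I can do this with an apply function, so I am curious what other answers there are.

What it looks like: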

# A tibble: 6 x 2
  institute                                                                          instGuess              
  <chr>                                                                              <chr>                  
1 school of computer science and engineering, university of new south wales, sydney~ " university of new so~
2 department computer science, friedrich-alexander-university, erlangen-nuremberg, ~ " university of new so~
3 department of ece, pesit, bangalore, india                                         " university of new so~
4 school of information technology and electrical engineering, university of queens~ " university of new so~
5 school of information technology and electrical engineering, university of queens~ " university of new so~
6 dept. of info. syst. and comp. sci., national university of singapore, 10 kent ri~ " university of new so~
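The repeated value in this output can be reproduced without dplyr by running the same expression on a whole character vector, which is what mutate passes in when the data is not grouped. The sketch below uses three of the example rows; `all_parts` and `first_hit` are names made up for this illustration.

```r
institute <- c(
  "school of computer science and engineering, university of new south wales, sydney, australia",
  "department computer science, friedrich-alexander-university, erlangen-nuremberg, germany",
  "department of ece, pesit, bangalore, india"
)

# strsplit() returns one list element per row; unlist() pools them into a single vector
all_parts <- unlist(strsplit(institute, ","))
length(all_parts)  # pieces from all three rows combined

# grep() searches the pooled pieces, and [1] keeps only the first hit overall
first_hit <- all_parts[grep("univ", all_parts)][1]
first_hit
```

The result is a single string, and mutate recycles a length-one value to every row, so every row receives the first university found anywhere in the column.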

Data used for the example

df <- structure(list(institute = c("school of computer science and engineering, university of new south wales, sydney, australia", 
"department computer science, friedrich-alexander-university, erlangen-nuremberg, germany", 
"department of ece, pesit, bangalore, india", "school of information technology and electrical engineering, university of queenslandqld, australia", 
"school of information technology and electrical engineering, university of queenslandold, australia", 
"dept. of info. syst. and comp. sci., national university of singapore, 10 kent ridge crescent, singapore 119260, singapore"
), instGuess = c(" university of new south wales", " university of new south wales", 
" university of new south wales", " university of new south wales", 
" university of new south wales", " university of new south wales"
)), .Names = c("institute", "instGuess"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

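Best answer

You need to include a group_by to make your syntax work. Without it, mutate evaluates the expression once over the whole institute column rather than once per row: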

df %>%
  group_by(institute) %>%
  mutate(instGuess = unlist(strsplit(institute, ","))[grep("univ", unlist(strsplit(institute, ",")))][1])

This produces:

# A tibble: 6 x 2
# Groups:   institute [6]
  institute                                                                  instGuess
  <chr>                                                                      <chr>
1 school of computer science and engineering, university of new south wales… " university of new so…
2 department computer science, friedrich-alexander-university, erlangen-nur… " friedrich-alexander-…
3 department of ece, pesit, bangalore, india                                 NA                     
4 school of information technology and electrical engineering, university o… " university of queens…
5 school of information technology and electrical engineering, university o… " university of queens…
6 dept. of info. syst. and comp. sci., national university of singapore, 10… " national university …
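Since the question also asks for alternatives, here is a minimal sketch that stays inside mutate but avoids grouping by handling each row's split separately. The helper `first_match` and its `pattern` argument are hypothetical names introduced for this example and are not part of the original answer; the data frame is a shortened copy of the example data.

```r
library(dplyr)

df <- data.frame(
  institute = c(
    "school of computer science and engineering, university of new south wales, sydney, australia",
    "department of ece, pesit, bangalore, india",
    "dept. of info. syst. and comp. sci., national university of singapore, 10 kent ridge crescent, singapore 119260, singapore"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each string, return the first comma-separated
# piece that matches `pattern`, or NA when nothing matches
first_match <- function(x, pattern = "univ") {
  vapply(
    strsplit(x, ","),
    function(parts) trimws(grep(pattern, parts, value = TRUE)[1]),
    character(1)
  )
}

result <- df %>%
  mutate(instGuess = first_match(institute))
result$instGuess
```

Because vapply returns one value per row, no recycling takes place, and trimws also drops the leading space left by the comma split. dplyr's rowwise() is another way to get per-row evaluation if keeping the original expression unchanged matters more.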

Regarding "r - How to use grep inside mutate in dplyr", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50110001/
