I need some help understanding what is happening in my dplyr pipe, and I'd like to see the different ways this problem can be solved.
Problem
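I have a list of institutes (the formal term for the authors' affiliations in research journal articles), and I want to extract the main institute name. If it is a university, that means the "University of XX" part, which is the case I'm sticking to here for simplicity.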
Attempted solution logic
Edge cases / assumptions
Code
df %>%
mutate(instGuess = unlist(strsplit(institute, ","))[grep("univ", unlist(strsplit(institute, ",")))][1]) %>%
head()
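What I assumed would happen is the logic I described above, but that is not what happens. What I see instead is that inside mutate() the first match found across the `institute` values of every row in df is used, so the exact same " university of new south wales" is filled into every row. I have a rough idea of what is wrong, but I don't know why it happens or how to fix it while staying within dplyr. I can get this to work with an apply function, but I'm curious what answers people have. Here is what it looks like: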
# A tibble: 6 x 2
institute instGuess
<chr> <chr>
1 school of computer science and engineering, university of new south wales, sydney~ " university of new so~
2 department computer science, friedrich-alexander-university, erlangen-nuremberg, ~ " university of new so~
3 department of ece, pesit, bangalore, india " university of new so~
4 school of information technology and electrical engineering, university of queens~ " university of new so~
5 school of information technology and electrical engineering, university of queens~ " university of new so~
6 dept. of info. syst. and comp. sci., national university of singapore, 10 kent ri~ " university of new so~
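To see where the repeated value comes from, here is a minimal sketch that evaluates the same expression outside mutate(), on the first three institute strings from the example data. Inside an ungrouped mutate(), `institute` is the whole column rather than a single row:

```r
# First three institute strings from the question's example data
institute <- c(
  "school of computer science and engineering, university of new south wales, sydney, australia",
  "department computer science, friedrich-alexander-university, erlangen-nuremberg, germany",
  "department of ece, pesit, bangalore, india"
)

# strsplit() returns a list with one element per row
pieces <- strsplit(institute, ",")
length(pieces)
#> [1] 3

# unlist() flattens every row's pieces into one long vector, so the
# grep() and [1] that follow operate on all rows at once
flat <- unlist(pieces)
guess <- flat[grep("univ", flat)][1]
guess
#> [1] " university of new south wales"

# A single value, which mutate() then recycles to every row
length(guess)
#> [1] 1
```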
Data used for the example
df <- structure(list(institute = c("school of computer science and engineering, university of new south wales, sydney, australia",
"department computer science, friedrich-alexander-university, erlangen-nuremberg, germany",
"department of ece, pesit, bangalore, india", "school of information technology and electrical engineering, university of queenslandqld, australia",
"school of information technology and electrical engineering, university of queenslandold, australia",
"dept. of info. syst. and comp. sci., national university of singapore, 10 kent ridge crescent, singapore 119260, singapore"
), instGuess = c(" university of new south wales", " university of new south wales",
" university of new south wales", " university of new south wales",
" university of new south wales", " university of new south wales"
)), .Names = c("institute", "instGuess"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))
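The apply-style approach mentioned in the question can be wrapped in a small helper that returns one value per row. This is a minimal base-R sketch, assuming the goal is the first comma-separated piece containing "univ" (or NA when none matches); `first_univ()` is a hypothetical name, not part of any package:

```r
# Hypothetical helper: for each string, split on commas and return the
# first piece that contains "univ", or NA when no piece matches
first_univ <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  # grep(value = TRUE)[1] yields NA_character_ when nothing matches
  vapply(pieces, function(p) grep("univ", p, value = TRUE)[1], character(1))
}

# The six institute strings from the example data
institute <- c(
  "school of computer science and engineering, university of new south wales, sydney, australia",
  "department computer science, friedrich-alexander-university, erlangen-nuremberg, germany",
  "department of ece, pesit, bangalore, india",
  "school of information technology and electrical engineering, university of queenslandqld, australia",
  "school of information technology and electrical engineering, university of queenslandold, australia",
  "dept. of info. syst. and comp. sci., national university of singapore, 10 kent ridge crescent, singapore 119260, singapore"
)

guesses <- first_univ(institute)
guesses
```

Because `first_univ()` returns exactly one value per input string, it can go straight into the pipe without group_by(), e.g. `df %>% mutate(instGuess = first_univ(institute))`.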
Best answer
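You need to add a group_by() for your syntax to work. Grouping by institute makes mutate() evaluate the expression separately for each group, so unlist() only flattens the pieces of that group's own string: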
df %>%
group_by(institute) %>%
mutate(instGuess = unlist(strsplit(institute, ","))[grep("univ", unlist(strsplit(institute, ",")))][1])
This yields:
# A tibble: 6 x 2
# Groups: institute [6]
institute instGuess
<chr> <chr>
1 school of computer science and engineering, university of new south wales… " university of new so…
2 department computer science, friedrich-alexander-university, erlangen-nur… " friedrich-alexander-…
3 department of ece, pesit, bangalore, india NA
4 school of information technology and electrical engineering, university o… " university of queens…
5 school of information technology and electrical engineering, university o… " university of queens…
6 dept. of info. syst. and comp. sci., national university of singapore, 10… " national university …
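If you would rather not rely on grouping (note that the result above is still grouped by institute), a single vectorized regular expression is another option. This is a swapped-in technique, not part of the answer above, and it makes the same assumption as the strsplit() version: that an institute name never contains a comma. A minimal sketch on three strings from the example data:

```r
institute <- c(
  "school of computer science and engineering, university of new south wales, sydney, australia",
  "department of ece, pesit, bangalore, india",
  "dept. of info. syst. and comp. sci., national university of singapore, 10 kent ridge crescent, singapore 119260, singapore"
)

# Lazily skip whole comma-separated pieces, then capture the first piece
# that contains "univ"; the trailing .* discards everything after it
pattern <- "^(?:[^,]*,)*?([^,]*univ[^,]*).*$"

# sub() returns non-matching strings unchanged, so those rows are set to NA
guess <- ifelse(grepl("univ", institute, fixed = TRUE),
                sub(pattern, "\\1", institute, perl = TRUE),
                NA_character_)
guess
```

As with the strsplit() approach, the captured piece keeps the leading space that followed the comma; wrap the call in trimws() if you want it removed.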
Original question on Stack Overflow ("r - How to grep in dplyr using mutate"): https://stackoverflow.com/questions/50110001/