R - Append a flag to another column if a column contains a string from a vector

Tags: r vector dataset tibble grepl

My data

I have a vector of words like the one below. This is a simplification; my real vector has more than 600 words:

myvec <- c("cat", "dog", "bird")

I have a data frame with the following structure:

df <- structure(list(id = c(1, 2, 3), onetext = c("cat furry pink british", 
"dog cat fight", "bird cat issues"), cop= c("Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014", 
"Dogs have soft fur and tails so do cats Do cats like to chase their tails", 
"A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
), text3 = c("On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.", 
"there are many fights going on and this is just an example text", 
"Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

As shown in the following image:

[Image: sample dataset]

My question

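For every keyword in the vector myvec, I need to go through the dataset and check the columns onetext, cop, and text3. If I find the keyword in any of these three columns, I need to append it to a new column. The result would look like the following image: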

[Image: expected result]

My original dataset is very large (the last column is the longest), so running several nested loops (which is what I tried) is not ideal.

Edit: Note that it is enough for the word to appear once in the row for it to be listed. All matching keywords should be listed.

How can I do this? I am using the tidyverse, so my dataset is actually a tibble.

Best answer

Update: if a list is preferred, use str_extract_all (with pattern defined as in the original answer further down):

df %>%  
  transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) 

This gives:

  new_colonetext new_colcop new_coltext3
  <list>         <list>     <list>      
1 <chr [1]>      <NULL>     <chr [2]>   
2 <chr [2]>      <chr [2]>  <NULL>      
3 <chr [2]>      <chr [4]>  <chr [5]>  
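Note that row 1 of new_colcop is NULL even though that text contains "Cat": the pattern is case-sensitive, and it also matches substrings such as the "cat" inside "cats". Whether either behavior is wanted is not stated in the question, so the following is only a sketch under that assumption. It wraps the pattern in stringr's regex() with ignore_case = TRUE and adds \\b word boundaries; the names pattern_ci and x are made up for illustration.

```r
library(stringr)

myvec <- c("cat", "dog", "bird")

# Whole-word, case-insensitive pattern: \\b(cat|dog|bird)\\b
pattern_ci <- regex(
  paste0("\\b(", paste(myvec, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

# Hypothetical sample strings, not taken from the question's data
x <- c("Little Grey Cat is a kitten",
       "there are many fights",
       "concatenate the birds")

# TRUE only where a keyword occurs as a whole word, in any letter case
detected <- str_detect(x, pattern_ci)

# Extract the hits, normalise them to lower case, and drop duplicates
keywords <- lapply(str_extract_all(x, pattern_ci),
                   function(h) unique(str_to_lower(h)))
```

With word boundaries, plurals such as "cats" or "birds" no longer count as matches, which differs from the answer's original pattern. Drop the \\b parts if substring matches are acceptable.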

Here is how to achieve the result:

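  1. Create a regex pattern from the vector
  2. Use mutate with across to check the relevant columns
  3. If one of the keywords is detected, extract it into a new column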
myvec <- c("cat", "dog", "bird")

pattern <- paste(myvec, collapse="|")

library(dplyr)
library(tidyr)
library(stringr)  # provides str_detect() and str_extract_all()
df %>% 
  mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>% 
  unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
    id onetext                cop                                                                        text3                                                                              topic                                     
  <dbl> <chr>                  <chr>                                                                      <chr>                                                                              <chr>                                     
1     1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"            
2     2 dog cat fight          Dogs have soft fur and tails so do cats Do cats like to chase their tails  there are many fights going on and this is just an example text                    "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3     3 bird cat issues        A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~                                                                                    
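In the output above, topic holds deparsed list contents such as c("cat", "cat") and the literal text NULL, because unite() is applied to list-columns. If a plain comma-separated string of distinct keywords per row is wanted, one possible variation (a sketch, not the answer's own method) is to join the three text columns first and extract from the combined string. The shortened texts and the temporary column all_text below are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Small stand-in for the question's tibble (texts shortened for illustration)
df <- tibble(
  id = c(1, 2, 3),
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c("Little Grey Cat is a kitten",
          "Dogs have soft fur so do cats",
          "A cat and bird can coexist"),
  text3 = c("the little grey cat",
            "there are many fights",
            "Some cats ignore a pet bird")
)

myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse = "|")

result <- df %>%
  # Join the three text columns into one temporary string per row
  unite("all_text", onetext, cop, text3, sep = " ", remove = FALSE) %>%
  mutate(
    # Extract every hit, keep each keyword once, collapse to a single string
    topic = vapply(
      str_extract_all(all_text, pattern),
      function(hits) paste(unique(hits), collapse = ","),
      character(1)
    )
  ) %>%
  select(-all_text)
```

Keywords appear in order of first occurrence, and a row without any match gets an empty string. Since str_extract_all is vectorised over rows, this avoids explicit nested loops, though the performance on a very large tibble has not been measured here.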

For the original discussion of "R - Append a flag to another column if a column contains a string from a vector", see the question on Stack Overflow: https://stackoverflow.com/questions/70386370/
