R - How to replace a string from multiple matches (in a data frame)

Tags: r string replace gsub

I need to replace subsets of a string using a set of matches stored in a data frame.

For example -

input_string = "Whats your name and Where're you from"

I need to replace parts of this string using the data frame. Say the data frame is
matching <- data.frame(from_word=c("Whats your name", "name", "fro"),
            to_word=c("what is your name","names","froth"))

Expected output is what is your name and Where're you from

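Note -
  • It matches the largest string. In this example, name is not replaced with names, because name is part of a larger match
  • It has to match a whole word and not a partial string. The fro in "from" should not be replaced with "froth"

  • I referred to the link below but somehow could not get it to work as intended/described

    Match and replace multiple strings in a vector of text without looping in R

    This is my first post here. If I haven't given enough details, kindly let me know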

    Best answer

    EDIT

    Based on the input from Sri's comment, I would suggest using:

    library(gsubfn)
    # words to be replaced
    a <-c("Whats your","Whats your name", "name", "fro")
    # their replacements
    b <- c("What is yours","what is your name","names","froth")
    # named list as an input for gsubfn
    replacements <- setNames(as.list(b), a)
    # the test string
    input_string = "fro Whats your name and Where're name you from to and fro I Whats your"
    # match entire words
    gsubfn(paste(paste0("\\w*", names(replacements), "\\w*"), collapse = "|"), replacements, input_string)
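
To see why wrapping each pattern in `\\w*` keeps "from" intact, here is a small base R sketch of the lookup step. It assumes, based on my reading of the gsubfn documentation, that a list replacement substitutes only those matches whose full text is a name in the list and leaves every other match unchanged. The names `x`, `lookup`, `hits` and `new_hits` are illustrative and not part of the answer.

```r
# Illustrative only: mimic a "replace only if the whole match is a known key" lookup
x <- "you from to and fro"
lookup <- c(fro = "froth")   # hypothetical one-entry replacement table

# "\\w*fro\\w*" swallows the whole word around "fro", so it matches "from" and "fro"
m <- gregexpr("\\w*fro\\w*", x, perl = TRUE)
hits <- regmatches(x, m)[[1]]

# Keep a match unchanged unless its full text is a key in the table
new_hits <- ifelse(hits %in% names(lookup), lookup[hits], hits)
regmatches(x, m) <- list(unname(new_hits))
x
# [1] "you from to and froth"
```

The whole word "from" is matched, but since "from" is not a key it is written back as is; only the standalone "fro" is replaced.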
    

    Original

    I wouldn't say this is easier to read than your simple loop, but it might take better care of the overlapping replacements:
    # define the sample dataset
    input_string = "Whats your name and Where're you from"
    matching <- data.frame(from_word=c("Whats your name", "name", "fro", "Where're", "Whats"),
                           to_word=c("what is your name","names","froth", "where are", "Whatsup"))
    
    # load the libraries used (stringr provides str_split)
    library(gsubfn)
    library(stringr)
    
    # make sure data is of class character
    matching$from_word <- as.character(matching$from_word)
    matching$to_word <- as.character(matching$to_word)
    
    # extract the words in the sentence
    test <- unlist(str_split(input_string, " "))
    # find where individual words from the sentence match the list of replaceable words
    test2 <- sapply(paste0("\\b", test, "\\b"), grepl, matching$from_word)
    # change rownames to see what is the format of output from the above sapply
    rownames(test2) <- matching$from_word
    # reorder the data so that largest replacement blocks are at the top
    test3 <- test2[order(rowSums(test2), decreasing = TRUE),]
    # where the word is already being replaced by larger chunk, do not replace again
    test3[apply(test3, 2, cumsum) > 1] <- FALSE
    
    # define the actual pairs of replacement
    replacements <- setNames(as.list(as.character(matching[,2])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1]),
                             as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1])
    
    # perform the replacement
    gsubfn(paste(as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1], collapse = "|"),
           replacements,input_string)
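
For comparison, the two rules from the question (the longest match wins, and only whole words count) can also be sketched in base R without gsubfn. This is a minimal sketch and not the answer's method: `replace_whole_longest` is a hypothetical helper name, and it assumes the search strings contain no regex metacharacters and begin and end with word characters, so that `\\b` behaves as expected.

```r
# Hypothetical helper: single-pass, whole-word, longest-first replacement
replace_whole_longest <- function(x, from, to) {
  from <- as.character(from)
  to <- as.character(to)

  # Put longer search strings first: with perl = TRUE the alternation is tried
  # left to right, so "Whats your name" is attempted before "name"
  ord <- order(nchar(from), decreasing = TRUE)
  from <- from[ord]
  to <- to[ord]

  # \\b on both sides rejects partial words such as the "fro" in "from"
  pattern <- paste0("\\b(?:", paste(from, collapse = "|"), ")\\b")

  # Locate all matches once, then swap each matched text for its replacement
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(hits) to[match(hits, from)])
  x
}

matching <- data.frame(from_word = c("Whats your name", "name", "fro"),
                       to_word = c("what is your name", "names", "froth"),
                       stringsAsFactors = FALSE)

replace_whole_longest("Whats your name and Where're you from",
                      matching$from_word, matching$to_word)
# [1] "what is your name and Where're you from"
```

Because the text is scanned only once, replaced text is never scanned again, so the "name" inside the inserted "what is your name" is not turned into "names".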
    

    Regarding R - How to replace a string from multiple matches (in a data frame), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42998632/
