I am working in R and using the replace_emoticon function from the textclean package to replace emoticons with their corresponding words:
library(textclean)
test_text <- "i had a great experience xp :P"
replace_emoticon(test_text)
[1] "i had a great e tongue sticking out erience tongue sticking out tongue sticking out "
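As shown above, the function works, but it also replaces character sequences that look like emoticons but sit inside a word (for example, the "xp" in "experience"). I looked for a solution to this problem and found the following function override, which claims to fix it: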
replace_emoticon <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){
    trimws(gsub(
        "\\s+",
        " ",
        mgsub_regex(x, paste0('\\b\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
    ))
}
replace_emoticon(test_text)
[1] "i had a great experience tongue sticking out :P"
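However, while it does fix the problem with the word "experience", it creates a brand-new one: it stops replacing ":P", which is an emoticon and should normally be replaced by the function.
Also, "xp" is the sequence I know gets replaced incorrectly, but I am not sure whether there are other sequences besides "xp" that are also wrongly replaced when they are part of a word.
Is there a way to tell the replace_emoticon function to replace "emoticons" only when they are not part of a word?
Thanks!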
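Best answer
Wiktor is right: the word boundary check is what causes the problem. I adjusted it slightly in the function below. One issue remains, namely when an emoticon is immediately followed by a word with no space between the emoticon and the word. The question is whether that last case matters. See the examples below.
Note: I added this information to the textclean issue tracker.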
replace_emoticon2 <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){
    trimws(gsub(
        "\\s+",
        " ",
        mgsub_regex(x, paste0('\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
    ))
}
# works
replace_emoticon2("i had a great experience xp :P")
[1] "i had a great experience tongue sticking out tongue sticking out"
replace_emoticon2("i had a great experiencexp:P:P")
[1] "i had a great experience tongue sticking out tongue sticking out tongue sticking out"
# does not work:
replace_emoticon2("i had a great experience xp :Pnewword")
[1] "i had a great experience tongue sticking out :Pnewword"
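The results above follow from how `\b` works: it only matches at a transition between a word character and a non-word character. A minimal base R sketch of that behaviour is shown here; `has_match` is a hypothetical helper introduced only for illustration, and `perl = TRUE` is assumed so that the `\Q...\E` quoting used in the patterns above is available.

```r
# Hypothetical helper: TRUE if the PCRE pattern matches anywhere in the text.
has_match <- function(pattern, text) grepl(pattern, text, perl = TRUE)

# Leading \b fails: " " and ":" are both non-word characters,
# so there is no word boundary in front of ":P".
has_match("\\b\\Q:P\\E\\b", "xp :P")     # FALSE

# Without the leading \b, the boundary after "P" (end of string) is enough.
has_match("\\Q:P\\E\\b", "xp :P")        # TRUE

# "P" followed by "n": two word characters, so the trailing \b fails.
has_match("\\Q:P\\E\\b", ":Pnewword")    # FALSE

# "xp" inside "experience" is followed by "e", so it is no longer matched.
has_match("\\Qxp\\E\\b", "experience")   # FALSE
```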
Edit: added a new function.
It is based on stringi and on a regex escape function by Wiktor (taken from a linked post):
replace_emoticon_new <- function (x, emoticon_dt = lexicon::hash_emoticons, ...)
{
    regex_escape <- function(string) {
        gsub("([][{}()+*^${|\\\\?.])", "\\\\\\1", string)
    }
    stringi::stri_replace_all(x,
                              regex = paste0("\\s+", regex_escape(emoticon_dt[["x"]])),
                              replacement = paste0(" ", emoticon_dt[['y']]),
                              vectorize_all = FALSE)
}
test_text <- "Hello :) Great experience! xp :) :P"
replace_emoticon_new(test_text)
[1] "Hello smiley Great experience! tongue sticking out smiley tongue sticking out"
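Because the pattern above starts with `\\s+`, an emoticon at the very start of the string has no preceding whitespace to match, and nothing checks the character that follows the emoticon. One possible alternative, offered only as a sketch and not as part of textclean, is to add a lookaround guard on whichever side of the emoticon is a word character. The names `emoticons`, `build_pattern` and `replace_emoticon_sketch` are hypothetical, and the three-row table is a stand-in for lexicon::hash_emoticons so that the example runs in base R.

```r
# Hypothetical mini lookup table standing in for lexicon::hash_emoticons
# (column x = emoticon, column y = meaning).
emoticons <- data.frame(
  x = c("xp", ":P", ":)"),
  y = c("tongue sticking out", "tongue sticking out", "smiley"),
  stringsAsFactors = FALSE
)

# Guard only the sides of the emoticon that are word characters:
# "xp" gets both guards, ":P" only a right-hand guard, ":)" none.
build_pattern <- function(emoticon) {
  left  <- if (grepl("^\\w", emoticon, perl = TRUE)) "(?<!\\w)" else ""
  right <- if (grepl("\\w$", emoticon, perl = TRUE)) "(?!\\w)" else ""
  paste0(left, "\\Q", emoticon, "\\E", right)
}

replace_emoticon_sketch <- function(x, emoticon_dt = emoticons) {
  # Replace one emoticon at a time, longest first.
  ord <- order(nchar(emoticon_dt[["x"]]), decreasing = TRUE)
  for (i in ord) {
    x <- gsub(build_pattern(emoticon_dt[["x"]][i]),
              paste0(" ", emoticon_dt[["y"]][i], " "),
              x, perl = TRUE)
  }
  # Collapse the extra spaces introduced by the padded replacements.
  trimws(gsub("\\s+", " ", x))
}

replace_emoticon_sketch("i had a great experience xp :P")
replace_emoticon_sketch(":P at the start, then xperia")
```

With these guards, ":Pnewword" is still left untouched, which may or may not be what you want. Since the replacements run sequentially, a full-size table could in principle match text inserted by an earlier replacement, so this would need checking against the real lexicon.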
Source: the Stack Overflow question "replace_emoticon function incorrectly replaces characters within words - R": https://stackoverflow.com/questions/62270337/