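I have a df that contains a special character. I want to remove it, but I don't know how. I have already tried [:graph:], [:print:] and <U+00AE>, but none of them worked. What should I do? Is there a way to remove all similar problems from the dataset in one go, for example ®?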
df<-structure(list(df = structure(c(1L, 4L, 2L, 3L), .Label = c("Cabozantinib",
"Left nephrectomy", "Left Superficial Inguinal Lymph Node Dissection",
"XmAb<U+00AE>20717 (Duet-2 study - a humanized bispecific monoclonal antibody that binds PD1 and CTLA4)"
), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
Best answer
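If the intention is to remove these characters, use a pattern that matches <U, followed by + (a regex metacharacter, so it is escaped with \\), followed by one or more characters that are not > ([^>]+), and then the closing >. str_remove_all removes every substring that matches this pattern: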
library(stringr)
df$df <- str_remove_all(df$df, "<U\\+[^>]+\\>")
df$df
[1] "Cabozantinib"
[2] "XmAb20717 (Duet-2 study - a humanized bispecific monoclonal antibody that binds PD1 and CTLA4)"
[3] "Left nephrectomy"
[4] "Left Superficial Inguinal Lymph Node Dissection"
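The question also asks about cleaning the whole dataset at once. A minimal sketch of that idea in base R follows, assuming the problem is the literal "<U+XXXX>" text shown in the dput() output. The helper name strip_unicode_escapes and the example data frame are hypothetical and not part of the original answer. The pattern here ends in a plain >, because in base R regular expressions \\> is documented as an end-of-word anchor rather than a literal >.

```r
# Hypothetical helper (not from the original answer): strip literal
# "<U+XXXX>" escape text from every factor or character column.
strip_unicode_escapes <- function(data) {
  # Only text-like columns are touched; numeric columns are left alone.
  is_text <- vapply(
    data,
    function(col) is.factor(col) || is.character(col),
    logical(1)
  )
  data[is_text] <- lapply(data[is_text], function(col) {
    # as.character() first, so factors are cleaned by label, not level code.
    gsub("<U\\+[^>]+>", "", as.character(col))
  })
  data
}

# Made-up example data, shortened from the question's df.
example <- data.frame(
  treatment = c("Cabozantinib", "XmAb<U+00AE>20717"),
  dose = c(1, 2),
  stringsAsFactors = TRUE
)

cleaned <- strip_unicode_escapes(example)
cleaned$treatment
```

Note that in this sketch the cleaned columns come back as character vectors rather than factors; wrap the result in factor() if the factor class matters.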
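If we would rather display the actual Unicode characters, we can rewrite each <U+XXXX> in the original df$df (before the removal above) as a \uXXXX escape and then unescape it: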
library(stringi)
stri_unescape_unicode(str_replace_all(df$df, "<U\\+([^>]+)\\>", "\\\\u\\1"))
[1] "Cabozantinib"
[2] "XmAb®20717 (Duet-2 study - a humanized bispecific monoclonal antibody that binds PD1 and CTLA4)"
[3] "Left nephrectomy"
[4] "Left Superficial Inguinal Lymph Node Dissection"
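If the column holds the real ® character instead of the literal "<U+00AE>" text, the pattern above has nothing to match. One possible approach in that case, sketched here under the assumption that the strings are UTF-8 encoded and that dropping every non-ASCII character is acceptable, is base R's iconv with sub = "". The vector x below is made-up example data.

```r
# Assumption: the strings contain real non-ASCII characters in UTF-8,
# not the literal "<U+00AE>" text from the dput() output.
x <- c("Cabozantinib", "XmAb\u00AE20717")

# sub = "" drops every byte that cannot be represented in ASCII,
# which removes the registered-trademark sign.
ascii_only <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
ascii_only
```

This is a blunt tool: it would also strip accented letters and any other non-ASCII text, so it only fits data where plain ASCII is expected.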
Regarding "r - How to remove a very special character in a df", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68867261/