Remove duplicated characters from a string

Tags: r regex

This question may be related to this question.

Unfortunately, the solution given there does not work for my data.

I have the following example vector:

example <- c("ChildrenChildren", "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "Leisure activities, travelingLeisure activities, traveling",
             "LoansLoans",
             "Loans and financial servicesLoans and financial services",
             "Personal transfersPersonal transfers",
             "Savings and investmentsSavings and investments",
             "TransportationTransportation",
             "Utility servicesUtility services")

I would like to get the same strings without the repetition, i.e.:
  > result
  [1] "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Best answer

You can use sub for this, capturing the part you want directly in the pattern:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"
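(.+) captures some pattern and \\1 refers back to what you just captured, so what you are looking for is "anything, twice", which you then replace with that same "anything", but only once.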
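A minimal sketch, assuming you only want to change elements that consist of exactly one phrase repeated twice: anchoring the pattern with ^ and $ forces the repetition to span the whole string, so sub() no longer collapses doubled letters inside ordinary strings. The helper name dedup and the extra "Hello" element are hypothetical, added only for illustration; perl = TRUE is a choice made here, not part of the original answer.

```r
# Hypothetical example input: three doubled strings plus one normal word
example <- c("ChildrenChildren", "Clothing and shoesClothing and shoes",
             "LoansLoans", "Hello")

# Hypothetical helper: the whole string must be one chunk (.+) followed by
# an exact copy of itself (\\1); elements that do not match are returned unchanged.
dedup <- function(x) sub("^(.+)\\1$", "\\1", x, perl = TRUE)

dedup(example)
# returns c("Children", "Clothing and shoes", "Loans", "Hello")

# For comparison, the unanchored pattern also collapses the "ll" in "Hello":
sub("(.+)\\1", "\\1", "Hello")
# returns "Helo"
```

Because the anchored pattern can only match when the string length is even and both halves are identical, odd-length or non-repeated strings pass through untouched.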

A similar question about removing duplicated characters from a string can be found on Stack Overflow: https://stackoverflow.com/questions/55124643/
