This question may be related to this question.
Unfortunately, the solution given there does not work for my data.
I have the following example vector:
example <- c("ChildrenChildren",
             "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "Leisure activities, travelingLeisure activities, traveling",
             "LoansLoans",
             "Loans and financial servicesLoans and financial services",
             "Personal transfersPersonal transfers",
             "Savings and investmentsSavings and investments",
             "TransportationTransportation",
             "Utility servicesUtility services")
What I want, of course, is the same strings without the duplication, i.e.:
> result
[1] "Children" "Clothing and shoes" "Education, health and beauty"
Is that possible?
Best answer
You can use sub for this, capturing the part you want directly in the pattern:
sub("(.+)\\1", "\\1", example)
#[1] "Children" "Clothing and shoes" "Education, health and beauty" "Leisure activities, traveling" "Loans"
#[6] "Loans and financial services" "Personal transfers" "Savings and investments" "Transportation" "Utility services"
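(.+) captures one or more characters as a group, and \\1 refers back to whatever that group just captured. The pattern therefore looks for "anything, twice in a row", and the replacement puts back that same "anything" only once.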
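
The pattern above is not anchored, so it can also match a repeat that sits somewhere inside a string rather than spanning all of it. If your real data might contain such strings, a slightly stricter variant is sketched below. It assumes every entry you want to fix is exactly one piece of text written twice with nothing in between; the helper name dedupe_whole and the extra test string "Mississippi" are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: collapse a string only when the WHOLE string
# consists of one chunk repeated exactly twice.
dedupe_whole <- function(x) {
  # ^ and $ force the doubled chunk to span the entire string
  sub("^(.+)\\1$", "\\1", x)
}

example <- c("ChildrenChildren",
             "Loans and financial servicesLoans and financial services",
             "Mississippi")  # contains an inner repeat, but is not doubled as a whole

result <- dedupe_whole(example)
print(result)
# [1] "Children"                     "Loans and financial services"
# [3] "Mississippi"
```

Strings that are not a whole-string duplicate are returned unchanged, which makes this version safer to run over a mixed column. For the example vector in the question, both versions should give the same result.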
A similar question about removing duplicated text in strings can be found on Stack Overflow: https://stackoverflow.com/questions/55124643/