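I am trying to use stringdist to find, for each string in a vector, any other string in the same vector that is at most a distance of 1 away, and then record the match. Here is a sample of the data:
Starting data frame: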
a = c("tom", "tomm", "alex", "alexi", "chris", "jen", "jenn", "michell")
b = c(NA)
df = data.frame(a,b)
Desired result:
a = c("tom", "tomm", "alex", "alexi", "chris", "jen", "jenn", "michell")
b = c("tomm", "tom", "alexi", "alex", 0, "jenn", "jen", 0)
df = data.frame(a,b)
I can use stringdist on two vectors, but I am having trouble using it on a single vector. Thanks for your help, R community.
Best answer
Here is one possible approach:
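library(stringdist)

a = c("tom", "tomm", "alex", "alexi", "chris", "jen", "jenn", "michell")

# For each string, return the closest other string in x, or "0" if none is within tol
min_dist <- function(x, method = "cosine", tol = .5){
  y <- vector(mode = "character", length = length(x))
  for(i in seq_along(x)){
    # Distances from x[i] to every other element of x
    dis <- stringdist(x[i], x[-i], method = method)
    if (min(dis) > tol) {
      y[i] <- "0"
    } else {
      y[i] <- x[-i][which.min(dis)]
    }
  }
  y
}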
min_dist(a, 'cosine', .4)
## [1] "tomm" "tom" "alexi" "alex" "0" "jenn" "jen" "0"
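The answer above uses cosine distance with a tolerance of 0.4, while the question asks for a maximum distance of 1, which reads like an edit distance. A minimal sketch of that variant, assuming Levenshtein distance (method = "lv") is what was meant and using stringdistmatrix to compare the vector with itself; the names d, nearest and best are only illustrative:

```r
library(stringdist)

a <- c("tom", "tomm", "alex", "alexi", "chris", "jen", "jenn", "michell")

# Pairwise Levenshtein distances of the vector against itself
d <- stringdistmatrix(a, a, method = "lv")
diag(d) <- Inf  # a string should not match itself

# Index and distance of the nearest other string for each element
nearest <- apply(d, 1, which.min)
best <- d[cbind(seq_along(a), nearest)]

# Keep the match only when it is within distance 1, otherwise "0"
b <- ifelse(best <= 1, a[nearest], "0")
df <- data.frame(a, b)
```

If several strings tie for nearest, which.min keeps the first one, so this returns a single match per row rather than all of them.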
For more on r - stringdist on a single vector, see the similar question on Stack Overflow: https://stackoverflow.com/questions/41560251/