The documentation for str_split in the stringr package says of the pattern argument:
If "" splits into individual characters.
This suggests that it behaves the same as strsplit in this respect. However,
library(stringr)
str_split("abcab","")
[[1]]
[1] "" "a" "b" "c" "a" "b"
starts with an empty string. By contrast,
strsplit("abcab","")
[[1]]
[1] "a" "b" "c" "a" "b"
A leading empty string appears to be normal behavior when splitting on a non-empty string,
strsplit("abcab","ab")
[[1]]
[1] "" "c"
but even then, str_split produces an "extra" trailing empty string:
str_split("abcab","ab")
[[1]]
[1] "" "c" ""
Is this discrepancy an error in the documentation, a feature, a bug, or just a different notion of "expected behavior"?
Best answer
If you use a comma as the delimiter, the "expected" (your mileage may vary) result becomes more obvious:
# expect "" "2" "3" "4" ""
strsplit(",2,3,4,", ",")
# [[1]]
# [1] "" "2" "3" "4"
str_split(",2,3,4,", ",")
# [[1]]
# [1] "" "2" "3" "4" ""
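If I have n commas, then I expect (n+1) elements to be returned, so I prefer the result from str_split. However, I wouldn't necessarily call this a bug in strsplit, since it performs as advertised: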
(from ?strsplit) Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is ‘""’, but if there is a match at the end of the string, the output is the same as with the match removed.
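If the (n+1)-element result is wanted from base R alone, one possible workaround is to append the trailing empty string by hand. This is only a sketch: split_keep_trailing is a hypothetical helper name, not part of base R or stringr, and it assumes a single input string and a fixed (non-regex), non-empty separator.

```r
# Hypothetical helper: split with base strsplit, then restore the trailing
# empty string that strsplit drops when the input ends with the separator.
# Assumes `x` is a single string and `sep` is a literal, non-empty separator.
split_keep_trailing <- function(x, sep) {
  pieces <- strsplit(x, sep, fixed = TRUE)[[1]]
  if (endsWith(x, sep)) {
    pieces <- c(pieces, "")
  }
  pieces
}

split_keep_trailing(",2,3,4,", ",")
# [1] ""  "2" "3" "4" ""
split_keep_trailing("abcab", "ab")
# [1] ""  "c" ""
```

endsWith requires R 3.3.0 or later; on older versions a substring comparison against the end of the string would be needed instead.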
""
is trickier, since there is no way to count how many times "" occurs in a string. Treating it as a special case therefore seems reasonable.
(from ?str_split) If ‘""’ splits into individual characters.
Based on this, I'd suggest that you have found a bug, and that you should take Hadley's advice and report it!
Regarding string - inconsistent behavior between str_split and strsplit, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7367284/