r - Difference between dplyr::ntile and statar::xtile

My understanding is that dplyr::ntile and statar::xtile are trying to do the same thing. But sometimes the output differs:

dplyr::ntile(1:10, 5)
# [1] 1 1 2 2 3 3 4 4 5 5

statar::xtile(1:10, 5)
# [1] 1 1 2 2 3 3 3 4 5 5

I am converting Stata code to R. statar::xtile gives the same output as the original Stata code, but I had assumed that dplyr::ntile was the R equivalent.

The Stata help says xtile is used to:

Create variable containing quantile categories

statar::xtile is clearly replicating that.

dplyr::ntile is described as:

a rough rank, which breaks the input vector into n buckets.

Do these two descriptions mean the same thing?

If so, why do the functions give different answers?

If not, then:

Best answer

Thanks to @alistaire for pointing out that dplyr::ntile is only doing:

function (x, n) { floor((n * (row_number(x) - 1)/length(x)) + 1) }

So it assigns buckets by rank, which is not the same as splitting into quantile categories, as xtile does.
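
To see what rank-based bucketing means in practice, the quoted formula can be re-implemented in base R. This is a minimal sketch, not dplyr's own code: ntile_sketch is a hypothetical name, and rank(x, ties.method = "first") stands in for dplyr::row_number(x) on the assumption that x contains no missing values.

```r
# Hypothetical base R re-implementation of the formula quoted above.
# Assumes x has no NA values.
ntile_sketch <- function(x, n) {
  # rank with ties.method = "first" gives 1, 2, ..., length(x); ties are broken by position
  r <- rank(x, ties.method = "first")
  floor(n * (r - 1) / length(x) + 1)
}

ntile_sketch(1:10, 5)
# [1] 1 1 2 2 3 3 4 4 5 5

# Buckets depend on rank alone, so identical values can land in different buckets.
ntile_sketch(rep(1, 6), 3)
# [1] 1 1 2 2 3 3
```

The second call shows the key point: the buckets always have (nearly) equal size, regardless of where the values themselves fall.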

Looking at the code for statar::xtile leads to statar::pctile, and the documentation for statar says:

pctile computes quantile and weighted quantile of type 2 (similarly to Stata _pctile)

So the base R equivalent of statar::xtile is:

.bincode(1:10, quantile(1:10, seq(0, 1, length.out = 5 + 1), type = 2), 
         include.lowest = TRUE)
# [1] 1 1 2 2 3 3 3 4 5 5
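
The one-liner above can be wrapped in a small helper for reuse. This is only a sketch under stated assumptions: xtile_sketch is a hypothetical name, missing values are not handled, and the example uses 1:8 with four groups so that the probabilities 0.25, 0.5 and 0.75 are exactly representable and the type 2 cut points are easy to check by hand.

```r
# Hypothetical helper wrapping the base R expression above.
# Assumes x is numeric with no NA values.
xtile_sketch <- function(x, n) {
  probs  <- seq(0, 1, length.out = n + 1)                # 0, 1/n, ..., 1
  breaks <- quantile(x, probs, type = 2, names = FALSE)  # type 2 quantiles as cut points
  .bincode(x, breaks, include.lowest = TRUE)             # intervals are (lower, upper]
}

# Interior cut points for four groups of 1:8
breaks4 <- quantile(1:8, c(0.25, 0.5, 0.75), type = 2, names = FALSE)
breaks4
# [1] 2.5 4.5 6.5

xtile_sketch(1:8, 4)
# [1] 1 1 2 2 3 3 4 4
```

Type 2 averages the two neighbouring order statistics whenever n * p is a whole number, which is where 2.5, 4.5 and 6.5 come from. Categories are then defined by the values of these cut points rather than by rank.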

For r - Difference between dplyr::ntile and statar::xtile, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42351299/