r - cut function produces an uneven first break

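I'm exploring the cut function and trying to cut the following simple vector using 10 breaks. I can do it, but I'm confused about why my first break falls at -0.01 rather than 0: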

test_vec <- 0:10
test_vec2 <- cut(test_vec, breaks = 10)
test_vec2

Output:

(-0.01,1] (-0.01,1] (1,2]     (2,3]     (3,4]     (4,5]     (5,6]     (6,7]     (7,8]     (8,9]    (9,10]

Why does this produce two instances of (-0.01,1], and why doesn't the lowest interval start at 0?

Best answer

tl;dr To get what you probably want, specify the breaks explicitly and set include.lowest=TRUE:

cut(x,breaks=0:10,include.lowest=TRUE)
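As a quick check, here is a minimal sketch of what the explicit-breaks call returns for the vector from the question. The names x and res are chosen here for illustration only.

```r
# Same data as in the question
x <- 0:10

# Explicit breaks at 0, 1, ..., 10; include.lowest = TRUE closes the first
# interval on the left so that the value 0 is not returned as NA
res <- cut(x, breaks = 0:10, include.lowest = TRUE)

levels(res)   # first level is "[0,1]", the last is "(9,10]"
table(res)    # counts per interval
```

There are still ten intervals for eleven values, so 0 and 1 continue to share the first interval; only its lower limit and label change.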

The issue is probably this, from the "Details" section of ?cut:

When ‘breaks’ is specified as a single number, the range of the data is divided into ‘breaks’ pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals.

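Since the range of the data is (0,10), the outer limits become (-0.01, 10.01). As @Onyambu hints, the result looks asymmetric because intervals are open on the left and closed on the right by default (right = TRUE): a value at 0 would sit on an excluded left boundary, whereas a value at 10 sits on an included right boundary. This is also why 0 and 1 both fall into the first interval, (-0.01,1].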
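The effect of the boundary convention can be seen by comparing the default with right = FALSE. This is a small sketch; codes_right and codes_left are illustrative names, and the integer codes are simply the interval index each value falls into.

```r
x <- 0:10

# Default (right = TRUE): intervals have the form (a, b]
codes_right <- as.integer(cut(x, breaks = 10))

# right = FALSE: intervals have the form [a, b)
codes_left <- as.integer(cut(x, breaks = 10, right = FALSE))

codes_right  # 0 and 1 share the first interval
codes_left   # 9 and 10 share the last interval
```

Either way, eleven integer values are placed into ten intervals, so one interval has to hold two of them; the convention only decides which end that happens at.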

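The (apparent) asymmetry in the labels is due to formatting. If you step through the code below (the core of base:::cut.default()), you'll see that the top break is actually at 10.01 but is formatted as "10", because the default number of significant digits is 3 ...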

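x <- 0:10
breaks <- 10
dig <- 3  # default dig.lab: significant digits used in the labels
nb <- as.integer(breaks+1)  # number of break points
dx <- diff(rx <- range(x, na.rm = TRUE))  # width of the data range
breaks <- seq.int(rx[1L], rx[2L], length.out = nb)  # equally spaced breaks
# move the outer limits away by 0.1% of the range
breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] +  dx/1000)
# format the breaks for the labels; 10.01 becomes "10" at 3 significant digits
ch.br <- formatC(0 + breaks, digits = dig, width = 1L)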
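One way to confirm that the upper limit is only hidden by rounding is to raise dig.lab, the argument of cut() that controls how many significant digits are used in the labels. The sketch below compares the first and last labels at 3 and 4 digits; lev3 and lev4 are illustrative names.

```r
x <- 0:10

# Default labels use 3 significant digits, so 10.01 is shown as "10"
lev3 <- levels(cut(x, breaks = 10))

# With 4 significant digits the shifted upper limit becomes visible
lev4 <- levels(cut(x, breaks = 10, dig.lab = 4))

lev3[c(1, 10)]  # first and last labels at the default precision
lev4[c(1, 10)]  # same intervals, labelled with one more digit
```

The lower limit, -0.01, has only one significant digit, so it is printed in full even at the default precision, which is why only the low end of the labels looks shifted.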

This question about the cut function producing an uneven first break comes from Stack Overflow: https://stackoverflow.com/questions/60160015/
