r - Why is there a difference between data.table's order and setorder(v)?

Tags: r data.table

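I want to sort a data.table. With setorder(v) I get a different result than with base::order. How can I get the same result as order?

I have already tried this with a single column, but I actually need to sort by several columns.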

 1: (test = data.table(V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                       V2 = c("4", "3", "2", "1")))
 2: test[order(V1)]
 3: sort(test[[1]])
 4: c(order(test[[1]]))
 5: order(test[[1]])
 6: test[c(order(test[[1]]))]
 7: test[order(test[[1]])]
 8: (setorderv(test, c("V1"), c(1)))
 9: test[order(V1)]
10: test[order(V1, V2)]

Why are the results of lines c(4,5) the same, while the results of lines c(6,7) differ? I expected the output of line 8 to be the same as that of line 6.

Best answer

From ?setorder (data.table_1.12.2):

Also note that data.table always reorders in "C-locale" (see Details). To sort by session locale, use x[base::order(.)].

And further, under Details:

data.table always reorders in "C-locale". As a consequence, the ordering may be different to that obtained by base::order. In English locales, for example, sorting is case-sensitive in C-locale. Thus, sorting c("c", "a", "B") returns c("B", "a", "c") in data.table but c("a", "B", "c") in base::order. Note this makes no difference in most cases of data; both return identical results on ids where only upper-case or lower-case letters are present ("AB123" < "AC234" is true in both), or on country names and other proper nouns which are consistently capitalized. For example, neither "America" < "Brazil" nor "america" < "brazil" are affected since the first letter is consistently capitalized.
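To make the quoted passage concrete for the strings in the question, here is a minimal base R sketch. It assumes that `method = "radix"` in base R sorts character vectors in the C-locale (as described in ?sort), so it can stand in for data.table's ordering; the variable names are made up for this illustration.

```r
# The four V1 values from the question.
x <- c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a")

# In the C-locale, strings are compared by character code.
# "TeilA1_a" and "TeilA_a" first differ at position 6: "1" versus "_".
utf8ToInt("1")  # 49
utf8ToInt("_")  # 95, so "TeilA1_a" sorts before "TeilA_a" in the C-locale

# method = "radix" sorts character vectors in the C-locale,
# independent of the session locale.
c_order <- order(x, method = "radix")
x[c_order]

# The default method collates by the session locale, so this result
# may differ between machines and from the C-locale result above.
locale_order <- order(x)
x[locale_order]

# The example from the documentation passage quoted above.
sort(c("c", "a", "B"), method = "radix")  # "B" "a" "c"
```

Many session locales give punctuation such as "_" less weight than digits when collating, which would explain why "TeilA_a" can come first under base::order while the C-locale puts "TeilA1_a" first.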

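Thanks to Frank for noting that, when order optimisation is on, order is internally replaced by data.table:::forder, whereas in test[(order(V1))]; test[c(order(V1))]; test[base::order(V1)] it is not replaced and is instead looked up in the scope outside of [.

Thanks also to MichaelChirico: from data.table_1.12.4 onwards, test[c(order(V1))]; test[(order(V1))] will default to data.table:::forder. Please check NEWS for updates.

Hence, test[order(V1), verbose=TRUE] sorts in C-locale (just like test[data.table::chorder(V1)]) and gives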

#data.table_1.12.2
order optimisation is on, i changed from 'order(...)' to 'forder(DT, ...)'.
   rn       V1 V2
1:  1 TeilA1_a  4
2:  2  TeilA_a  3
3:  4  TeilA_a  1
4:  3 TeilC1_a  2

test[base::order(V1)]; test[(order(V1)), verbose=TRUE]; test[c(order(V1)), verbose=TRUE] give

   rn       V1 V2
1:  2  TeilA_a  3
2:  4  TeilA_a  1
3:  1 TeilA1_a  4
4:  3 TeilC1_a  2

Data:

library(data.table)
test = data.table(rn=1:4, V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"), V2 = c("4", "3", "2", "1"))
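
Since the question asks about sorting by more than one column, the following sketch shows one way to get either ordering for V1 and V2 together. It is only an illustration under stated assumptions: the row index is computed outside of [ so that no version of data.table can rewrite the order call, `method = "radix"` is assumed to give C-locale ordering in base R, and the names idx_locale, by_locale, by_c, idx_radix and by_radix are invented here.

```r
library(data.table)

test <- data.table(rn = 1:4,
                   V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                   V2 = c("4", "3", "2", "1"))

# Session-locale ordering on two columns: build the index with
# base::order outside of `[`, then subset with it.
idx_locale <- base::order(test$V1, test$V2)
by_locale <- test[idx_locale]

# C-locale ordering on two columns, by reference, on a copy so that
# the original table stays untouched.
by_c <- copy(test)
setorderv(by_c, c("V1", "V2"))

# Base R reproduces the C-locale ordering with method = "radix".
idx_radix <- base::order(test$V1, test$V2, method = "radix")
by_radix <- test[idx_radix]

# Both C-locale routes should agree on the row order.
identical(by_c$rn, by_radix$rn)
```

The documented form for session-locale sorting is x[base::order(.)]; computing the index first is just a more conservative variant of the same idea. Note that there is no by-reference counterpart for session-locale sorting in this sketch, so the result is assigned to a new object.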

Regarding "r - Why is there a difference between data.table's order and setorder(v)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58181909/
