r - Why is there this difference between data.table's order and setorder(v)?

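I want to sort a data.table. With setorder(v) I get a different result than with base::order. How can I get the same result from both?

I have already tried this with a single column, but I need to sort by several columns.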

library(data.table)
(test = data.table(V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                   V2 = c("4", "3", "2", "1")))   #  1
test[order(V1)]                                   #  2
sort(test[[1]])                                   #  3
c(order(test[[1]]))                               #  4
order(test[[1]])                                  #  5
test[c(order(test[[1]]))]                         #  6
test[order(test[[1]])]                            #  7
(setorderv(test, c("V1"), c(1)))                  #  8
test[order(V1)]                                   #  9
test[order(V1, V2)]                               # 10

Why do lines 4 and 5 give the same result, while lines 6 and 7 give different results? I expected the output of line 8 to be the same as that of line 6.

Best answer

From ?setorder (data.table 1.12.2):

Also note that data.table always reorders in "C-locale" (see Details). To sort by session locale, use x[base::order(.)].
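A minimal sketch of this documented behaviour on the question's data, assuming a current data.table version: setorderv() sorts by reference in C-locale, while the namespace-qualified base::order() is not rewritten by data.table and follows the session's LC_COLLATE. The object names c_sorted and locale_sorted are illustrative only.

```r
library(data.table)

test <- data.table(rn = 1:4,
                   V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                   V2 = c("4", "3", "2", "1"))

# setorderv() reorders by reference and always collates in C-locale;
# copy() keeps the original 'test' untouched
c_sorted <- copy(test)
setorderv(c_sorted, "V1")
c_sorted$rn       # 1 2 4 3: "1" (0x31) sorts before "_" (0x5F)

# base::order() is not replaced by forder(), so it collates
# according to the session locale (LC_COLLATE)
locale_sorted <- test[base::order(V1)]
locale_sorted$rn  # locale-dependent
Sys.getlocale("LC_COLLATE")
```

The two results only coincide when the session's collation happens to agree with byte order for these strings.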

From the Details section:

data.table always reorders in "C-locale". As a consequence, the ordering may be different to that obtained by base::order. In English locales, for example, sorting is case-sensitive in C-locale. Thus, sorting c("c", "a", "B") returns c("B", "a", "c") in data.table but c("a", "B", "c") in base::order. Note this makes no difference in most cases of data; both return identical results on ids where only upper-case or lower-case letters are present ("AB123" < "AC234" is true in both), or on country names and other proper nouns which are consistently capitalized. For example, neither "America" < "Brazil" nor "america" < "brazil" are affected since the first letter is consistently capitalized.
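The c("c", "a", "B") example from the documentation can be checked directly. As an aside not taken from the answer: base R's sort() and order() accept method = "radix", which (per ?sort) also collates character vectors in C-locale, so it reproduces data.table's ordering without data.table. The plain sort(v) result depends on the session locale, so no output is claimed for it.

```r
library(data.table)

v <- c("c", "a", "B")

# data.table: order() inside [ is optimised to forder(), i.e. C-locale,
# so every upper-case letter sorts before every lower-case letter
DT <- data.table(x = v)
DT[order(x)]$x              # "B" "a" "c"

# base R radix method: also C-locale
sort(v, method = "radix")   # "B" "a" "c"

# base R default method: session locale
# (typically "a" "B" "c" in an English locale)
sort(v)

# Consistently capitalised ids sort the same under either collation
ids <- c("AC234", "AB123")
identical(sort(ids), sort(ids, method = "radix"))  # TRUE
```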

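Thanks to Frank for noting that, when order optimisation is on, order in test[order(V1)] is internally replaced by data.table:::forder, whereas in test[(order(V1))], test[c(order(V1))] and test[base::order(V1)] it is not; there, order is instead looked up in the scope outside [.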

Thanks also to MichaelChirico: from data.table 1.12.4 onwards, test[c(order(V1))] and test[(order(V1))] also default to data.table:::forder. Check NEWS for updates.
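Since wrapping order() in ( ) or c() no longer opts out of the optimisation in newer versions, the version-independent ways to get a session-locale sort are a namespace-qualified call or a row index computed outside [. A sketch, assuming data.table's default optimisation settings; verbose = TRUE only prints diagnostic messages whose exact wording varies between versions, and the names opt, qual, idx and outside are illustrative.

```r
library(data.table)

test <- data.table(rn = 1:4,
                   V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                   V2 = c("4", "3", "2", "1"))

# Rewritten to forder(): C-locale; verbose = TRUE reports the rewrite
opt <- test[order(V1), verbose = TRUE]

# Namespace-qualified call: not rewritten, sorts in the session locale
qual <- test[base::order(V1)]

# Index computed outside [: plain base::order(), session locale
idx <- order(test$V1)
outside <- test[idx]

opt$rn                           # 1 2 4 3
identical(qual$rn, outside$rn)   # TRUE
```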

So test[order(V1), verbose=TRUE] sorts in C-locale (just like test[data.table::chorder(V1)]) and gives

#data.table_1.12.2
order optimisation is on, i changed from 'order(...)' to 'forder(DT, ...)'.
   rn       V1 V2
1:  1 TeilA1_a  4
2:  2  TeilA_a  3
3:  4  TeilA_a  1
4:  3 TeilC1_a  2

whereas test[base::order(V1)], test[(order(V1)), verbose=TRUE] and test[c(order(V1)), verbose=TRUE] give (with data.table 1.12.2, sorting in the session locale)

   rn       V1 V2
1:  2  TeilA_a  3
2:  4  TeilA_a  1
3:  1 TeilA1_a  4
4:  3 TeilC1_a  2
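To get identical results for several columns, pick one collation and use it on both sides. A minimal sketch: (1) to reproduce setorderv() with base::order(), ask base R for C-locale collation via method = "radix"; (2) to get the session-locale order instead, note that setorder() always uses C-locale, so reassign the result of x[base::order(.)], which creates a sorted copy rather than reordering by reference. The names a, b and loc are illustrative.

```r
library(data.table)

test <- data.table(rn = 1:4,
                   V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"),
                   V2 = c("4", "3", "2", "1"))

# (1) C-locale on both sides
a <- copy(test)
setorderv(a, c("V1", "V2"))                       # reorders by reference
b <- test[base::order(V1, V2, method = "radix")]  # returns a new data.table
a$rn                     # 1 4 2 3 (ties on V1 broken by V2: "1" < "3")
identical(a$rn, b$rn)    # TRUE

# (2) session locale: sort a copy and reassign it
loc <- test[base::order(V1, V2)]
```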

Data:

library(data.table)
test = data.table(rn=1:4, V1 = c("TeilA1_a", "TeilA_a", "TeilC1_a", "TeilA_a"), V2 = c("4", "3", "2", "1"))

A similar question on Stack Overflow ("r - Why is there this difference between data.table's order and setorder(v)?"): https://stackoverflow.com/questions/58181909/