r - Lagging a list column in R's data.table


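shift in R's data.table is great for time series and time-window work. But columns of lists do not lag the way columns of atomic values do. In the code below, carbLag correctly lags carb, but carbListLag does not lead or lag carbList. Instead, shift operates inside each element of carbList, shifting the values within a single row against themselves.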

library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
###  Make a column of lists
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]
###  Now I want to lag/lead the column of lists
dt[, .(carb, carbLag = shift(carb),
       carbList, carbListLag = shift(carbList, type = "lead")), by = cyl]

    cyl carb carbLag carbList carbListLag
 1:   6    4      NA         4           NA
 2:   6    4       4         4           NA
 3:   6    1       4         1           NA <-- should be 4 here, not NA
 4:   6    1       1         1           NA
 5:   6    4       1         4           NA
 6:   6    4       4         4           NA
 7:   6    6       4         6           NA
 8:   4    1      NA       1,2         2,NA
 9:   4    2       1       1,2         2,NA
10:   4    2       2       1,2         2,NA
11:   4    1       2       1,2         2,NA
12:   4    2       1       1,2         2,NA
13:   4    1       2       1,2         2,NA
14:   4    1       1         1           NA <-- should be (1,2) here, not NA
15:   4    1       1       1,2         2,NA
16:   4    2       1         2           NA
17:   4    2       2         2           NA
18:   4    2       2       1,2         2,NA
19:   8    2      NA     2,4,3      4, 3,NA
20:   8    4       2     2,4,3      4, 3,NA
21:   8    3       4     2,4,3      4, 3,NA
22:   8    3       3     2,4,3      4, 3,NA
23:   8    3       3     2,4,3      4, 3,NA

Any suggestions for how to lag the list column the same way I lag the other columns?

Best answer

This is documented behavior. Here is part of the examples from ?shift:

# on lists
ll = list(1:3, letters[4:1], runif(2))
shift(ll, 1, type="lead")

# [[1]]
# [1]  2  3 NA
# 
# [[2]]
# [1] "c" "b" "a" NA 
# 
# [[3]]
# [1] 0.1190792        NA
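
To make the contrast concrete, here is a minimal sketch that is not part of the original answer. It compares shift on an atomic vector with one possible way to move whole list elements: shifting the positions and subsetting the list by them. The names idx and ll_lead are illustrative only, and the fixed numeric vector replaces runif(2) so the result is reproducible.

```r
library(data.table)

ll <- list(1:3, letters[4:1], c(0.5, 0.25))

# On an atomic vector, shift() moves whole elements
vec_lead <- shift(1:3, 1, type = "lead")

# For a list, shift the positions instead, then subset the list by them.
# An NA position yields a NULL element in the result.
idx <- shift(seq_along(ll), 1, type = "lead")
ll_lead <- ll[idx]
```

With this approach each list element is treated as one unit, which is the behavior the question expected from shift.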

To work around this, you can create a unique ID for each distinct value of the list column:

dt[, carbList_id := match(carbList, unique(carbList))]

carbList_map = dt[, .(carbList = list(carbList[[1]])), by=carbList_id]

#    carbList_id carbList
# 1:           1        4
# 2:           2      1,2
# 3:           3        1
# 4:           4    2,4,3
# 5:           5        2
# 6:           6      4,8
# 7:           7        6

# or stick with long-form:
carbList_map = dt[, .(carb = carbList[[1]]), by=carbList_id]

#     carbList_id carb
#  1:           1    4
#  2:           2    1
#  3:           2    2
#  4:           3    1
#  5:           4    2
#  6:           4    4
#  7:           4    3
#  8:           5    2
#  9:           6    4
# 10:           6    8
# 11:           7    6

Then just shift (or do whatever else you need) with the new ID column. When you need the values of carbList again, you have to merge with the new table.
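
The ID approach could be sketched end to end roughly as follows. The column names carbList_id_lag and carbListLag are assumptions chosen for illustration, and the lookup uses match against the map table rather than a full merge so that the original row order is preserved.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]

# One integer ID per distinct list value, plus a map from ID back to value
dt[, carbList_id := match(carbList, unique(carbList))]
carbList_map <- dt[, .(carbList = list(carbList[[1]])), by = carbList_id]

# Lag the plain integer ID within each cyl group
dt[, carbList_id_lag := shift(carbList_id), by = cyl]

# Look the lagged ID up in the map; rows with an NA ID get a NULL element.
# The lookup result is wrapped in list() so it is assigned as one list column.
lagged <- carbList_map$carbList[match(dt$carbList_id_lag, carbList_map$carbList_id)]
dt[, carbListLag := list(lagged)]
```

In the original row order, the third cyl == 6 row (row 4 of dt) now carries 4 in carbListLag, and the seventh cyl == 4 row (row 21) carries c(1, 2), which are the values the question marked as expected.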

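Alternatively, if you do not really need to work with the values and only want to browse them, consider storing them as a string instead, e.g. carbList:=toString(sort(unique(carb))), or use paste0.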

Side note: sort the values before using toString, paste0, or list, so that the same set of values always produces the same result.
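
A minimal sketch of the string alternative, assuming the values are only needed for display. The column names carbStr and carbStrLag are illustrative. Because the column is an ordinary character vector, shift lags it row by row as usual.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]

# Sorted, comma-separated string per (cyl, gear) group
dt[, carbStr := toString(sort(unique(carb))), by = .(cyl, gear)]

# A character column lags like any other atomic column
dt[, carbStrLag := shift(carbStr), by = cyl]
```

Sorting first means that, for example, the cyl == 8, gear == 3 group is stored as "2, 3, 4" regardless of the order in which the carb values appear.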

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/36040542/
