r - Coercing multiple objects to time series objects with different start/end dates

Tags: r time-series tidyverse purrr tidyquant

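I am following this tutorial to perform tidy time series forecasting on groups of time series using the sweep package. sweep extends the broom package to tidy forecast objects.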

The tutorial is here: https://rdrr.io/cran/sweep/f/vignettes/SW01_Forecasting_Time_Series_Groups.Rmd

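Problem: the time series in my data have different lengths and start dates. In the tutorial, a fixed start is passed to tk_ts() because every time series has the same start and end dates: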

monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
  mutate(data.ts = map(.x     = data.tbl,
                       .f     = tk_ts,
                       select = -order.month,
                       start  = 2011, # <- see the fixed start date here
                       freq   = 12))
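For reference, a scalar start such as 2011 combined with a frequency of 12 denotes January 2011. The following is a minimal base-R sketch of that convention, assuming tk_ts() forwards start and the frequency to stats::ts(), which its documentation describes it as wrapping. The values are made up.

```r
# Made-up values; only the time index matters here
a <- ts(1:3, start = 2011, frequency = 12)        # scalar start
b <- ts(1:3, start = c(2011, 1), frequency = 12)  # explicit c(year, month)

# Both forms describe the same monthly index, beginning in January 2011
same_index <- identical(tsp(a), tsp(b))
first_obs  <- start(a)  # year and period of the first observation
last_obs   <- end(a)    # year and period of the last observation
```

The c(year, month) form is the one that generalizes to series that do not begin in January.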

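Question: how can I use map to create a list-column of time series objects, as in the example above (and in the tutorial), but with the correct start and end dates for each series (i.e. different for every series)?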

Packages:

library(tidyquant)
library(sweep)
library(timetk)
library(forecast)
library(tidyverse)

Reproducible example data:

df <- structure(list(id = c("series_1", "series_1", "series_1", "series_1", 
"series_1", "series_1", "series_1", "series_1", "series_1", "series_1", 
"series_1", "series_1", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3"), date = structure(c(10957, 10988, 11017, 
11048, 11078, 11109, 11139, 11170, 11201, 11231, 11262, 11292, 
13787, 13818, 13848, 13879, 13910, 13939, 13970, 14000, 14031, 
14061, 14092, 14123, 14153, 14184, 14214, 14245, 14276, 14304, 
14335, 14365, 14396, 14426, 14457, 14488, 15706, 15737, 15765, 
15796, 15826, 15857, 15887, 15918, 15949, 15979, 16010, 16040, 
16071, 16102, 16130, 16161, 16191, 16222, 16252, 16283, 16314, 
16344, 16375, 16405, 16436, 16467, 16495, 16526, 16556, 16587, 
16617, 16648, 16679, 16709, 16740, 16770), class = "Date"), value = c(0.526816892903298, 
0.0640646643005311, 0.569032567087561, 0.733993547270074, 0.742038151714951, 
0.273655793862417, 0.167404572479427, 0.766059899237007, 0.60176682821475, 
0.0769246644340456, 0.162491872673854, 0.323168716160581, 0.179594057612121, 
1.096650313586, 0.894524970557541, 1.55353087605909, 1.50662920810282, 
1.06641945429146, 1.95049989689142, 0.226111006457359, 0.644822218455374, 
0.998987099621445, 0.303691457025707, 0.782052680384368, 1.59218573896214, 
0.171859007328749, 1.9222901831381, 1.4127164632082, 0.919900813139975, 
1.93520273640752, 0.00968976970762014, 0.204170028213412, 1.90123205445707, 
1.05964627675712, 1.40747981145978, 0.476186634972692, 1.56826665904373, 
0.106335987104103, 2.7993093256373, 1.07078968570568, 0.668198951287195, 
0.584522894583642, 0.753677956061438, 2.76492932089604, 2.17496411106549, 
2.56561762047932, 0.586419345578179, 1.7261581714265, 1.38705582660623, 
0.708714888431132, 1.91359720285982, 1.85413848585449, 1.85429209470749, 
2.18856360157952, 1.00432092184201, 0.588805445702747, 2.95583719946444, 
0.382465981179848, 0.711439447710291, 1.24924974096939, 0.961857272777706, 
2.26519317110069, 1.10985011514276, 0.938654307508841, 0.985875837039202, 
1.13028976111673, 2.90536748478189, 0.795255574397743, 1.4741945641581, 
2.02167924796231, 1.2093570465222, 1.47486943169497)), .Names = c("id", 
"date", "value"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-72L))

After nesting:

df_nest <- df %>% group_by(id) %>% 
  nest(.key = data.tbl)

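From here, I want to apply some function to mutate a new list-column that contains the same data as data.tbl, coerced to ts objects as in the example above (and in the tutorial) so that they work with the forecast package, but with the correct start and end dates for each series.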

I would like to apply something like this:

df_ts <- df_nest %>%
  mutate(data.ts = map(.x = data.tbl,
                       .f = tk_ts,
                       select = -date,
                       start = c(2000, 1), # <- Problem HERE
                       freq = 12))

The problem is that this gives the correct start date only for series_1.
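The mismatch can be reproduced with base R alone. This is a minimal sketch, assuming stats::ts() treats start the same way tk_ts() does (tk_ts() is documented as a wrapper around it). The dates and values are made up and only mimic the shape of series_2.

```r
# Hypothetical monthly series that really begins in October 2007
dates  <- seq(as.Date("2007-10-01"), by = "month", length.out = 24)
values <- seq_along(dates)

# A hard-coded start labels the first observation as January 2000
fixed       <- ts(values, start = c(2000, 1), frequency = 12)
fixed_start <- start(fixed)

# Deriving c(year, month) from the first date puts the series where it belongs
first         <- as.integer(c(format(dates[1], "%Y"), format(dates[1], "%m")))
derived       <- ts(values, start = first, frequency = 12)
derived_start <- start(derived)  # October 2007
derived_end   <- end(derived)    # 24 months later: September 2009
```

No end argument is needed: with the start and frequency fixed, the end follows from the number of observations.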

How can I mutate this new list-column of ts objects so that each series gets its correct start and end dates?

Thanks

Best answer

Use format() to extract the year and month of each series' first date and pass them as start:

df_ts_2 <- df_nest %>%
  mutate(data.ts = map(.x = data.tbl,
                       .f = function(data) tk_ts(
                         data, 
                         select = -date, 
                         start = as.integer(c(format(data$date[1], "%Y"), format(data$date[1], "%m"))),
                         freq = 12
                       )))

print(df_ts_2$data.ts)

# [[1]]
#             Jan        Feb        Mar        Apr        May        Jun        Jul        Aug        Sep        Oct        Nov        Dec
# 2000 0.52681689 0.06406466 0.56903257 0.73399355 0.74203815 0.27365579 0.16740457 0.76605990 0.60176683 0.07692466 0.16249187 0.32316872
# 
# [[2]]
#             Jan        Feb        Mar        Apr        May        Jun        Jul        Aug        Sep        Oct        Nov        Dec
# 2007                                                                                                    0.17959406 1.09665031 0.89452497
# 2008 1.55353088 1.50662921 1.06641945 1.95049990 0.22611101 0.64482222 0.99898710 0.30369146 0.78205268 1.59218574 0.17185901 1.92229018
# 2009 1.41271646 0.91990081 1.93520274 0.00968977 0.20417003 1.90123205 1.05964628 1.40747981 0.47618663                                 
# 
# [[3]]
#            Jan       Feb       Mar       Apr       May       Jun       Jul       Aug       Sep       Oct       Nov       Dec
# 2013 1.5682667 0.1063360 2.7993093 1.0707897 0.6681990 0.5845229 0.7536780 2.7649293 2.1749641 2.5656176 0.5864193 1.7261582
# 2014 1.3870558 0.7087149 1.9135972 1.8541385 1.8542921 2.1885636 1.0043209 0.5888054 2.9558372 0.3824660 0.7114394 1.2492497
# 2015 0.9618573 2.2651932 1.1098501 0.9386543 0.9858758 1.1302898 2.9053675 0.7952556 1.4741946 2.0216792 1.2093570 1.4748694
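One caveat: a ts object places its observations at regular intervals from start, so the end date is implied by the number of observations and matches the last date in the data only when no month is missing. Below is a minimal base-R sketch of a check for that. is_complete_monthly() is a hypothetical helper, not part of timetk or any other package, and it assumes every date falls on the first of its month, as in the example data.

```r
# Hypothetical helper: TRUE when `dates` covers every month between its
# earliest and latest value exactly once (assumes first-of-month dates)
is_complete_monthly <- function(dates) {
  expected <- seq(min(dates), max(dates), by = "month")
  length(dates) == length(expected) && all(sort(dates) == expected)
}

# Made-up examples: one complete year, and the same year with June dropped
full_year <- seq(as.Date("2013-01-01"), by = "month", length.out = 12)
with_gap  <- full_year[-6]

complete_ok  <- is_complete_monthly(full_year)  # TRUE
complete_gap <- is_complete_monthly(with_gap)   # FALSE
```

It could be run over the nested column before coercion, for example with map_lgl(df_nest$data.tbl, ~ is_complete_monthly(.x$date)).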

Source: this question, "r - Coercing multiple objects to time series objects with different start/end dates", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/45620553/
