R - Aligning time series with different frequencies

Tags: r csv time-series

I have two .csv files, given at the bottom, that contain two separate time series. I can import them into R as data frames:

data1 <- read.csv("data1.csv")
data2 <- read.csv("data2.csv")

Each data frame has date, time, and price information. I want to align the prices from data1 and data2 in a single table at a common frequency of 10 seconds.

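I have the start and end dates and times of both time series, but their frequencies (and therefore the number of observations per day) differ, and so do the start and end times of each day.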
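One way to handle days that start and end at different times is to build the common grid only over the window in which both series have data on a given day. A minimal base-R sketch, assuming the timestamps are UTC; the first and last timestamp of each day are copied from the two CSV files below, and `common_window` is a hypothetical helper name:

```r
# Per-day overlap of two series: the later of the two first timestamps
# and the earlier of the two last timestamps. UTC is an assumption.
t1 <- as.POSIXct(c("2014-06-01 05:59:42", "2014-06-01 07:01:51",
                   "2014-06-02 05:59:50", "2014-06-02 06:07:44"), tz = "UTC")
t2 <- as.POSIXct(c("2014-06-01 02:05:25", "2014-06-01 07:00:21",
                   "2014-06-02 06:00:10", "2014-06-02 07:00:23"), tz = "UTC")

# Hypothetical helper: one row per day present in both series
common_window <- function(t1, t2) {
  d1 <- format(t1, "%Y-%m-%d")
  d2 <- format(t2, "%Y-%m-%d")
  days <- intersect(d1, d2)                  # days covered by both series
  do.call(rbind, lapply(days, function(d) {
    a <- t1[d1 == d]
    b <- t2[d2 == d]
    data.frame(day   = d,
               start = max(min(a), min(b)),  # later of the two first times
               end   = min(max(a), max(b)))  # earlier of the two last times
  }))
}

w <- common_window(t1, t2)
print(w)
```

A 10-second grid can then be generated per row of `w` with `seq(start, end, by = "10 secs")`, after flooring `start` to a 10-second boundary.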

I tried using ts(), but I don't think that function can handle dates and times together.
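ts() describes a regular series through `start` and `frequency`, so it has no notion of clock time. A common alternative is to paste the two columns together and parse them into a single POSIXct date-time. A minimal sketch on three rows of data1.csv, assuming the dates are day/month/year (01/06/2014 = 1 June 2014) and the times are UTC:

```r
# Combine the separate date and time columns into one POSIXct column.
# Day/month/year order and UTC are assumptions about the source files.
df <- read.csv(text = "date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875
02/06/2014,05:59:50,1966.625", stringsAsFactors = FALSE)

df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
print(df)
```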

What is the most efficient way to align these time series to a common frequency?

data1.csv:
date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625
01/06/2014,06:02:22,1954.875
01/06/2014,06:03:12,1954.75
01/06/2014,06:03:14,1954.625
01/06/2014,06:03:20,1954.75
01/06/2014,06:03:22,1954.875
01/06/2014,06:03:23,1954.75
01/06/2014,06:03:26,1954.875
01/06/2014,06:07:07,1955.125
01/06/2014,06:07:21,1954.875
01/06/2014,06:08:54,1954.625
01/06/2014,06:16:55,1954.375
01/06/2014,06:17:00,1954.625
01/06/2014,06:21:46,1954.875
01/06/2014,06:28:11,1955.125
01/06/2014,06:30:23,1955.375
01/06/2014,06:30:49,1955.125
01/06/2014,06:33:33,1955.375
01/06/2014,06:34:30,1955.125
01/06/2014,06:37:39,1955.375
01/06/2014,06:37:43,1955.125
01/06/2014,06:47:42,1954.875
01/06/2014,06:50:23,1955.125
01/06/2014,06:57:10,1954.875
01/06/2014,06:57:12,1955.125
01/06/2014,07:00:08,1954.875
01/06/2014,07:00:21,1955.125
01/06/2014,07:00:55,1955.375
01/06/2014,07:01:19,1955.125
01/06/2014,07:01:51,1955.375
02/06/2014,05:59:50,1966.625
02/06/2014,06:00:00,1966.375
02/06/2014,06:00:07,1966.5
02/06/2014,06:00:08,1966.625
02/06/2014,06:00:10,1966.375
02/06/2014,06:00:33,1966.125
02/06/2014,06:00:34,1966.375
02/06/2014,06:00:41,1966.125
02/06/2014,06:00:48,1966.375
02/06/2014,06:02:48,1966.625
02/06/2014,06:03:24,1966.875
02/06/2014,06:04:23,1967.125
02/06/2014,06:04:39,1966.875
02/06/2014,06:05:28,1966.625
02/06/2014,06:06:25,1966.375
02/06/2014,06:07:44,1966.625

data2.csv:
date,time,price
01/06/2014,02:05:25,0
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5
01/06/2014,06:00:33,3232.5
01/06/2014,06:00:40,3231.5
01/06/2014,06:00:41,3232.5
01/06/2014,06:00:42,3231.5
01/06/2014,06:00:44,3232.5
01/06/2014,06:04:06,3233.5
01/06/2014,06:04:22,3232.5
01/06/2014,06:04:42,3233.5
01/06/2014,06:08:48,3232.5
01/06/2014,06:10:12,3231.5
01/06/2014,06:10:35,3232.5
01/06/2014,06:21:45,3233.5
01/06/2014,06:21:55,3234.5
01/06/2014,06:29:00,3235.5
01/06/2014,06:33:34,3236.5
01/06/2014,06:34:30,3235.5
01/06/2014,06:41:33,3234.5
01/06/2014,06:47:42,3233.5
01/06/2014,06:48:33,3234.5
01/06/2014,06:50:23,3235.5
01/06/2014,06:52:04,3236.5
01/06/2014,06:57:11,3235.5
01/06/2014,07:00:00,3236.5
01/06/2014,07:00:06,3235.5
01/06/2014,07:00:08,3233.5
01/06/2014,07:00:09,3234.5
01/06/2014,07:00:10,3233.5
01/06/2014,07:00:11,3234.5
01/06/2014,07:00:21,3235.5
02/06/2014,06:00:10,3252.5
02/06/2014,06:00:20,3252
02/06/2014,06:00:21,3251.5
02/06/2014,06:00:33,3250.5
02/06/2014,06:00:34,3251
02/06/2014,06:00:35,3250.5
02/06/2014,06:00:41,3249.5
02/06/2014,06:01:31,3250.5
02/06/2014,06:01:32,3249.5
02/06/2014,06:01:38,3250.5
02/06/2014,06:02:47,3251.5
02/06/2014,06:05:32,3250.5
02/06/2014,06:06:25,3249.5
02/06/2014,06:07:44,3250.5
02/06/2014,06:08:11,3249.5
02/06/2014,06:12:32,3250.5
02/06/2014,06:16:56,3251.5
02/06/2014,06:17:08,3250.5
02/06/2014,06:18:32,3251.5
02/06/2014,06:31:59,3250.5
02/06/2014,06:32:11,3251.5
02/06/2014,06:44:47,3250.5
02/06/2014,06:45:09,3251.5
02/06/2014,06:52:33,3252.5
02/06/2014,06:52:36,3253.5
02/06/2014,06:55:30,3254.5
02/06/2014,06:55:39,3253.5
02/06/2014,06:57:27,3254.5
02/06/2014,07:00:01,3253.5
02/06/2014,07:00:02,3254.5
02/06/2014,07:00:17,3253.5
02/06/2014,07:00:23,3252.5

This is what the data frame "data1" looks like:
    date        time                Price
1   2014-06-01  06:03:59.614000     62.1250
2   2014-06-01  06:03:59.692000     62.2500
3   2014-06-01  06:15:42.004000     62.2375
4   2014-06-01  06:15:42.083000     61.9250
5   2014-06-01  06:17:01.654000     61.9125
6   2014-06-01  06:17:01.732000     61.9000
7   2014-06-01  06:23:41.908000     61.8200
8   2014-06-01  06:23:41.986000     61.8570
9   2014-06-01  06:23:55.211000     61.9065
10  2014-06-01  06:23:55.291000     61.8725
11  2014-06-01  06:24:11.679000     61.8715
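The times printed here carry fractional seconds, which a plain `%S` format would drop. A sketch of parsing them with `%OS`, using two timestamps copied from the printout and assuming UTC:

```r
# "%OS" reads seconds including their fractional part.
x <- as.POSIXct(paste("2014-06-01", c("06:03:59.614000", "06:15:42.004000")),
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# digits.secs only changes how the values are printed
op <- options(digits.secs = 3)
print(x)
options(op)

# The fractional part is kept in the underlying numeric value
frac <- as.numeric(x) %% 1
```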

Best answer

Example dataset

date_time <- seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), as.POSIXlt("2014-01-07 07:00:00"), by = "1 secs")
date_time_1 <- sample(date_time, 100)
date_time_2 <- sample(date_time, 100)

data1 <- data.frame(date=as.Date(date_time_2),
           time = format(date_time_1, "%H:%M:%S"),
           price = rnorm(100)
)
# format the date and time
data1$datetime <- strptime(paste(data1$date, data1$time), "%Y-%m-%d %H:%M:%S")

data2 <- data.frame(date=as.Date(date_time_2),
                    time = format(date_time_1, "%H:%M:%S"),
                    price = rnorm(100)
)
# format the date and time
data2$datetime <- strptime(paste(data2$date, data2$time), "%Y-%m-%d %H:%M:%S")

The next part answers your question:
## Floor the times to 10-second increments (e.g. 06:00:08 -> 06:00:00)
data1$datetime <- data1$datetime - as.numeric(format(data1$datetime, "%S")) %% 10
data2$datetime <- data2$datetime - as.numeric(format(data2$datetime, "%S")) %% 10

## Aggregate the data in case there are multiple observations in one 10-second block
data1_freq <- aggregate(list(price = data1$price), list(date = as.POSIXct(data1$datetime)), mean)
data2_freq <- aggregate(list(price = data2$price), list(date = as.POSIXct(data2$datetime)), mean)

### Now merge the two data sets, not dropping any observations
### (suffixes label which series each price column came from)
data <- merge(data1_freq, data2_freq, by = "date", all = TRUE, suffixes = c(".data1", ".data2"))
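The same three steps (bucket, aggregate, merge) can be sketched on a few rows of the question's own data. This variant is an assumption rather than the answer's method: it floors with numeric arithmetic, because subtracting the `format(x, "%S")` seconds leaves any fractional part in place, and it keeps the last price in each block instead of the mean. `to_bucket` and `last_per_bucket` are hypothetical helper names.

```r
# Dates are day/month/year and times are treated as UTC (assumptions).
d1 <- read.csv(text = "date,time,price
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625", stringsAsFactors = FALSE)
d2 <- read.csv(text = "date,time,price
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5", stringsAsFactors = FALSE)

# Hypothetical helper: parse date + time, then floor to a `secs` boundary
to_bucket <- function(df, secs = 10) {
  tm <- as.POSIXct(paste(df$date, df$time),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  as.POSIXct(floor(as.numeric(tm) / secs) * secs,
             origin = "1970-01-01", tz = "UTC")
}

# Hypothetical helper: last price in each bucket (rows assumed in time order)
last_per_bucket <- function(df) {
  b <- to_bucket(df)
  keep <- !duplicated(b, fromLast = TRUE)
  data.frame(date = b[keep], price = df$price[keep])
}

aligned <- merge(last_per_bucket(d1), last_per_bucket(d2), by = "date",
                 all = TRUE, suffixes = c(".data1", ".data2"))
print(aligned)
```

Keeping the last price gives the value prevailing at the end of each 10-second block; the mean, as in the answer, averages within the block instead. Which one fits depends on the analysis.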

Optionally, you can merge the result with a complete (regular) time series:
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date = 
                               seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
                                          as.POSIXlt("2014-01-07 07:00:00"), 
                                          by = "10 secs")
)

# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE)
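After merging onto the regular grid, most rows have NA prices. If the prevailing price should fill those gaps, one option is to carry the last observation forward. A base-R sketch using two illustrative observations (`locf` is a hypothetical helper; `zoo::na.locf` from the zoo package does the same job):

```r
# Hypothetical helper: replace each NA with the last non-NA value before it
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # position of last non-NA
  c(NA, x)[idx + 1L]                                   # idx 0 -> leading NA
}

grid <- data.frame(date = seq(as.POSIXct("2014-06-01 06:00:00", tz = "UTC"),
                              as.POSIXct("2014-06-01 06:01:00", tz = "UTC"),
                              by = "10 secs"))
obs <- data.frame(date = as.POSIXct(c("2014-06-01 06:00:00",
                                      "2014-06-01 06:00:30"), tz = "UTC"),
                  price = c(1954.875, 1955.125))

# Keep every grid point, then fill the gaps forward
filled <- merge(grid, obs, by = "date", all.x = TRUE)
filled$price <- locf(filled$price)
print(filled)
```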

To restrict the continuous date sequence to weekdays and working hours:
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date = 
                               seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
                                          as.POSIXlt("2014-01-07 07:00:00"), 
                                          by = "10 secs")
)
# Use the lubridate package to subset the date sequence

library(lubridate)
## Use the wday function to see what day of the week it is
## (lubridate's default numbering is 1 = Sunday, so 2-6 = Monday - Friday).
## drop = FALSE keeps a one-column data frame instead of a bare vector.
cont_date_time <- cont_date_time[with(cont_date_time, wday(date) >= 2 & wday(date) <= 6), , drop = FALSE]
## Use the hour function to keep only working hours (here assumed to be 09:00-16:00)
cont_date_time <- cont_date_time[with(cont_date_time, hour(date) >= 9 & hour(date) < 16), , drop = FALSE]

# And merge the data onto the restricted time series, keeping only grid times
data_cont <- merge(cont_date_time, data, by = "date", all.x = TRUE)
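The weekday and hour filter can also be written in base R through `as.POSIXlt()`, whose `$wday` runs from 0 (Sunday) to 6 (Saturday), unlike lubridate's default `wday()` numbering of 1 (Sunday) to 7. The 09:00-16:00 window below is a placeholder, not taken from the question; `drop = FALSE` keeps the one-column grid a data frame so later steps can still refer to its `date` column.

```r
# Grid from Friday 2014-06-06 15:59:50 to Saturday 2014-06-07 10:00:00 (UTC)
grid <- data.frame(date = seq(as.POSIXct("2014-06-06 15:59:50", tz = "UTC"),
                              as.POSIXct("2014-06-07 10:00:00", tz = "UTC"),
                              by = "10 secs"))

lt <- as.POSIXlt(grid$date)
# Monday-Friday (1-5) and hours 09 to 15 inclusive (placeholder window)
keep <- lt$wday %in% 1:5 & lt$hour >= 9 & lt$hour < 16

grid_work <- grid[keep, , drop = FALSE]
print(grid_work)
```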

A similar question about R - Aligning time series with different frequencies can be found on Stack Overflow: https://stackoverflow.com/questions/26180900/
