I have two .csv files containing two separate time series (both are given at the bottom). I can import them into R as data frames:
data1 <- read.csv("data1.csv")
data2 <- read.csv("data2.csv")
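Each data frame contains date, time, and price information. I want to align the prices of data1 and data2 in a single table at a common frequency of 10 seconds. I have the start and end dates and times of both time series, but their frequencies (and therefore the number of observations per day) differ, and so do the daily start and end times.
I tried using ts(), but I don't think that function can handle a date and a time together. What is the most efficient way to align these time series to a common frequency?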
data1.csv:
date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625
01/06/2014,06:02:22,1954.875
01/06/2014,06:03:12,1954.75
01/06/2014,06:03:14,1954.625
01/06/2014,06:03:20,1954.75
01/06/2014,06:03:22,1954.875
01/06/2014,06:03:23,1954.75
01/06/2014,06:03:26,1954.875
01/06/2014,06:07:07,1955.125
01/06/2014,06:07:21,1954.875
01/06/2014,06:08:54,1954.625
01/06/2014,06:16:55,1954.375
01/06/2014,06:17:00,1954.625
01/06/2014,06:21:46,1954.875
01/06/2014,06:28:11,1955.125
01/06/2014,06:30:23,1955.375
01/06/2014,06:30:49,1955.125
01/06/2014,06:33:33,1955.375
01/06/2014,06:34:30,1955.125
01/06/2014,06:37:39,1955.375
01/06/2014,06:37:43,1955.125
01/06/2014,06:47:42,1954.875
01/06/2014,06:50:23,1955.125
01/06/2014,06:57:10,1954.875
01/06/2014,06:57:12,1955.125
01/06/2014,07:00:08,1954.875
01/06/2014,07:00:21,1955.125
01/06/2014,07:00:55,1955.375
01/06/2014,07:01:19,1955.125
01/06/2014,07:01:51,1955.375
02/06/2014,05:59:50,1966.625
02/06/2014,06:00:00,1966.375
02/06/2014,06:00:07,1966.5
02/06/2014,06:00:08,1966.625
02/06/2014,06:00:10,1966.375
02/06/2014,06:00:33,1966.125
02/06/2014,06:00:34,1966.375
02/06/2014,06:00:41,1966.125
02/06/2014,06:00:48,1966.375
02/06/2014,06:02:48,1966.625
02/06/2014,06:03:24,1966.875
02/06/2014,06:04:23,1967.125
02/06/2014,06:04:39,1966.875
02/06/2014,06:05:28,1966.625
02/06/2014,06:06:25,1966.375
02/06/2014,06:07:44,1966.625
data2.csv:
date,time,price
01/06/2014,02:05:25,0
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5
01/06/2014,06:00:33,3232.5
01/06/2014,06:00:40,3231.5
01/06/2014,06:00:41,3232.5
01/06/2014,06:00:42,3231.5
01/06/2014,06:00:44,3232.5
01/06/2014,06:04:06,3233.5
01/06/2014,06:04:22,3232.5
01/06/2014,06:04:42,3233.5
01/06/2014,06:08:48,3232.5
01/06/2014,06:10:12,3231.5
01/06/2014,06:10:35,3232.5
01/06/2014,06:21:45,3233.5
01/06/2014,06:21:55,3234.5
01/06/2014,06:29:00,3235.5
01/06/2014,06:33:34,3236.5
01/06/2014,06:34:30,3235.5
01/06/2014,06:41:33,3234.5
01/06/2014,06:47:42,3233.5
01/06/2014,06:48:33,3234.5
01/06/2014,06:50:23,3235.5
01/06/2014,06:52:04,3236.5
01/06/2014,06:57:11,3235.5
01/06/2014,07:00:00,3236.5
01/06/2014,07:00:06,3235.5
01/06/2014,07:00:08,3233.5
01/06/2014,07:00:09,3234.5
01/06/2014,07:00:10,3233.5
01/06/2014,07:00:11,3234.5
01/06/2014,07:00:21,3235.5
02/06/2014,06:00:10,3252.5
02/06/2014,06:00:20,3252
02/06/2014,06:00:21,3251.5
02/06/2014,06:00:33,3250.5
02/06/2014,06:00:34,3251
02/06/2014,06:00:35,3250.5
02/06/2014,06:00:41,3249.5
02/06/2014,06:01:31,3250.5
02/06/2014,06:01:32,3249.5
02/06/2014,06:01:38,3250.5
02/06/2014,06:02:47,3251.5
02/06/2014,06:05:32,3250.5
02/06/2014,06:06:25,3249.5
02/06/2014,06:07:44,3250.5
02/06/2014,06:08:11,3249.5
02/06/2014,06:12:32,3250.5
02/06/2014,06:16:56,3251.5
02/06/2014,06:17:08,3250.5
02/06/2014,06:18:32,3251.5
02/06/2014,06:31:59,3250.5
02/06/2014,06:32:11,3251.5
02/06/2014,06:44:47,3250.5
02/06/2014,06:45:09,3251.5
02/06/2014,06:52:33,3252.5
02/06/2014,06:52:36,3253.5
02/06/2014,06:55:30,3254.5
02/06/2014,06:55:39,3253.5
02/06/2014,06:57:27,3254.5
02/06/2014,07:00:01,3253.5
02/06/2014,07:00:02,3254.5
02/06/2014,07:00:17,3253.5
02/06/2014,07:00:23,3252.5
This is what the data frame "data1" looks like:
date time Price
1 2014-06-01 06:03:59.614000 62.1250
2 2014-06-01 06:03:59.692000 62.2500
3 2014-06-01 06:15:42.004000 62.2375
4 2014-06-01 06:15:42.083000 61.9250
5 2014-06-01 06:17:01.654000 61.9125
6 2014-06-01 06:17:01.732000 61.9000
7 2014-06-01 06:23:41.908000 61.8200
8 2014-06-01 06:23:41.986000 61.8570
9 2014-06-01 06:23:55.211000 61.9065
10 2014-06-01 06:23:55.291000 61.8725
11 2014-06-01 06:24:11.679000 61.8715
Best answer
Sample data set
# Build a pool of one-second timestamps and draw two independent samples from it
date_time <- seq.POSIXt(as.POSIXct("2014-01-06 06:00:00"), as.POSIXct("2014-01-07 07:00:00"), by = "1 sec")
date_time_1 <- sample(date_time, 100)
date_time_2 <- sample(date_time, 100)
data1 <- data.frame(date = as.Date(format(date_time_1, "%Y-%m-%d")),
                    time = format(date_time_1, "%H:%M:%S"),
                    price = rnorm(100))
# Combine the date and time columns into a single POSIXct column
data1$datetime <- as.POSIXct(paste(data1$date, data1$time), format = "%Y-%m-%d %H:%M:%S")
data2 <- data.frame(date = as.Date(format(date_time_2, "%Y-%m-%d")),
                    time = format(date_time_2, "%H:%M:%S"),
                    price = rnorm(100))
# Combine the date and time columns into a single POSIXct column
data2$datetime <- as.POSIXct(paste(data2$date, data2$time), format = "%Y-%m-%d %H:%M:%S")
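The sample data above uses year-month-day dates, while the CSV files in the question use the form 01/06/2014. Here is a minimal sketch of parsing that layout. It assumes the dates are day/month/year (the data frame printout in the question shows 2014-06-01) and that the times are in UTC. The time zone is an assumption, so replace it with the one that applies to your data. The two rows are inlined from data1.csv so the sketch runs on its own.

```r
# Two rows in the layout of data1.csv, inlined so the sketch is self-contained
csv_text <- "date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumption: day/month/year dates, times recorded in UTC
raw$datetime <- as.POSIXct(paste(raw$date, raw$time),
                           format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
```

With real files, replace `text = csv_text` with the file name, for example `read.csv("data1.csv")`.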
The next part answers your question
## Floor the times to the start of each 10-second interval
data1$datetime <- data1$datetime - as.numeric(format(data1$datetime, "%S")) %% 10
data2$datetime <- data2$datetime - as.numeric(format(data2$datetime, "%S")) %% 10
## Average the prices when several observations fall in the same 10-second block
data1_freq <- aggregate(list(price1 = data1$price), list(date = as.POSIXct(data1$datetime)), mean)
data2_freq <- aggregate(list(price2 = data2$price), list(date = as.POSIXct(data2$datetime)), mean)
## Now merge the two data sets without dropping any observations
data <- merge(data1_freq, data2_freq, by = "date", all = TRUE)
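To make the floor, aggregate, and merge steps concrete, here is a small worked sketch on a few observations taken from the question's two files. The helper `floor_10s` is a hypothetical name introduced for illustration. It floors through the numeric epoch value instead of the formatted seconds, which is an alternative to the approach above. UTC is assumed.

```r
# Hypothetical helper: floor POSIXct times to the start of their 10-second bin
floor_10s <- function(x) {
  as.POSIXct(floor(as.numeric(x) / 10) * 10, origin = "1970-01-01", tz = "UTC")
}

# A few observations from data1.csv and data2.csv (UTC assumed)
t1 <- as.POSIXct(c("2014-06-01 06:00:05", "2014-06-01 06:00:06",
                   "2014-06-01 06:00:08"), tz = "UTC")
p1 <- c(1954.875, 1954.75, 1954.875)
t2 <- as.POSIXct(c("2014-06-01 06:00:07", "2014-06-01 06:00:17"), tz = "UTC")
p2 <- c(3231.5, 3232.5)

# One mean price per 10-second bin and per series
s1 <- aggregate(list(price1 = p1), list(date = floor_10s(t1)), mean)
s2 <- aggregate(list(price2 = p2), list(date = floor_10s(t2)), mean)

# Outer join: a bin that only one series covers gets NA for the other
aligned <- merge(s1, s2, by = "date", all = TRUE)
print(aligned)
```

All three data1 observations fall into the 06:00:00 bin and are averaged. The 06:00:10 bin has a price only for data2, so `price1` is `NA` there.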
You can also optionally merge it onto a complete time series
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date =
    seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"),
               as.POSIXlt("2014-01-07 07:00:00"),
               by = "10 secs")
)
# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all = TRUE)
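After merging onto the full 10-second grid, most rows hold NA because no trade fell in that interval. Whether to fill those gaps depends on the analysis. One common choice for prices is to carry the last observed value forward. A minimal base R sketch follows. `locf` is a hypothetical helper name, not a function from the answer.

```r
# Hypothetical helper: last observation carried forward.
# Leading NAs (before the first observed value) are left as NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-missing values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # position 1 is the NA used when idx == 0
}

prices <- c(NA, 1954.75, NA, NA, 1954.875, NA)
filled <- locf(prices)
print(filled)
```

Applied to the merged table, this would be used column by column, for example `data_cont$price1 <- locf(data_cont$price1)`, assuming the rows are sorted by date and the price column carries that name.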
Restricting the continuous date sequence to weekdays and working hours
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date =
    seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"),
               as.POSIXlt("2014-01-07 07:00:00"),
               by = "10 secs")
)
# Use the lubridate package to subset the date sequence
library(lubridate)
## Keep Monday to Friday (wday() numbers Sunday as 1, so weekdays are 2 to 6)
cont_date_time <- cont_date_time[with(cont_date_time, wday(date) >= 2 & wday(date) <= 6), , drop = FALSE]
## Keep working hours on the 24-hour clock (here 09:00 to 16:59; adjust as needed)
cont_date_time <- cont_date_time[with(cont_date_time, hour(date) >= 9 & hour(date) <= 16), , drop = FALSE]
# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all = TRUE)
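If you would rather not depend on lubridate, the same weekday and hour filter can be sketched in base R through `as.POSIXlt`, whose `wday` field numbers Sunday as 0 and whose `hour` field uses the 24-hour clock. The one-week range, the UTC time zone, and the 09:00 to 16:59 window below are assumptions chosen for illustration.

```r
# One full week of 10-second timestamps, Monday 2014-01-06 to Sunday 2014-01-12
grid <- seq(as.POSIXct("2014-01-06 00:00:00", tz = "UTC"),
            as.POSIXct("2014-01-12 23:59:50", tz = "UTC"),
            by = "10 sec")

lt <- as.POSIXlt(grid)

# POSIXlt: wday is 0 (Sunday) to 6 (Saturday); hour is 0 to 23
keep <- lt$wday %in% 1:5 & lt$hour >= 9 & lt$hour <= 16
grid_work <- grid[keep]

# Wrap in a data frame so it can be merged by "date" as before
cont_date_time <- data.frame(date = grid_work)
```

Because the filter works on a plain vector and the data frame is built afterwards, there is no single-column data frame subsetting to worry about.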
Regarding "R - aligning time series with different frequencies", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26180900/