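I am trying to calculate the average of the last 3 months (3M) of values and append it to the bottom of the data frame, then use the extended data to calculate the next 3M average (essentially 2 months of data plus the newly added average), and repeat this 18 times.
I am looking for an efficient way to do this so that it takes less time. I tried doing it with a double loop, but then found a way that uses a single loop and lapply().
However, I would like to know whether there is a better approach that avoids loops.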
library(dplyr)
library(forecast)
library(readxl)
library(data.table)
library(clock)
library(lubridate)
library(tsibble)
df <- read_excel("C:/X/X/X- X/X/Book7.xlsx",sheet = "Loop")
freq = 18
colnames(df)[1]="Dates"
Dates <- df$Dates
Working <- df[,-1]
#--------------------------------------- Creation of Functions ---------------------------------------#
Moving_Average_3M <- function(Working)
{
last_3_row <- tail(Working,3)
# Convert the `last_3_row` object to a two-dimensional object as tail() function returns a vector
last_3_row_df <- data.frame(last_3_row)
# Calculate the mean of the last three rows
mean_last_3 <- data.frame(colMeans(last_3_row_df,na.rm = TRUE))
return(mean_last_3)
}
Rename_Col_and_bind <- function(Working,Output)
{
colnames(Output) <- colnames(Working)
Working <- rbind(Working,Output)
return(Working)
}
#--------------------------------------- End of Creation of Functions ---------------------------------------#
#------------------------------------------ Loops for Execution ---------------------------------------------#
for(i in 1:freq)
{
Output <- data.frame(lapply(Working,Moving_Average_3M))
Working <- Rename_Col_and_bind(Working,Output)
}
View(Output)
The data frame I am working with is as follows:
structure(list(Dates = c("2019-01-01", "2019-02-01", "2019-03-01",
"2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01",
"2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01",
"2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01",
"2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
"2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01",
"2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01",
"2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01",
"2022-08-01", "2022-09-01", "2022-10-01"), `XYZ|851` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 206, 1814, 2324, 772, 1116, 1636, 1906,
957, 829, 911, 786, 938, 1313, 2384, 1554, 1777, 1635, 1534,
1015, 827, 982, 685, 767, 511, 239, 5400, 1301, 426, 261, 201,
33, 27, 28, 46, 11, 55, 47), `XYZ|574` = c(0, 0, 0, 0, 0, 0,
0, 0, 74, 179, 464, 880, 324, 184, 90, 170, 140, 96, 78, 83,
83, 121, 245, 9000, 332, 123, 117, 138, 20, 42, 70, 70, 42, 103,
490, 7500, 488, 245, 142, 95, 63, 343, 57, 113, 100, 105)), row.names = c(NA,
-46L), class = c("tbl_df", "tbl", "data.frame"))
As described above, here is a minimal output after two iterations. This is the loop used to run the two iterations:
for(i in 1:2)
{
Output <- data.frame(lapply(Working,Moving_Average_3M))
Working <- Rename_Col_and_bind(Working,Output)
}
Working
The resulting data frame is as follows:
structure(list(`XYZ|851` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 206,
1814, 2324, 772, 1116, 1636, 1906, 957, 829, 911, 786, 938, 1313,
2384, 1554, 1777, 1635, 1534, 1015, 827, 982, 685, 767, 511,
239, 5400, 1301, 426, 261, 201, 33, 27, 28, 46, 11, 55, 47, 37.6666666666667,
46.5555555555556), `XYZ|574` = c(0, 0, 0, 0, 0, 0, 0, 0, 74,
179, 464, 880, 324, 184, 90, 170, 140, 96, 78, 83, 83, 121, 245,
9000, 332, 123, 117, 138, 20, 42, 70, 70, 42, 103, 490, 7500,
488, 245, 142, 95, 63, 343, 57, 113, 100, 105, 106, 103.666666666667
)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31",
"32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42",
"43", "44", "45", "46", "last_3_row", "last_3_row1"), class = c("tbl_df",
"tbl", "data.frame"))
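To explain this further, I have added an Excel image for clarity:
The blue cells in the image are the output, identical to what you see in the Working data frame, and the formulas used are highlighted in yellow.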
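As a quick sanity check on the recurrence described in the question, the first two appended values of `XYZ|851` can be reproduced by hand. This is only a minimal sketch; the names `x`, `first_new`, and `second_new` are illustrative and do not appear in the original code.

```r
# Last three observed values of `XYZ|851`, taken from the data above
x <- c(11, 55, 47)

# First appended value: mean of the last three observations
first_new <- mean(tail(x, 3))    # (11 + 55 + 47) / 3
x <- c(x, first_new)

# Second appended value: two observations plus the value just appended
second_new <- mean(tail(x, 3))   # (55 + 47 + first_new) / 3
x <- c(x, second_new)

print(round(c(first_new, second_new), 4))
# [1] 37.6667 46.5556
```

These agree with rows 47 and 48 of the two-iteration output shown above.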
Best answer
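Some thoughts before a solution:
- The average is cumulative, so a simple vectorized calculation does not work.
- This is not a rolling operation.
- This is a reduction (à la Reduce or purrr::reduce), because the calculation of each value depends on the rows (and calculations) before it; it is closer to a recursive approach, although we will not use recursion explicitly here.
- Side note: iteratively adding rows to an object with rbind works in concept but is very inefficient and scales poorly; for that reason, I preallocate the space once (filled with NA) and fill the rows with new values instead of calling rbind in each iteration.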
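# `Working` here is the full data frame from the dput() above (Dates included)
# preallocate the extra rows (18 rows filled with NA)
Working2 <- rbind(Working, Working[1:18,][NA,])
# fill each new row with the column means of the three rows before it
for (i in (nrow(Working)+1):nrow(Working2))
  Working2[i,-1] <- lapply(Working2[i - 1:3,-1], mean)
as.data.frame(Working2)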
# Dates XYZ|851 XYZ|574
# 1 2019-01-01 0.00000 0.0000
# 2 2019-02-01 0.00000 0.0000
# 3 2019-03-01 0.00000 0.0000
# 4 2019-04-01 0.00000 0.0000
# 5 2019-05-01 0.00000 0.0000
# 6 2019-06-01 0.00000 0.0000
# 7 2019-07-01 0.00000 0.0000
# 8 2019-08-01 0.00000 0.0000
# 9 2019-09-01 0.00000 74.0000
# 10 2019-10-01 206.00000 179.0000
# 11 2019-11-01 1814.00000 464.0000
# 12 2019-12-01 2324.00000 880.0000
# 13 2020-01-01 772.00000 324.0000
# 14 2020-02-01 1116.00000 184.0000
# 15 2020-03-01 1636.00000 90.0000
# 16 2020-04-01 1906.00000 170.0000
# 17 2020-05-01 957.00000 140.0000
# 18 2020-06-01 829.00000 96.0000
# 19 2020-07-01 911.00000 78.0000
# 20 2020-08-01 786.00000 83.0000
# 21 2020-09-01 938.00000 83.0000
# 22 2020-10-01 1313.00000 121.0000
# 23 2020-11-01 2384.00000 245.0000
# 24 2020-12-01 1554.00000 9000.0000
# 25 2021-01-01 1777.00000 332.0000
# 26 2021-02-01 1635.00000 123.0000
# 27 2021-03-01 1534.00000 117.0000
# 28 2021-04-01 1015.00000 138.0000
# 29 2021-05-01 827.00000 20.0000
# 30 2021-06-01 982.00000 42.0000
# 31 2021-07-01 685.00000 70.0000
# 32 2021-08-01 767.00000 70.0000
# 33 2021-09-01 511.00000 42.0000
# 34 2021-10-01 239.00000 103.0000
# 35 2021-11-01 5400.00000 490.0000
# 36 2021-12-01 1301.00000 7500.0000
# 37 2022-01-01 426.00000 488.0000
# 38 2022-02-01 261.00000 245.0000
# 39 2022-03-01 201.00000 142.0000
# 40 2022-04-01 33.00000 95.0000
# 41 2022-05-01 27.00000 63.0000
# 42 2022-06-01 28.00000 343.0000
# 43 2022-07-01 46.00000 57.0000
# 44 2022-08-01 11.00000 113.0000
# 45 2022-09-01 55.00000 100.0000
# 46 2022-10-01 47.00000 105.0000
# 47 <NA> 37.66667 106.0000
# 48 <NA> 46.55556 103.6667
# 49 <NA> 43.74074 104.8889
# 50 <NA> 42.65432 104.8519
# 51 <NA> 44.31687 104.4691
# 52 <NA> 43.57064 104.7366
# 53 <NA> 43.51395 104.6859
# 54 <NA> 43.80049 104.6305
# 55 <NA> 43.62836 104.6843
# 56 <NA> 43.64760 104.6669
# 57 <NA> 43.69215 104.6606
# 58 <NA> 43.65604 104.6706
# 59 <NA> 43.66526 104.6660
# 60 <NA> 43.67115 104.6658
# 61 <NA> 43.66415 104.6675
# 62 <NA> 43.66685 104.6664
# 63 <NA> 43.66738 104.6666
# 64 <NA> 43.66613 104.6668
You can then fill in the dates as needed.
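One possible way to fill the missing dates is to continue the monthly sequence from the last known date. The sketch below works on a stand-in character vector `dates` (an assumption for illustration, mirroring the 46 known dates followed by 18 `NA` values); the names `last_known`, `n_missing`, and `new_dates` are hypothetical.

```r
# Stand-in for the Dates column: 46 known monthly dates followed by 18 NAs
dates <- c(
  format(seq(as.Date("2019-01-01"), by = "month", length.out = 46)),
  rep(NA_character_, 18)
)

# Position of the last known date and number of dates still missing
last_known <- max(which(!is.na(dates)))
n_missing  <- length(dates) - last_known

# Continue the monthly sequence; drop the first element (the known date)
new_dates <- seq(as.Date(dates[last_known]), by = "month",
                 length.out = n_missing + 1)[-1]
dates[(last_known + 1):length(dates)] <- format(new_dates)

tail(dates, 3)
```

The same assignment could be applied to `Working2$Dates`, assuming that column holds character dates in `YYYY-MM-DD` form as in the data above.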
(I used as.data.frame(Working2) only to show all the decimals, since the print method for a tibble often hides some precision.)
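For readers who want to see the reduction framing mentioned in the thoughts above, here is a minimal sketch using base R's `Reduce()` with `accumulate = TRUE`. It is not the answer's method, only an alternative under stated assumptions: `extend_mean3` is a hypothetical helper, the state carried between steps is just the last three values, and the small `Working` data frame below contains only the last five rows of each numeric column (no Dates column).

```r
# Hypothetical helper: extend a numeric vector by `n_ahead` values,
# each being the mean of the three values immediately before it.
extend_mean3 <- function(x, n_ahead) {
  # State is a window of three values; each step drops the oldest value
  # and appends the mean of the current window.
  step <- function(window, i) c(window[-1], mean(window))
  states <- Reduce(step, seq_len(n_ahead), tail(x, 3), accumulate = TRUE)
  # Drop the initial state; the newest value sits last in each window
  new_vals <- vapply(states[-1], function(w) w[3], numeric(1))
  c(x, new_vals)
}

# Last five rows of the numeric columns from the data above
Working <- data.frame(
  `XYZ|851` = c(28, 46, 11, 55, 47),
  `XYZ|574` = c(343, 57, 113, 100, 105),
  check.names = FALSE
)

# Apply the helper column by column, keeping the original column names
extended <- data.frame(lapply(Working, extend_mean3, n_ahead = 18),
                       check.names = FALSE)
head(extended, 8)
```

This still iterates internally, so it is unlikely to be faster than the preallocated loop; its main appeal is that each column is handled as a plain vector and no rows are bound to a data frame inside the iteration.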
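Regarding "r - Calculating the average of the last 3M values of a data frame and adding them to the data frame, repeating this 18 times, without using a loop in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/77188481/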