r - Adding a season column to a data.table based on the month

Tags: r data.table

I'm using data.table and I'm trying to create a new column called "Season" that holds the corresponding season (Summer, Winter, ...) based on a column called "MonthName".

Is there a more efficient way to add a season column to a data.table based on the month value?

Here are the first 6 of 300,000 observations; assume the table is called "dt".

    rrp         Year   Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500     1999     1    1999      00:00      33.09037       Jan
2: 21.01167     1999     1    1999      00:00      33.09037       Jan
3: 25.28667     1999     2    1999      00:00      33.09037       Feb
4: 18.42334     1999     2    1999      00:00      33.09037       Feb
5: 16.67499     1999     2    1999      00:00      33.09037       Feb
6: 18.90001     1999     2    1999      00:00      33.09037       Feb

I tried the following code:
dt[, Season := ifelse(MonthName == c("Jun", "Jul", "Aug"), "Winter",
                 ifelse(MonthName == c("Dec", "Jan", "Feb"), "Summer",
                 ifelse(MonthName == c("Sep", "Oct", "Nov"), "Spring",
                 ifelse(MonthName == c("Mar", "Apr", "May"), "Autumn", NA))))]

It returns:
    rrp         Year   Month Finyear hourminute AvgPriceByTOD MonthName Season
1: 35.27500     1999     1    1999      00:00      33.09037       Jan     NA
2: 21.01167     1999     1    1999      00:00      33.09037       Jan Summer
3: 25.28667     1999     2    1999      00:00      33.09037       Feb Summer
4: 18.42334     1999     2    1999      00:00      33.09037       Feb     NA
5: 16.67499     1999     2    1999      00:00      33.09037       Feb     NA
6: 18.90001     1999     2    1999      00:00      33.09037       Feb Summer

I get these warnings:
Warning messages:
1: In MonthName == c("Jun", "Jul", "Aug") :
  longer object length is not a multiple of shorter object length
2: In MonthName == c("Dec", "Jan", "Feb") :
  longer object length is not a multiple of shorter object length
3: In MonthName == c("Sep", "Oct", "Nov") :
  longer object length is not a multiple of shorter object length
4: In MonthName == c("Mar", "Apr", "May") :
  longer object length is not a multiple of shorter object length 

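On top of that, for reasons I don't understand, some summer months are correctly assigned "Summer" while others get NA. For example, rows 1 and 2 should both be Summer, but they come back differently.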

Thanks in advance!
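The scattered NA values come from vector recycling. A minimal base-R sketch, assuming a hypothetical `month_name` vector that stands in for the `MonthName` column of the six sample rows: `==` compares element by element and recycles the shorter vector, so each row is checked against only one of the three months, whereas `%in%` tests membership in the whole set. The recycled pattern reproduces the NA/Summer sequence in the output above.

```r
# MonthName values of the six sample rows (hypothetical stand-in vector)
month_name <- c("Jan", "Jan", "Feb", "Feb", "Feb", "Feb")

# `==` recycles the length-3 vector, so rows 1..6 are compared with
# Dec, Jan, Feb, Dec, Jan, Feb respectively
month_name == c("Dec", "Jan", "Feb")
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

# `%in%` asks whether each element appears anywhere in the set
month_name %in% c("Dec", "Jan", "Feb")
# [1] TRUE TRUE TRUE TRUE TRUE TRUE

# The same nested ifelse() with `%in%` in place of `==`
season <- ifelse(month_name %in% c("Jun", "Jul", "Aug"), "Winter",
          ifelse(month_name %in% c("Dec", "Jan", "Feb"), "Summer",
          ifelse(month_name %in% c("Sep", "Oct", "Nov"), "Spring",
          ifelse(month_name %in% c("Mar", "Apr", "May"), "Autumn", NA))))
season
# [1] "Summer" "Summer" "Summer" "Summer" "Summer" "Summer"
```

Inside the data.table call, the corresponding fix would be to replace each `MonthName == c(...)` with `MonthName %in% c(...)`. The recycling warning itself only appears when the number of rows is not a multiple of 3, so these six rows alone produce wrong results without any warning.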

Best answer

A very simple approach is to use a lookup table that maps month names to seasons:

# create a named vector where names are the month names and elements are seasons
seasons <- rep(c("winter","spring","summer","fall"), each = 3)
names(seasons) <- month.abb[c(6:12,1:5)] # thanks thelatemail for pointing out month.abb
seasons
#     Jun      Jul      Aug      Sep      Oct      Nov      Dec      Jan 
#"winter" "winter" "winter" "spring" "spring" "spring" "summer" "summer" 
#     Feb      Mar      Apr      May 
#"summer"   "fall"   "fall"   "fall" 

Then use it:
dt[, season := seasons[MonthName]]
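One caveat worth checking, sketched in base R: indexing a vector by a factor uses the factor's integer codes, not its labels, so this lookup only matches by month name when `MonthName` is a character column. The data below is read with `stringsAsFactors = FALSE`, which keeps it character. The `month_chr` and `month_fac` vectors are hypothetical examples.

```r
# Named lookup vector from the answer: month abbreviation -> season
seasons <- rep(c("winter", "spring", "summer", "fall"), each = 3)
names(seasons) <- month.abb[c(6:12, 1:5)]

# Character index: elements are matched by name
month_chr <- c("Jan", "Feb", "Jul", "Oct")
by_name <- unname(seasons[month_chr])
by_name
# [1] "summer" "summer" "winter" "spring"

# Factor index: levels sort alphabetically (Feb, Jan, Jul, Oct), and `[`
# uses the integer codes 2, 1, 3, 4 as positions in `seasons`
month_fac <- factor(month_chr)
by_code <- unname(seasons[month_fac])
by_code
# [1] "winter" "winter" "winter" "spring"

# Converting back to character restores lookup by name
identical(unname(seasons[as.character(month_fac)]), by_name)
# [1] TRUE
```

If `MonthName` were a factor, `dt[, season := seasons[as.character(MonthName)]]` would be the safer form.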

Data:
library(data.table)
dt <- setDT(read.table(text="    rrp         Year   Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500     1999     1    1999      00:00      33.09037       Jan
2: 21.01167     1999     1    1999      00:00      33.09037       Jan
3: 25.28667     1999     2    1999      00:00      33.09037       Feb
4: 18.42334     1999     2    1999      00:00      33.09037       Feb
5: 16.67499     1999     2    1999      00:00      33.09037       Feb
6: 18.90001     1999     2    1999      00:00      33.09037       Feb",
   header = TRUE, stringsAsFactors = FALSE))
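Since the sample data also has a numeric `Month` column, a positional lookup is another option. This is a sketch assuming `Month` always holds 1 to 12; `season_by_month` is a hypothetical name. The final check confirms it agrees with the named `seasons` vector from the answer.

```r
# Season for month number 1 (Jan) through 12 (Dec), using the same
# Southern Hemisphere mapping as the answer
season_by_month <- c("summer", "summer", "fall", "fall", "fall", "winter",
                     "winter", "winter", "spring", "spring", "spring", "summer")

# Month values of the six sample rows
month_num <- c(1L, 1L, 2L, 2L, 2L, 2L)
season_by_month[month_num]
# [1] "summer" "summer" "summer" "summer" "summer" "summer"

# Sanity check against the named lookup vector from the answer
seasons <- rep(c("winter", "spring", "summer", "fall"), each = 3)
names(seasons) <- month.abb[c(6:12, 1:5)]
stopifnot(identical(unname(seasons[month.abb]), season_by_month))
```

With the data.table above, the equivalent call would be `dt[, season := season_by_month[Month]]`.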

Source: "r - Adding a season column to a data.table based on the month" on Stack Overflow: https://stackoverflow.com/questions/36903538/
