r - dplyr won't group data by date

Tags: r dplyr strptime

I am trying to use the dataset provided by Leada to work out how often people ride the bikes.

Here is the code:

library(dplyr)

setAs("character", "POSIXlt", function(from) strptime(from, format = "%m/%d/%y %H:%M"))
d <- read.csv("http://mandrillapp.com/track/click/30315607/s3-us-west-1.amazonaws.com?p=eyJzIjoiemxlVjNUREczQ2l5UFVPeEFCalNUdmlDYTgwIiwidiI6MSwicCI6IntcInVcIjozMDMxNTYwNyxcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3MzLXVzLXdlc3QtMS5hbWF6b25hd3MuY29tXFxcL2RhdGF5ZWFyXFxcL2Jpa2VfdHJpcF9kYXRhLmNzdlwiLFwiaWRcIjpcImEyODNiNjMzOWJkOTQxMGM5ZjlkYzE0MmQ0NDQ5YmU4XCIsXCJ1cmxfaWRzXCI6W1wiMTVlYzMzNWM1NDRlMTM1ZDI0YjAwODE4ZjI5YTdkMmFkZjU2NWQ2MVwiXX0ifQ",
              colClasses = c("numeric", "numeric", "POSIXlt", "factor", "numeric", "POSIXlt", "factor", "numeric", "numeric", "factor", "character"),
              stringsAsFactors = TRUE)
names(d)[9] <- "BikeNo"

d <- tbl_df(d)

d <- d %>% mutate(Weekday = factor(weekdays(Start.Date)))
d %>%
  group_by(Weekday) %>%
  summarise(Total = n()) %>%
  select(Weekday, Total)

Strangely, dplyr refuses to group the data by weekday:

Error: column 'Start.Date' has unsupported type

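Why does it care about the Start.Date column when I am grouping by a factor? You can run the code locally to reproduce the error; it downloads the data automatically.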

P.S. The dplyr version I am using: dplyr_0.3.0.2

Best answer

The lubridate package is useful when working with dates. Here is code that parses Start.Date and End.Date, extracts the weekday, and then groups by weekday:
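As background on the error itself: the question's `setAs()` call uses `strptime()`, which returns `POSIXlt`, a list of date-time components. As far as I know, dplyr of that era rejected any `POSIXlt` column in the table, even when it was not the grouping column, whereas `POSIXct` (a single number per timestamp) was accepted. The following base R sketch shows the difference between the two classes; the sample timestamps are made up for illustration.

```r
# Hypothetical timestamps in the same layout as the bike trip data
x <- strptime(c("8/29/13 14:13", "8/30/13 9:05"),
              format = "%m/%d/%y %H:%M", tz = "UTC")

class(x)                # POSIXlt: what strptime() returns
is.list(unclass(x))     # TRUE: a list of components (sec, min, hour, ...)

# Converting to POSIXct gives one number per timestamp
y <- as.POSIXct(x)
class(y)
is.numeric(unclass(y))  # TRUE: seconds since 1970-01-01 UTC
```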

Read the dates as character vectors

library(dplyr)
library(lubridate)
# For some reason your instruction to load the csv directly from a url
# didn't work. I save the csv to a temporary directory.
d <- read.csv("/tmp/bike_trip_data.csv",
              colClasses = c("numeric", "numeric", "character", "factor", "numeric", "character", "factor", "numeric", "numeric", "factor", "character"),
              stringsAsFactors = TRUE)

names(d)[9] <- "BikeNo"
d <- tbl_df(d)

Convert the start and end dates with lubridate

d <- d %>% 
  mutate(
    Start.Date = parse_date_time(Start.Date,"%m/%d/%y %H:%M"),
    End.Date = parse_date_time(End.Date,"%m/%d/%y %H:%M"),
    Weekday = wday(Start.Date, label=TRUE, abbr=FALSE))
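To see what the two lubridate calls return, here is a small sketch on made-up timestamps (the `stamps` vector is hypothetical, not taken from the dataset). It uses the order string `"mdy HM"`, which should be equivalent to the strptime-style format above for this layout.

```r
library(lubridate)

# Hypothetical sample timestamps: month/day/two-digit year, hour:minute
stamps <- c("8/29/13 14:13", "8/31/13 9:05")

# parse_date_time() returns POSIXct, which can be stored in a dplyr table
parsed <- parse_date_time(stamps, orders = "mdy HM", tz = "UTC")
class(parsed)

# By default wday() numbers days starting from Sunday = 1
wday(parsed)

# label = TRUE returns an ordered factor of day names (locale dependent)
wday(parsed, label = TRUE, abbr = FALSE)
```

Because the labelled result is an ordered factor, the grouped output stays in calendar order rather than alphabetical order.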

Number of rows per weekday

d %>%
  group_by(Weekday) %>%
  summarise(Total = n())

#     Weekday Total
# 1    Sunday 10587
# 2    Monday 23138
# 3   Tuesday 24678
# 4 Wednesday 23651
# 5  Thursday 25265
# 6    Friday 24283
# 7  Saturday 12413
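If you would rather not depend on lubridate, a possible alternative is to convert the character column with base R's `as.POSIXct()` and keep the original `weekdays()` call. This is only a sketch on a made-up three-row data frame (`trips` is hypothetical), not a run against the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for the bike trip data, dates read as character
trips <- data.frame(
  Start.Date = c("8/29/13 14:13", "8/29/13 18:40", "8/31/13 9:05"),
  stringsAsFactors = FALSE
)

by_day <- trips %>%
  mutate(
    # POSIXct instead of POSIXlt, so dplyr can hold the column
    Start.Date = as.POSIXct(Start.Date, format = "%m/%d/%y %H:%M", tz = "UTC"),
    Weekday = factor(weekdays(Start.Date))
  ) %>%
  group_by(Weekday) %>%
  summarise(Total = n())

by_day
```

Note that `factor(weekdays(...))` sorts the day names alphabetically, while `wday(..., label = TRUE)` keeps them in calendar order.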

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/27828850/
