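r - Colour segments of a density plot by bin

Tags: r ggplot2

Warning, I am new to R! I have caught the R bug and have been playing with the possibilities, but I am getting very lost. I'd like to colour segments of a density plot using a ">" condition to indicate bins. In my head it looks like:

...but not dependent on quartiles or percentage change.

My data shows: x = duration (days), y = frequency. I'd like the plot to be colour-segmented in 3-month intervals up to 12 months, with a single colour after that (using working days, i.e. 63 = 3 months).

I have had a go, but I really have no idea where to start!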

ggplot(df3, aes(x=Investigation.Duration))+
geom_density(fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>0],
           fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>63], color =     "white",
           fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>127], color = "light Grey",
           fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>190], color = "medium grey",
           fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>253], color = "dark grey",
           fill = W.S_CleanNA$Investigation.Duration[W.S_CleanNA$Investigation.Duration>506], color = "black")+

  ggtitle ("Investigation duration distribution in 'Wales' complexity sample")+
  geom_text(aes(x=175, label=paste0("Mean, 136"), y=0.0053))+
  geom_vline(xintercept = c(136.5), color = "red")+
  geom_text(aes(x=80, label=paste0("Median, 129"), y=0.0053))+
  geom_vline(xintercept = c(129.5), color = "blue")

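Any very simple help would be greatly appreciated.

Best answer

Unfortunately, you can't do this directly with geom_density, because "under the hood" it is built as a single polygon, and a polygon can only have a single fill. The only way to do this is to have multiple polygons, and you need to build them yourself.

Fortunately, this is easier than it sounds.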
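To make the single-polygon point concrete, here is a minimal sketch that inspects what geom_density computes before drawing. The data frame `toy` and its column `duration` are hypothetical and exist only for this illustration; `layer_data()` is the ggplot2 helper that returns the data a layer actually draws.

```r
library(ggplot2)

# Hypothetical data, only to have something to plot
set.seed(1)
toy <- data.frame(duration = rgamma(200, shape = 5, rate = 1 / 27.7))

p <- ggplot(toy, aes(x = duration)) +
  geom_density(fill = "grey70")

# The computed data frame that the density layer draws from
built <- layer_data(p, 1)

# Every row shares one group and one fill, so the layer is one filled shape
unique(built$group)
unique(built$fill)
```

Because the whole curve sits in one group, there is no place to attach a second fill; splitting the curve into several groups is what the rest of the answer does by hand.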

There was no sample data in the question, so we'll create a plausible distribution with the same median and mean:

# Simulate data
set.seed(69)
df3 <- data.frame(Investigation.Duration = rgamma(1000, 5, 1/27.7))

round(median(df3$Investigation.Duration))
#> [1] 129
round(mean(df3$Investigation.Duration))
#> [1] 136

# Get the density as a data frame
dens <- density(df3$Investigation.Duration)
dens <- data.frame(x = dens$x, y = dens$y)

# Exclude the artefactual times below zero
dens <- dens[dens$x > 0, ]

# Split into bands of 3 months and group > 12 months together
dens$band <- dens$x %/% 63
dens$band[dens$band > 3] <- 4

# This is the complex bit. For each band we want to add a point on
# the x axis at the upper and lower time limits:
dens <- do.call("rbind", lapply(split(dens, dens$band), function(df) {
  df <- rbind(df[1,], df, df[nrow(df),])
  df$y[c(1, nrow(df))] <- 0
  df
}))
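The band assignment above uses integer division and then caps the result at 4. As an alternative sketch (not part of the original answer), base R's `cut()` can produce the same bands with labels already attached. The vector `x` below is hypothetical, chosen to land on and around each boundary.

```r
# Hypothetical durations in working days, on and around each band edge
x <- c(0.5, 62.9, 63, 125, 126, 200, 251.9, 252, 400, 900)

# Band index as in the answer: integer division by 63, capped at 4
band_int <- pmin(x %/% 63, 4)

# Same bands via cut(): left-closed intervals, labels attached up front
band_lab <- cut(
  x,
  breaks = c(0, 63, 126, 189, 252, Inf),
  right  = FALSE,
  labels = c("Less than 3 months", "3 to 6 months", "6 to 9 months",
             "9 to 12 months", "Over 12 months")
)

# The two encodings agree once the factor codes are shifted to start at 0
stopifnot(all(as.integer(band_lab) - 1 == band_int))
table(band_lab)
```

With a labelled factor like this, the legend labels would come from the factor levels rather than from the `labels` argument of the fill scale; whether that is tidier is a matter of taste.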

Now that we have the polygons, it's just a matter of plotting and labelling them appropriately:

library(ggplot2)

ggplot(dens, aes(x, y)) + 
  geom_polygon(aes(fill = factor(band), color = factor(band))) +
  theme_minimal() +
  scale_fill_manual(values = c("#003f5c", "#58508d", "#bc5090",
                               "#ff6361", "#ffa600"), 
                    name = "Time",
                    labels = c("Less than 3 months",
                               "3 to 6 months",
                               "6 to 9 months",
                               "9 to 12 months",
                               "Over 12 months")) +
  scale_colour_manual(values = c("#003f5c", "#58508d", "#bc5090",
                               "#ff6361", "#ffa600"), 
                      guide = guide_none()) +
  labs(x = "Days since investigation started", y = "Density") +
  ggtitle("Investigation duration distribution in 'Wales' complexity sample") +
  # Label and mark the mean and median with dashed vertical lines
  annotate("text", x = 175, y = 0.0053, label = "Mean, 136") +
  geom_vline(xintercept = 136.5, linetype = 2) +
  annotate("text", x = 80, y = 0.0053, label = "Median, 129") +
  geom_vline(xintercept = 129.5, linetype = 2)
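One detail of the polygon approach: the density grid points rarely fall exactly on 63, 126, and so on, so each band's polygon ends at its last grid point and the next one starts a grid step later. The matching outline colour largely hides that sliver. If exact boundaries matter, a possible refinement (an assumption on my part, not something from the original answer) is to interpolate the curve at each boundary with `approx()` so that neighbouring polygons share an edge. The names `dur`, `cuts`, and `polys` are made up for this sketch.

```r
set.seed(69)
dur <- rgamma(1000, 5, 1 / 27.7)   # same simulated data as the answer

d <- density(dur)
dens <- data.frame(x = d$x, y = d$y)
dens <- dens[dens$x > 0, ]

# Band boundaries that fall strictly inside the plotted range
cuts <- c(63, 126, 189, 252)
cuts <- cuts[cuts > min(dens$x) & cuts < max(dens$x)]

lower <- c(min(dens$x), cuts)   # left edge of each band
upper <- c(cuts, max(dens$x))   # right edge of each band

polys <- do.call(rbind, lapply(seq_along(lower), function(i) {
  inside <- dens[dens$x > lower[i] & dens$x < upper[i], ]
  # Curve height at both edges, by linear interpolation
  y_lo <- approx(dens$x, dens$y, xout = lower[i])$y
  y_hi <- approx(dens$x, dens$y, xout = upper[i])$y
  data.frame(
    x = c(lower[i], lower[i], inside$x, upper[i], upper[i]),
    y = c(0, y_lo, inside$y, y_hi, 0),
    band = pmin(lower[i] %/% 63, 4)   # same band coding as the answer
  )
}))
```

`polys` has the same `x`, `y`, and `band` columns as `dens`, so it should drop into the `ggplot()` call above unchanged; with shared edges, the `color = factor(band)` outline is probably no longer needed.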


Regarding r - Colour segments of a density plot by bin, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63289154/
