r - Compute the density at a point within a group

Tags: r ggplot2 kernel-density density-plot

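I am plotting some density curves and want to add a point at each group's mean. However, I want to draw these points along the top of the density curve rather than at 0. Is there a way to get the density value at the mean within each group? The code is as follows: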

# make df
df<- data.frame(group=c("a","b",'c'),
           value=rnorm(
             3000,
             mean=c(1,2,3),
             sd=c(1,1.5,1)
           )) 
library(tidyverse)
library(ggridges)
library(ggdist)

Approach 1: density ridges from the ggridges package

df %>%

  # calculate the mean value per group to use later
  group_by(group)%>%
  mutate(mean_value=mean(value)) %>%
    
  
  ggplot()+
  aes(x=value,y=group)+
  geom_density_ridges()+
  
  # could do with stat summary - blue points
  stat_summary(
    orientation = "y",
    fun = mean,
    geom = "point", 
    color="blue"
  )+
  
  # or could do with geom_point using precalculated value (red points)
  # nudged so we can see both. 
  geom_point(aes(x=mean_value,y=group),
             color="red",
             position = position_nudge(x=.1)
             )

Approach 2: stat_halfeye from the ggdist package

df %>%
  group_by(group)%>%
  mutate(mean_value=mean(value)) %>%
  
  # mutate(mean_density = density(mean_value,value))
  
  
  ggplot()+
  aes(x=value,y=group)+
  stat_halfeye()+
  
  # could do with stat summary
  stat_summary(
    orientation = "y",
    fun = mean,
    geom = "point", 
    color="blue",
    alpha=.8
  )+
  
  # or could do with geom_point using precalculated value
  # nudged so we can see both. 
  geom_point(aes(x=mean_value,y=group),
             color="red",
             position = position_nudge(x=.1)
  )

Desired output: the blue or red points sit on top of the density curve. So I need something like a "group + density value" aesthetic.

I would prefer approach 2 (ggdist) over geom_density_ridges.

Thanks

Best answer

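I'm not sure there is a way to compute the height of the density curve at the mean inside a ggplot geom/stat function, so I wrote a couple of helper functions to do it.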

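dens_at_mean computes the height of the density curve at the mean of the data. get_mean_coords runs dens_at_mean by group, scales the heights to match the y-values generated by stat_halfeye, and returns a data frame that can be passed to geom_point.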

# Packages used below (same as in the question)
library(tidyverse)
library(ggdist)

# Reproducible data
set.seed(394)
df<- data.frame(group=c("a","b",'c'),
                value=rnorm(
                  3000,
                  mean=c(1,2,3),
                  sd=c(1,1.5,1)
                )) 

# Function to get height of density curve at mean value
dens_at_mean = function(x) { 
  d = density(x)
  mean.x = mean(x)
  data.frame(mean.x = mean.x,
             max.y = max(d$y),
             mean.y = approx(d$x, d$y, xout=mean.x)$y)
}

# Function to return data frame with properly scaled heights 
#  to plot mean points
get_mean_coords = function(data, value.var, group.var) {

  data %>% 
    group_by({{group.var}}) %>% 
    summarise(vals = list(dens_at_mean({{value.var}}))) %>% 
    ungroup() %>% 
    unnest_wider(vals) %>% 
    # Scale y-value to work properly with stat_halfeye
    mutate(mean.y = (mean.y/max(max.y) * 0.9 + 1:n())) %>% 
    select(-max.y)
}

df %>%
  ggplot()+
    aes(x=value, y=group)+
    stat_halfeye() +
    geom_point(data=get_mean_coords(df, value, group), 
               aes(x=mean.x, y=mean.y),
               color="red", size=2) +
    theme_bw() +
    scale_y_discrete(expand=c(0.08,0.05))
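
The core of the two helpers can also be checked without any plotting packages. The following is a minimal base R sketch of the same idea, not part of the original answer: the names density_at_mean, slab_scale and plot_y are made up for illustration, and the value 0.9 is assumed to correspond to the default scale of stat_halfeye. It interpolates the stats::density() estimate at each group's mean, divides by the tallest peak across all groups, and adds the group's integer position on the discrete y-axis.

```r
# Sketch under stated assumptions: base R only, illustrative names.
set.seed(394)
df <- data.frame(
  group = c("a", "b", "c"),
  value = rnorm(3000, mean = c(1, 2, 3), sd = c(1, 1.5, 1))
)

# Height of the kernel density estimate at the mean of x, linearly
# interpolated between the grid points returned by density().
density_at_mean <- function(x) {
  d <- density(x)
  m <- mean(x)
  c(mean_x = m,
    peak_y = max(d$y),
    mean_y = approx(d$x, d$y, xout = m)$y)
}

# One row per group (rows are named a, b, c)
res <- as.data.frame(t(sapply(split(df$value, df$group), density_at_mean)))

# Rescale so the tallest curve across all groups reaches slab_scale,
# then shift each group up to its position (1, 2, 3) on the y-axis.
slab_scale <- 0.9  # assumption: default `scale` of stat_halfeye
res$plot_y <- res$mean_y / max(res$peak_y) * slab_scale + seq_len(nrow(res))

print(res)
```

Because this uses the defaults of stats::density(), the points land exactly on the drawn curve only if the plotting layer estimates the density in a comparable way. If the installed ggdist version uses a different bandwidth or estimator, the points may sit slightly above or below the slab outline.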


Regarding r - Compute the density at a point within a group, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65515063/
