r - ggplot2: Extending stat_function to geom_violin

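Given a data.frame, I would like to compare, for each level of a factor, the density estimate drawn by ggplot2::geom_violin() with a density computed by stat_function().

In this setup, I want to compare the empirical densities of two samples of size 100 with the true densities of normal distributions with means 10 and 20.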


library(tidyverse)

test <- tibble(a = rnorm(100, mean = 10), 
               b = rnorm(100, mean = 20)) %>% 
  gather(key, value)

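One way to achieve this is to overlay stat_density and stat_function separately for each factor level. With many levels, however, this produces too many plots. (Several answers already cover that case, for example "overlay histogram with empirical density and dnorm function".)

To keep the following plots readable, I use @DavidRobinson's geom_flat_violin: dgrtwo/geom_flat_violin.R.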
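For reference, the per-level overlay described above might look like the following for a single sample. This is only a minimal sketch: the data frame name `sample_a` and the seed are assumptions made for illustration, and one such plot would be needed per factor level.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# One sample only (the "a" group of the question), kept in its own data frame.
sample_a <- data.frame(value = rnorm(100, mean = 10))

p <- ggplot(sample_a, aes(x = value)) +
  # Empirical kernel density estimate of the sample.
  stat_density(geom = "line", colour = "red") +
  # True density, evaluated directly by stat_function() without simulation.
  stat_function(fun = dnorm, args = list(mean = 10), colour = "blue")

# print(p) draws the plot.
```

This works well for one group, but it puts the value on the x axis, which is why it does not carry over directly to a violin layout with the factor on the x axis.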

source("geom_flat_violin.R")

# without the "true" distribution

test %>% 
  ggplot(aes(x = key, y = value)) +
  geom_flat_violin(col = "red", fill = "red", alpha = 0.3) + 
  geom_point()

# comparing with the "true" distribution

test %>% 
  ggplot(aes(x = key, y = value)) +
  geom_flat_violin(col = "red", fill = "red", alpha = 0.3) + 
  geom_point() +
  geom_flat_violin(data = tibble(value = rnorm(10000, mean = 10), key = "a"),
                   fill = "blue", alpha = 0.2)

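The problem with this solution is that it requires simulating enough data points for each factor level for the resulting density to look smooth. For the normal distribution 10000 points are enough, but other distributions may need more.

The question is: can stat_function be used to achieve this, so that no data has to be simulated?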
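To get a rough feel for how the sample size affects the simulated curve, one could compare a kernel density estimate with the exact density on a common grid. The sketch below uses only base R; the helper name `sim_error`, the grid limits and the seed are assumptions chosen for illustration.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Evaluation grid around the mean of the "a" group.
grid <- seq(7, 13, length.out = 200)

# Hypothetical helper: largest absolute gap between the kernel density
# estimate of n simulated points and the exact normal density on the grid.
sim_error <- function(n) {
  est <- density(rnorm(n, mean = 10), from = 7, to = 13, n = 200)
  max(abs(est$y - dnorm(grid, mean = 10)))
}

# Compare a small and a large simulated sample.
errors <- sapply(c(100, 10000), sim_error)
errors
```

The gap never disappears entirely, because the kernel estimate also depends on the bandwidth, which is part of the motivation for drawing the exact density instead.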

  stat_function(fun = dnorm, args = list(mean = 10))
  stat_function(fun = dnorm, args = list(mean = 20))

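Best answer

Instead of estimating the density from a large simulated sample, you can evaluate the distribution directly and draw it as a polygon: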

library(tidyverse)
source("geom_flat_violin.R")  # defines geom_flat_violin(), as in the question

test <- tibble(a = rnorm(100, mean = 10), 
               b = rnorm(100, mean = 20)) %>% 
  gather(key, value) 

test %>%
  ggplot(aes(x = key, y = value)) +
  geom_flat_violin(col = "red", fill = "red", alpha = 0.3) + 
  geom_point() +
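  # True N(10, 1) density as an outline: the first discrete level ("a") sits at
  # x = 1, so the density height is added to 1 to draw it beside that group.
  geom_polygon(data = tibble(value = seq(7, 13, length.out = 100), 
                             key = 1 + dnorm(value, 10)),
               fill = "blue", colour = "blue", alpha = 0.2)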
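The answer draws the outline for group "a" only. A possible way to extend it to every level is sketched below. The helper `true_density_outline()` and its `width` argument are hypothetical (they are not part of ggplot2 or of geom_flat_violin.R), and the sketch assumes normal distributions with a common standard deviation. It relies on the same trick as the answer: discrete levels occupy integer positions 1, 2, ... on the x axis.

```r
library(ggplot2)

# Hypothetical helper: one density outline per group.
# `means` is a named vector whose names match the levels on the x axis,
# in the order in which they appear on that axis.
true_density_outline <- function(means, sd = 1, n = 200, width = 1) {
  groups <- names(means)
  pieces <- lapply(seq_along(means), function(i) {
    value <- seq(means[[i]] - 4 * sd, means[[i]] + 4 * sd, length.out = n)
    data.frame(
      key   = groups[i],
      value = value,
      # Level i sits at x = i; the scaled density is drawn to its right.
      x     = i + width * dnorm(value, mean = means[[i]], sd = sd)
    )
  })
  do.call(rbind, pieces)
}

outline <- true_density_outline(c(a = 10, b = 20))

set.seed(1)  # assumption: fixed seed for reproducibility
test <- data.frame(key   = rep(c("a", "b"), each = 100),
                   value = c(rnorm(100, mean = 10), rnorm(100, mean = 20)))

p <- ggplot(test, aes(x = key, y = value)) +
  geom_point() +
  geom_polygon(data = outline, aes(x = x, y = value, group = key),
               fill = "blue", colour = "blue", alpha = 0.2)

# print(p) draws the plot; a geom_flat_violin() layer can be added as in the answer.
```

Note that violin geoms rescale their density to fit the available width, so the heights of the exact and the estimated curves are not automatically comparable; `width` is only a manual scaling factor here.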

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/63088998/
