r - Comparing cluster and overall distributions of a categorical variable using pie charts

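In the context of a cluster analysis, I am trying to visualize the distribution of a categorical variable in each cluster against its distribution in the overall population.

To make the two comparable, I use relative frequencies.

For numerical variables this is straightforward, because I can simply overlay histograms.

For categorical variables, however, I would like to get something like this:

where the outer pie shows the relative frequencies for cluster 1 and the inner pie shows the relative frequencies for the overall population.

A reproducible example: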

mydf <- data.frame(week_day = as.factor(c(rep("monday",10), rep("monday",5), rep("tuesday",5))), cluster = c(rep(1,10), rep(2,10)))

Here, cluster 1 consists entirely of "monday", while the overall population is 75% "monday" and 25% "tuesday".

Relative frequencies are easy to compute inside the ggplot aes with:

y = (..count..)/sum(..count..)
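As a quick numeric check of the example above, the same relative frequencies can be computed outside ggplot. This is only a minimal base R sketch; the names `overall_freq` and `cluster1_freq` are illustrative and do not come from the original post.

```r
# Rebuild the example data from the question
mydf <- data.frame(
  week_day = as.factor(c(rep("monday", 10), rep("monday", 5), rep("tuesday", 5))),
  cluster  = c(rep(1, 10), rep(2, 10))
)

# Relative frequencies over the whole data set
overall_freq <- prop.table(table(mydf$week_day))

# Relative frequencies within cluster 1 only.
# Subsetting a factor keeps its unused levels, so "tuesday" is still
# reported, with a frequency of 0.
cluster1_freq <- prop.table(table(mydf$week_day[mydf$cluster == 1]))

print(overall_freq)
print(cluster1_freq)
```

These are the numbers the inner ring (overall) and the outer ring (cluster 1) are meant to display.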

Best answer

Suppose you are looking at a variable with four categories, A, B, C and D, and you have a data frame of this kind.

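library(tidyverse) # provides tribble(), gather(), %>% and ggplot()

# One count column per group, reshaped to long format (one row per Category/Cluster pair)
d <- tribble(~Category, ~Overall, ~Cluster1,
             "A", 250, 20,
             "B", 250, 110,
             "C", 250, 30,
             "D", 250, 40) %>%
  gather(Overall, Cluster1, key = "Cluster", value = "Count")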

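This means: "In the whole data set, 250 points belong to category A, 250 points belong to category B, and so on. In Cluster1, 20 points belong to category A, 110 points belong to category B, and so on."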

ggplot treats a pie chart as a (scaled) bar chart drawn in polar coordinates.

To get a bar chart with relative frequencies, specify position = "fill" in geom_bar:

ggplot(data = d) +
  geom_bar(stat = "identity",
           position = "fill", # automatically scales the bars from 0 to 1, necessary for polar coordinates
           aes(x = Cluster, y = Count, fill = Category))

This gives you the following chart: [image: bar chart with relative frequencies]
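To see what position = "fill" does numerically, the share that each bar segment represents can be worked out by hand. The sketch below rebuilds the same counts in base R instead of using tribble/gather, and the `Share` column is an illustrative name that is not part of the original answer.

```r
# Same counts as in the answer, written directly in long format
d <- data.frame(
  Category = rep(c("A", "B", "C", "D"), times = 2),
  Cluster  = rep(c("Overall", "Cluster1"), each = 4),
  Count    = c(250, 250, 250, 250, 20, 110, 30, 40)
)

# Share of each category within its own bar (Overall or Cluster1).
# ave() applies sum() within each Cluster group and returns a vector
# aligned with the rows of d.
d$Share <- d$Count / ave(d$Count, d$Cluster, FUN = sum)

print(d)
```

Each bar sums to 1, which is why the two groups stay comparable even though Cluster1 holds 200 points and the whole data set holds 1000.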

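Next, you need to switch to polar coordinates and use the y axis as the angular parameter. The radial parameter will then carry your cluster/overall distributions.

Pay attention to the order of the factor levels so that the right thing ends up in the middle of the circle (here: the overall distribution). My example solution is not meant to be optimal: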

d$Cluster <- factor(d$Cluster, levels = c("Overall","Cluster1"))
# `Overall` gets the lowest factor index, so it is drawn innermost
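The reason the explicit levels matter can be checked on a small stand-alone vector. This is an illustrative sketch, and the variable names are not from the original answer: without the levels argument, factor() sorts the levels, which would put Cluster1 first.

```r
cluster <- c("Overall", "Cluster1", "Overall", "Cluster1")

# Default behaviour: levels are sorted, so "Cluster1" comes before "Overall"
default_levels <- levels(factor(cluster))

# Explicit levels: "Overall" becomes level 1, "Cluster1" level 2
explicit <- factor(cluster, levels = c("Overall", "Cluster1"))
explicit_codes <- as.integer(explicit)

print(default_levels)
print(explicit_codes)
```

With a discrete x axis, ggplot places the levels in this order, so after switching to polar coordinates level 1 is the ring closest to the center.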

Then add the coord_polar layer:

ggplot(data = d) +
  geom_bar(stat = "identity",
           position = "fill", # automatically scales the bars from 0 to 1, necessary for polar coordinates
           aes(x = Cluster, y = Count, fill = Category),
           width = .9) + # play with the width of the bars to set the blank space between the circles. 1 = no blank space
  coord_polar(theta = "y") + # the y coordinate becomes the angular parameter
  theme(axis.text.y = element_blank()) # I didn't look for a fancy way to display radial labels

This gives you:

[image: pie chart with relative frequencies]
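The answer starts from pre-aggregated counts, whereas the question's mydf holds one row per observation. One possible way to bridge the two, sketched here under the assumption that ggplot2 is installed, is to tabulate the counts first and then reuse the same layers. The names `overall`, `cluster1`, `counts`, `group` and `p` are illustrative and not from the original post.

```r
library(ggplot2)

# Example data from the question
mydf <- data.frame(
  week_day = as.factor(c(rep("monday", 10), rep("monday", 5), rep("tuesday", 5))),
  cluster  = c(rep(1, 10), rep(2, 10))
)

# Counts per week day for the whole data set and for cluster 1 only;
# as.data.frame() on a table yields the columns week_day and Freq
overall  <- as.data.frame(table(week_day = mydf$week_day))
cluster1 <- as.data.frame(table(week_day = mydf$week_day[mydf$cluster == 1]))
overall$group  <- "Overall"
cluster1$group <- "Cluster1"

counts <- rbind(overall, cluster1)
# "Overall" first, so that it becomes the inner ring
counts$group <- factor(counts$group, levels = c("Overall", "Cluster1"))

# Same layers as in the answer, applied to the tabulated counts
p <- ggplot(data = counts) +
  geom_bar(stat = "identity",
           position = "fill",
           aes(x = group, y = Freq, fill = week_day),
           width = .9) +
  coord_polar(theta = "y") +
  theme(axis.text.y = element_blank())

print(p)
```

The outer ring should then be entirely "monday" and the inner ring three quarters "monday", matching the frequencies described in the question.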

Regarding "r - Comparing cluster and overall distributions of a categorical variable using pie charts", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48278648/
