R plot for a conditional distribution: cdplot() does not seem to do this

Tags: r plot density-plot

I have a data set with the following structure:

> data("household", package="HSAUR2")
> household[c(1,5,10,30,40),]
   housing food goods service gender total
1      820  114   183     154 female  1271
5      721   83   176     104 female  1084
10     845   64  1935     414 female  3258
30    1641  440  6471    2063   male 10615
40    1524  964  1739    1410   male  5637

The "total" column is the sum of the first four columns. Each row is one household's expenditure, split into four categories.

Now, if I want a conditional density plot of gender against total expenditure, I can run:

cdplot(gender ~ total, data=household)

and I get this plot:

[Figure: conditional density plot of gender against total expenditure]

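I would like the same kind of picture, with total expenditure on the x axis, but with the conditional distribution of the four categories (housing, food, goods, service) on the y axis. The only approach I can think of is a very dirty hack: I build a factor in which, for the first data row, "housing" is repeated 820 times, then "food" 114 times, and so on.

There must be a simpler way, right?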
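For reference, the hack described above could be sketched roughly as follows. This is only an illustration, not code from the question: it uses the five rows printed in the excerpt rather than the full data set, and the names hh, cats, counts and long are made up for the example. Each unit of spending becomes one observation, so cdplot() sees a four-level factor.

```r
# Five rows copied from the excerpt printed in the question (not the full data).
hh <- data.frame(
  housing = c(820, 721, 845, 1641, 1524),
  food    = c(114, 83, 64, 440, 964),
  goods   = c(183, 176, 1935, 6471, 1739),
  service = c(154, 104, 414, 2063, 1410)
)
hh$total <- rowSums(hh)

cats <- c("housing", "food", "goods", "service")

# Repetition counts in row-major order:
# housing, food, goods, service for row 1, then the same for row 2, ...
counts <- as.vector(t(as.matrix(hh[cats])))

# One observation per unit of spending (hypothetical helper data frame).
long <- data.frame(
  category = factor(rep(rep(cats, times = nrow(hh)), times = counts),
                    levels = cats),
  total    = rep(rep(hh$total, each = length(cats)), times = counts)
)

# cdplot() accepts a factor with more than two levels on the left-hand side.
cdplot(category ~ total, data = long)
```

Note that this weights every household by how much it spends, which may or may not be what is wanted.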

Best answer

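As I said, you are using the wrong tool for what you want. You are envisioning a plot that cannot be obtained directly from the data (see the bottom of this answer).

Instead, you need to model the data. Specifically, you want to predict the expected share of expenditure in each category as a function of total expenditure. The plot you have in mind then shows the fitted values of that model (that is, the predicted proportion of spending in each category) as a function of total expenditure. Below is some code that does this using loess curves. I plot both the raw data and the fitted values to show you what is going on.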

# setup the data
data("household", package = "HSAUR2")
household$total <- rowSums(household[,1:4])
household <- within(household, {
    housing <- housing/total
    food <- food/total
    goods <- goods/total
    service <- service/total
})

# estimate loess curves
l_list <-
list(loess(housing ~ total, data = household),
     loess(food ~ total, data = household),
     loess(goods ~ total, data = household),
     loess(service ~ total, data = household))

# stack fitted curves on top of one another
ndat <- data.frame(total = seq(min(household$total), max(household$total), by = 100))
p <- lapply(l_list, predict, newdata = ndat)
for(i in 2:length(l_list))
    p[[i]] <- p[[i]] + p[[i-1]]

# plot
plot(NA, xlim=range(household$total), ylim = c(0,1), xlab='Total', ylab='Proportion', las=1, xaxs='i')
# plot dots
with(household, points(total, housing, pch = 20, col = palette()[1]))
with(household, points(total, housing + food, pch = 20, col = palette()[2]))
with(household, points(total, housing + food + goods, pch = 20, col = palette()[3]))
with(household, points(total, housing + food + goods + service, pch = 20, col = palette()[4]))
# plot fitted lines
for(i in seq_along(p))
    lines(ndat$total, p[[i]], type = 'l', lwd = 2, col = palette()[i])

Result:

[Figure: stacked category proportions (points) with fitted loess curves, plotted against total expenditure]
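If a filled look closer to cdplot() is wanted, the stacked fitted curves can be turned into shaded bands with polygon(). The following is a minimal sketch under stated assumptions, not part of the original answer: the data are simulated so that the block runs without HSAUR2, and the names shares, fits, grid and stacked are invented for the example.

```r
set.seed(1)

# Simulated stand-in data (an assumption): 60 households whose four
# spending shares sum to 1 in every row.
n     <- 60
cats  <- c("housing", "food", "goods", "service")
total <- sort(runif(n, 1000, 10000))
w     <- matrix(rexp(n * 4), ncol = 4, dimnames = list(NULL, cats))
shares <- as.data.frame(w / rowSums(w))
shares$total <- total

# One loess fit per category, all with the same default settings.
fits <- lapply(cats, function(v) loess(reformulate("total", v), data = shares))

# Predict on a grid inside the observed range, then stack cumulatively.
grid    <- data.frame(total = seq(min(total), max(total), length.out = 200))
pred    <- sapply(fits, predict, newdata = grid)  # 200 x 4 matrix
stacked <- t(apply(pred, 1, cumsum))              # running sum across categories

# Draw one filled band per category, from the previous curve up to the next.
plot(NA, xlim = range(grid$total), ylim = c(0, 1), xlab = "Total",
     ylab = "Proportion", las = 1, xaxs = "i", yaxs = "i")
cols  <- gray.colors(length(cats))
lower <- rep(0, nrow(grid))
for (j in seq_along(cats)) {
  polygon(c(grid$total, rev(grid$total)),
          c(stacked[, j], rev(lower)),
          col = cols[j], border = NA)
  lower <- stacked[, j]
}
legend("topright", legend = rev(cats), fill = rev(cols), bg = "white")
```

Because all four curves are fitted with the same loess settings on the same x values, the top of the stack should stay at roughly 1. An individual fitted share can still dip below 0 in sparse regions, so the bands are worth checking by eye.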

If you try to create such a plot from the raw data, it will look a bit odd, but maybe that is what you want:

plot(NA, xlim=range(household$total), ylim = c(0,1), xlab='Total', ylab='Proportion', las=1, xaxs='i')
# sort households by total so the lines run left to right
o <- order(household$total)
with(household, lines(total[o], housing[o], col = palette()[1]))
with(household, lines(total[o], (housing + food)[o], col = palette()[2]))
with(household, lines(total[o], (housing + food + goods)[o], col = palette()[3]))
with(household, lines(total[o], (housing + food + goods + service)[o], col = palette()[4]))

Result:

[Figure: raw stacked category proportions joined by lines, plotted against total expenditure]

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/26840509/
