Reformatting a dataset in R with levels as rows and ranges as columns

Tags: r dataset format levels

I have a dataset that looks like this:

partyid            coninc
Ind,Near Dem       25926
Not Str Democrat   33333
Not Str Democrat   41667
Strong Democrat    69444
Ind,Near Dem       60185
Ind,Near Dem       50926
Ind,Near Dem       18519
Strong Democrat    3704
Strong Democrat    25926
Strong Democrat    18519
Not Str Republican 18519
Strong Democrat    18519
Not Str Democrat   18519

What I would like to do is reformat the dataset so that it looks like this:

partyid             0-50,000   50,000-100,000   100,000-150,000   >150,000
Strong Democrat     2344       3423             4342              54
Not Str Democrat    2643       934              ..
Ind, Near Dem       7656       343              ..
Ind, Near Rep       7655       833              .. 
Not Str Republican  2443       343
Strong Republican   3444       773

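That is, each row should be one level of the partyid variable, and each column should hold the count of observations whose coninc value falls within a given range.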

The dput output of my data:

structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")

Best answer

You can do this easily with the plyr package (because your example data was a bit hard to read in, I removed the commas and spaces in partyid):

# creating sample data
dat <- structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")

# summarising the data with plyr: one row per partyid level,
# one count column per coninc range
library(plyr)
dat2 <- ddply(dat, .(partyid), summarise,
              zero = sum(coninc <= 50000),
              fifty = sum(coninc > 50000 & coninc <= 100000),
              hundred = sum(coninc > 100000 & coninc <= 150000),
              hfifty = sum(coninc > 150000))

This gives the following result (shown in dput form):

dat2 <- structure(list(partyid = structure(1:5, .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), zero = c(6L, 3L, 2L, 2L, 1L), fifty = c(1L, 0L, 4L, 1L, 0L), hundred = c(0L, 0L, 0L, 0L, 0L), hfifty = c(0L, 0L, 0L, 0L, 0L)), .Names = c("partyid", "zero", "fifty", "hundred", "hfifty"), row.names = c(NA, -5L), class = "data.frame")
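
As a possible alternative that needs no extra package, the same counts can be sketched in base R with cut() and table(). This is not the approach used in the answer above, only an illustration under a few assumptions: the names party_levels, income_range, counts and result are made up for this example, the range labels are copied from the layout requested in the question, and the lowest bin is left open at the bottom so that it matches the `zero` column above.

```r
# Sample data from the question, rebuilt without dput() for readability.
party_levels <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                  "Ind,Near Rep", "Not Str Republican", "Strong Republican")
dat <- data.frame(
  partyid = factor(party_levels[c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1,
                                  5, 1, 2, 1, 1, 4, 4, 3, 4, 3)],
                   levels = party_levels),
  coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L,
             3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L,
             18519L, 33333L, 25926L, 60185L, 69444L, 50926L)
)

# Bin coninc into four ranges. cut() closes intervals on the right by
# default, so a value of exactly 50000 lands in the first bin.
income_range <- cut(dat$coninc,
                    breaks = c(-Inf, 50000, 100000, 150000, Inf),
                    labels = c("0-50,000", "50,000-100,000",
                               "100,000-150,000", ">150,000"))

# Cross-tabulate: partyid levels become rows, income ranges become columns.
counts <- table(partyid = dat$partyid, coninc = income_range)

# Convert the contingency table to a data frame, one row per partyid level.
result <- as.data.frame.matrix(counts)
print(result)
```

Because table() counts every level of a factor, this version also keeps a row of zeros for "Strong Republican", which has no observations in the sample and is therefore missing from the five-row ddply result above.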

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/22878280/
