I have a dataset that looks like this:
partyid             coninc
Ind,Near Dem         25926
Not Str Democrat     33333
Not Str Democrat     41667
Strong Democrat      69444
Ind,Near Dem         60185
Ind,Near Dem         50926
Ind,Near Dem         18519
Strong Democrat       3704
Strong Democrat      25926
Strong Democrat      18519
Not Str Republican   18519
Strong Democrat      18519
Not Str Democrat     18519
What I want to do is reshape the dataset so that it looks like this:
partyid             0-50,000  50,000-100,000  100,000-150,000  >150,000
Strong Democrat         2344            3423             4342        54
Not Str Democrat        2643             934  ..
Ind,Near Dem            7656             343  ..
Ind,Near Rep            7655             833  ..
Not Str Republican      2443             343
Strong Republican       3444             773
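That is, the rows are the levels of the partyid variable, in level order, and the columns hold the number of coninc values that fall in each range.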
The dput of my data:
structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
Best answer
You can do this easily with the plyr package (the sample data below is the dput output from the question):
# creating sample data
dat <- structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
# summarise the data with plyr: count the coninc values in each range, per partyid
library(plyr)
dat2 <- ddply(dat, .(partyid), summarise,
              zero    = sum(coninc < 50001),                      # 0-50,000
              fifty   = sum(coninc > 50000 & coninc < 100001),    # 50,000-100,000
              hundred = sum(coninc > 100000 & coninc < 150001),   # 100,000-150,000
              hfifty  = sum(coninc > 150000))                     # >150,000
This produces the following result (shown here as dput output):
dat2 <- structure(list(partyid = structure(1:5, .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), zero = c(6L, 3L, 2L, 2L, 1L), fifty = c(1L, 0L, 4L, 1L, 0L), hundred = c(0L, 0L, 0L, 0L, 0L), hfifty = c(0L, 0L, 0L, 0L, 0L)), .Names = c("partyid", "zero", "fifty", "hundred", "hfifty"), row.names = c(NA, -5L), class = "data.frame")
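As an alternative that is not part of the original answer, the same counts can probably be obtained in base R by binning coninc with cut() and cross-tabulating with table(). The sketch below is only one possible approach. The names breaks, labels, range and counts are introduced here for illustration, the range labels are copied from the header of the desired output in the question, and the breaks assume that coninc is never negative.

```r
# Rebuild the sample data from the dput in the question so this block runs on its own.
dat <- data.frame(
  partyid = factor(
    c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1, 5, 1, 2, 1, 1, 4, 4, 3, 4, 3),
    levels = 1:6,
    labels = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
               "Ind,Near Rep", "Not Str Republican", "Strong Republican")
  ),
  coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L,
             25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L,
             25926L, 60185L, 69444L, 50926L)
)

# Right-closed intervals [0, 50000], (50000, 100000], (100000, 150000], (150000, Inf]
# mirror the integer comparisons in the ddply() call above.
# Assumption: coninc is never negative (a negative value would become NA here).
breaks <- c(0, 50000, 100000, 150000, Inf)
labels <- c("0-50,000", "50,000-100,000", "100,000-150,000", ">150,000")

# Assign each income to its range.
dat$range <- cut(dat$coninc, breaks = breaks, labels = labels,
                 include.lowest = TRUE)

# Cross-tabulate: one row per partyid level, one column per income range.
counts <- table(dat$partyid, dat$range)
counts

# Optional: convert the contingency table to a wide data frame.
counts_df <- as.data.frame.matrix(counts)
```

Because table() keeps factor levels that have no observations, Strong Republican appears as a row of zeros here, whereas the ddply() result above has no row for that level.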
On reformatting a dataset in R so that the rows are levels and the columns are ranges, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/22878280/