r - Generating combinations of 8 names from a column in an R data frame, based on conditions on other columns in the same data frame

Tags: r dataframe combinations

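I have a data frame containing 20 players from 4 different teams (5 players per team), each of whom has been assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose total salary is 10000 or less and whose total points are greater than x, excluding any combination that contains 4 or more players from the same team.

Here is what my data frame looks like: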

       Team      Player    K   D    A    LH Points Salary    PPS
  4     ATN  ExoticDeer  6.1 3.3  6.4 306.9 22.209   1622 1.3692
  2     ATN     Supreme  6.8 5.3  7.1 229.4 21.954   1578 1.3913
  1     ATN        sasu  3.6 6.4 11.0  95.7 19.357   1244 1.5560
  3     ATN eL lisasH 2  2.6 6.1  7.9  29.7 12.037    998 1.2061
  5     ATN       Nisha  2.7 5.6  7.5  48.2 12.282    955 1.2861
  11     CL Swiftending  6.0 5.8  7.8 360.5 22.285   1606 1.3876
  13     CL     Pajkatt 13.3 7.5  9.3 326.8 37.248   1489 2.5015
  15     CL  SexyBamboe  6.3 8.5  9.3 168.0 20.660   1256 1.6449
  14     CL         EGM  2.8 6.0 13.5  78.8 21.988    989 2.2233
  12     CL       Saksa  2.5 6.5 10.5  59.8 15.898    967 1.6441
  51 DBEARS         Ace  7.0 3.4  6.9 195.6 23.596   1578 1.4953
  31 DBEARS    HesteJoe  5.4 5.4  6.1 176.7 16.927   1512 1.1195
  61 DBEARS      Miggel  2.8 6.8 11.0 141.8 17.818   1212 1.4701
  21 DBEARS        Noia  3.0 6.0  8.0  36.1 13.161    970 1.3568
  41 DBEARS        Ryze  2.7 4.7  6.7  74.6 12.166    937 1.2984
  8      GB Keyser Soze  6.0 5.0  5.6 316.0 19.120   1602 1.1935
  9      GB      Madara  5.4 5.3  6.6 334.5 19.405   1577 1.2305
  10     GB     SkyLark  1.8 5.3  7.0  71.8 10.218   1266 0.8071
  7      GB         MNT  2.3 5.9  6.1  85.6  9.316   1007 0.9251
  6      GB   SKANKS224  1.4 7.6  7.4  52.5  7.565    954 0.7930

I followed the general concept described in this post: I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less

and adapted the code to suit my needs. Here is what I have so far:

## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string, 
## find the column sums of the others,
## set the names
yy <- setNames(
lapply(xx, function(x) {
    if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
}),
names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)

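With the code above I can generate every possible 8-player lineup and then subset the result on various criteria (total salary and total points), but I am struggling to exclude lineups that contain more than 3 players from the same team.

I assume those lineups need to be excluded from newdf, but I don't really know where to start.

Here is the dput() output of the data frame: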

structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL", 
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS", 
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L, 
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L, 
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer", 
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224", 
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending", 
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3, 
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3, 
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6, 
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8, 
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1, 
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168, 
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8, 
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282, 
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818, 
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622, 
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512, 
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692, 
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233, 
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305, 
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D", 
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4", 
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31", 
"61", "21", "41", "8", "9", "10", "7", "6"))

Best answer

It's best to build this in long format, I think:

Build the teams

library(data.table)
setDT(FantasyPlayers)

xx    <- combn(as.character(FantasyPlayers$Player), 8)
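# melt() on a matrix is reshape2's array method (see ?melt.array), so reshape2 must be installed
mxx   <- setDT(reshape2::melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))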

head(mxx,10)
#     jersey_no team_no      Player
#  1:         1       1  ExoticDeer
#  2:         2       1     Supreme
#  3:         3       1        sasu
#  4:         4       1 eL lisasH 2
#  5:         5       1       Nisha
#  6:         6       1 Swiftending
#  7:         7       1     Pajkatt
#  8:         8       1  SexyBamboe
#  9:         1       2  ExoticDeer
# 10:         2       2     Supreme

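Each group of 8 players shares a team_no and is indexed within it by jersey_no. See ?melt.array for how this works. setDT simply converts the resulting data.frame to a data.table to make merging easier.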
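To see what the long format looks like on something small, here is a minimal sketch with four made-up players and lineups of two. It rebuilds the same jersey_no / team_no / Player layout using base R's row() and col() rather than melt, purely for illustration; the player names are hypothetical.

```r
# Toy example: 4 hypothetical players, lineups of size 2
players <- c("a", "b", "c", "d")
xx_toy  <- combn(players, 2)   # 2 x 6 character matrix, one column per lineup

# Base-R equivalent of melting the matrix to long format:
# row index becomes jersey_no, column index becomes team_no
mxx_toy <- data.frame(
  jersey_no = as.vector(row(xx_toy)),
  team_no   = as.vector(col(xx_toy)),
  Player    = as.vector(xx_toy),
  stringsAsFactors = FALSE
)

head(mxx_toy, 4)

# The number of lineups is choose(n, k); for the real data this is
# choose(20, 8), which is the largest team_no produced by combn
n_lineups <- choose(20, 8)
```

With 20 players and lineups of 8, the long table therefore has 8 rows per lineup.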

Merge to recover the Player attributes

FantasyTeams <- FantasyPlayers[mxx, on="Player"]

#          Team      Player   K   D    A    LH Points Salary    PPS jersey_no team_no
#       1:  ATN  ExoticDeer 6.1 3.3  6.4 306.9 22.209   1622 1.3692         1       1
#       2:  ATN     Supreme 6.8 5.3  7.1 229.4 21.954   1578 1.3913         2       1
#       3:  ATN        sasu 3.6 6.4 11.0  95.7 19.357   1244 1.5560         3       1
#       4:  ATN eL lisasH 2 2.6 6.1  7.9  29.7 12.037    998 1.2061         4       1
#       5:  ATN       Nisha 2.7 5.6  7.5  48.2 12.282    955 1.2861         5       1
#      ---                                                                           
# 1007756:   GB Keyser Soze 6.0 5.0  5.6 316.0 19.120   1602 1.1935         4  125970
# 1007757:   GB      Madara 5.4 5.3  6.6 334.5 19.405   1577 1.2305         5  125970
# 1007758:   GB     SkyLark 1.8 5.3  7.0  71.8 10.218   1266 0.8071         6  125970
# 1007759:   GB         MNT 2.3 5.9  6.1  85.6  9.316   1007 0.9251         7  125970
# 1007760:   GB   SKANKS224 1.4 7.6  7.4  52.5  7.565    954 0.7930         8  125970

By default, only the first and last few rows of a data.table are printed. To inspect the whole thing, try ?View or look at the arguments of ?print.data.table.

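Filter down to a set of teams with the chosen characteristics

Keep only those team_no values in which no more than three players come from the same Team...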

my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no

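V1 is the default name given to the constructed variable max(table(Team)) <= 3. This is not lightning fast, but now that some teams have been excluded, later subsetting steps should be faster: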

my_new_teams <- 
  FantasyTeams[team_no %in% my_teams, sum(Salary) < 10000, by=team_no][V1==TRUE]$team_no

To save a few keystrokes and microseconds, replace V1==TRUE with (V1). That is the idiomatic way.
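As a small illustration of the per-team filter and the (V1) shorthand, here is a sketch on a made-up long table of two lineups. The table toy and its values are hypothetical and stand in for FantasyTeams.

```r
library(data.table)

# Hypothetical long-format lineups:
# lineup 1 has four players from team "X", lineup 2 has at most two per team
toy <- data.table(
  team_no = rep(1:2, each = 5),
  Team    = c("X", "X", "X", "X", "Y",
              "X", "X", "Y", "Y", "Z")
)

# One logical per lineup; the unnamed j expression gets the default name V1
flags <- toy[, max(table(Team)) <= 3, by = team_no]

# (V1) is shorthand for V1 == TRUE
ok_teams <- flags[(V1)]$team_no
ok_teams
```

Only lineup 2 survives, because lineup 1 has four players from the same Team.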

Recover rosters from a set of teams

To get the roster associated with each team, join/merge with mxx:

mxx[.(team_no = my_new_teams), on="team_no"]

If you would like the players listed on a single line, as in the OP:

mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]

If you want summary statistics for each team, you need to join with FantasyTeams instead:

FantasyTeams[.(team_no = my_new_teams), .(
  roster     = toString(Player),
  tot_salary = sum(Salary),
  tot_points = sum(Points)
), on="team_no", by=.EACHI]

#        team_no                                                              roster tot_salary tot_points
#     1:    3716      ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze       9913    149.018
#     2:    3720       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT       9983    146.168
#     3:    3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224       9930    144.417
#     4:    3725       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT       9950    145.173
#     5:    3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224       9897    143.422
#    ---                                                                                                  
# 40202:  125663         EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224       8638    117.032
# 40203:  125664                EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT       8925    119.970
# 40204:  125665          EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224       8872    118.219
# 40205:  125666              EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224       8613    117.317
# 40206:  125667             EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224       8302    108.130
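The question also asks for lineups whose total points exceed some x, which the steps above do not apply. One way to do that, sketched here on a made-up summary table shaped like the output above, is to filter on tot_points and sort. The table team_stats, its values, and the threshold min_points are all hypothetical.

```r
library(data.table)

# Hypothetical per-lineup summary with the same columns as the output above
team_stats <- data.table(
  team_no    = 1:4,
  tot_salary = c(9913, 9983, 10250, 8302),
  tot_points = c(149.0, 146.2, 160.5, 108.1)
)

min_points <- 140   # hypothetical threshold standing in for "x" in the question

# Keep lineups within the salary cap and above the points threshold,
# then list the highest-scoring lineups first
best <- team_stats[tot_salary <= 10000 & tot_points > min_points][order(-tot_points)]
best
```

Filtering on the summary table is cheap because it has one row per lineup rather than eight.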

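To understand what by=.EACHI is doing, a little background is needed. The merge syntax here is DT[i, j, on=cols, by=.EACHI].

  • If j and by are both left out, it simply performs the merge, as in the construction of FantasyTeams.
  • If by is left out but j is included, j is computed after the merge.
  • If by=.EACHI, then j is computed separately for each value in i.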
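The three cases can be compared side by side on a tiny table. This is a minimal sketch; DT_toy, keep, and the salary values are made up for illustration.

```r
library(data.table)

DT_toy <- data.table(
  team_no = c(1L, 1L, 2L, 2L, 3L),
  Salary  = c(100, 200, 300, 400, 500)
)
keep <- c(1L, 3L)

# j and by left out: a plain join that returns the matching rows
joined <- DT_toy[.(team_no = keep), on = "team_no"]

# j without by: j is evaluated once, over all joined rows together
total <- DT_toy[.(team_no = keep), sum(Salary), on = "team_no"]

# by = .EACHI: j is evaluated separately for each row of i
per_team <- DT_toy[.(team_no = keep), .(tot_salary = sum(Salary)),
                   on = "team_no", by = .EACHI]
```

Here joined has three rows, total is a single number, and per_team has one row for each value in keep.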

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/32855755/
