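Question:
I want to count the number n of unique 5-player combinations for each team in the data below that meet the criteria listed further down.
Data: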
TEAM <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B")
PLAYER <- c("Will","Will","Roy","Roy","Jaylon","Dean","Yosef","Devan","Quincy","Quincy","Luis","Xzavier","Seth","Layne","Layne","Antwan")
LP <- c(1,1,2,2,3,4,5,6,1,1,2,3,4,5,5,6)
POS <- c("3B","OF","1B","OF","SS","OF","C","OF","2B","OF","OF","C","3B","1B","OF","SS")
df <- data.frame(TEAM,PLAYER,LP,POS)
df:
   TEAM  PLAYER LP POS
1     A    Will  1  3B
2     A    Will  1  OF
3     A     Roy  2  1B
4     A     Roy  2  OF
5     A  Jaylon  3  SS
6     A    Dean  4  OF
7     A   Yosef  5   C
8     A   Devan  6  OF
9     B  Quincy  1  2B
10    B  Quincy  1  OF
11    B    Luis  2  OF
12    B Xzavier  3   C
13    B    Seth  4  3B
14    B   Layne  5  1B
15    B   Layne  5  OF
16    B  Antwan  6  SS
Edit: the LP column is irrelevant to the output. That was not as clear in the original post as I had hoped.
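Criteria:
- Five unique players (PLAYER) must be used. One player is always left out, since each team has a pool of six players.
- Each position (POS) may be used only once, except OF, which may be used up to three times (OF <= 3).
- A combination must not use players (PLAYER) from more than one team (TEAM).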
Example:
These are only a few of the many possible combinations I hope to generate and count:
    TEAM 1       2         3         4       5
1   A    Will-OF Roy-1B    Jaylon-SS Dean-OF Devan-OF
2   A    Roy-OF  Jaylon-SS Dean-OF   Yosef-C Devan-OF
3   A    Will-3B Roy-OF    Jaylon-SS Dean-OF Yosef-C
...
n   A    Will-3B Roy-1B    Jaylon-SS Dean-OF Yosef-C

    TEAM 1         2       3         4        5
1   B    Quincy-2B Luis-OF Xzavier-C Seth-3B  Layne-1B
2   B    Quincy-2B Luis-OF Seth-3B   Layne-1B Antwan-SS
3   B    Quincy-OF Luis-OF Xzavier-C Seth-3B  Layne-OF
...
n   B    Quincy-2B Luis-OF Xzavier-C Seth-3B  Layne-OF
Desired result:
TEAM UNIQUE
A n
B n
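What I have tried:
I know how to get every possible 5-player combination for each team and summarize them. I am just not sure how to apply the position-specific criteria to get the combinations I am looking for.
I wish I knew where to start. I would really appreciate your help. Thanks!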
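Best answer
Consider several data-wrangling steps:
- Assign a new column, PLAYER_POS, as the concatenation of PLAYER and POS.
- Run by to split the data frame by TEAM and run the operations on each split (rule #3).
- Run combn on PLAYER_POS to choose sets of 5.
- Run ave to get a running count of each PLAYER within a set.
- Run Filter to keep the data frames that have 5 rows and 5 unique players and that meet the position criteria (rules #1 and #2).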
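The running-count step is probably the least obvious of these, so here is a minimal sketch of it in isolation. The players vector is a made-up example for illustration, not an object from the answer; the idea is that any player who appears more than once in a candidate set gets a count above 1, which the filter can then reject.

```r
# Hypothetical candidate set in which Will was picked twice (as 3B and as OF).
players <- c("Will", "Will", "Roy", "Jaylon", "Dean")

# ave() applies seq_along within each group of identical names,
# so repeated players are numbered 1, 2, ...
running <- ave(seq_along(players), players, FUN = seq_along)
running
# 1 2 1 1 1

# A set has five distinct players only when no count exceeds 1.
all_distinct <- max(running) == 1
all_distinct
# FALSE
```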
Base R code
# HELPER COLUMN
df$PLAYER_POS <- with(df, paste(PLAYER, POS, sep="_"))

# BUILD LIST OF DFs BY TEAM
df_list <- by(df, df$TEAM, function(sub){
  combn(sub$PLAYER_POS, 5, FUN = function(p)
    transform(subset(sub, PLAYER_POS %in% p),
              PLAYER_NUM = ave(LP, PLAYER, FUN=seq_along)),
    simplify = FALSE)
})

# FILTER LIST OF DFs BY TEAM
df_list <- lapply(df_list, function(dfs)
  Filter(function(df)
    nrow(df) == 5 &
      max(df$PLAYER_NUM)==1 &
      length(df$POS[df$POS == "OF"]) <= 3 &
      length(df$POS[df$POS != "OF"]) == length(unique(df$POS[df$POS != "OF"])),
    dfs)
)

# COUNT REMAINING DFs BY TEAM FOR UNIQUE n
lengths(df_list)
#  A  B
# 18 20

data.frame(TEAMS=names(df_list), UNIQUE=lengths(df_list), row.names=NULL)
#   TEAMS UNIQUE
# 1     A     18
# 2     B     20
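As a cross-check on those counts, the same rules can be enumerated in a different order: pick 5 of the 6 players first, then try every assignment of one listed position per player. The sketch below is an alternative formulation, not the answer's method, and count_lineups is a hypothetical helper name. It rebuilds the question's data without LP, which the question says does not affect the result.

```r
# Rebuild the question's data (LP omitted; it is irrelevant to the output).
TEAM   <- rep(c("A", "B"), each = 8)
PLAYER <- c("Will","Will","Roy","Roy","Jaylon","Dean","Yosef","Devan",
            "Quincy","Quincy","Luis","Xzavier","Seth","Layne","Layne","Antwan")
POS    <- c("3B","OF","1B","OF","SS","OF","C","OF",
            "2B","OF","OF","C","3B","1B","OF","SS")
df <- data.frame(TEAM, PLAYER, POS, stringsAsFactors = FALSE)

# Hypothetical helper: count valid lineups for one team's rows.
count_lineups <- function(sub, size = 5, max_of = 3) {
  # Positions each player can take, as a named list.
  pos_by_player <- split(sub$POS, sub$PLAYER)
  # Every way to choose `size` distinct players.
  player_sets <- combn(names(pos_by_player), size, simplify = FALSE)
  total <- 0
  for (players in player_sets) {
    # One row per assignment of a single position to each chosen player.
    grid <- expand.grid(pos_by_player[players], stringsAsFactors = FALSE)
    ok <- apply(grid, 1, function(p) {
      non_of <- p[p != "OF"]
      # At most max_of outfielders, and no other position repeated.
      sum(p == "OF") <= max_of && !anyDuplicated(non_of)
    })
    total <- total + sum(ok)
  }
  total
}

# Splitting by TEAM keeps players from different teams apart (rule #3).
counts <- sapply(split(df, df$TEAM), count_lineups)
counts
#  A  B
# 18 20
```

Because distinct players are chosen up front, this version needs no running count per player; it should agree with the filtered list lengths above.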
Output (list of subsetted data frames)
df_list$A[[1]]
#   TEAM PLAYER LP POS PLAYER_POS PLAYER_NUM
# 1    A   Will  1  3B    Will_3B          1
# 3    A    Roy  2  1B     Roy_1B          1
# 5    A Jaylon  3  SS  Jaylon_SS          1
# 6    A   Dean  4  OF    Dean_OF          1
# 7    A  Yosef  5   C    Yosef_C          1
df_list$A[[2]]
df_list$A[[3]]
...
df_list$A[[18]]
df_list$B[[1]]
#    TEAM  PLAYER LP POS PLAYER_POS PLAYER_NUM
# 9     B  Quincy  1  2B  Quincy_2B          1
# 11    B    Luis  2  OF    Luis_OF          1
# 12    B Xzavier  3   C  Xzavier_C          1
# 13    B    Seth  4  3B    Seth_3B          1
# 14    B   Layne  5  1B   Layne_1B          1
df_list$B[[2]]
df_list$B[[3]]
...
df_list$B[[20]]
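If the one-row-per-combination layout from the question is wanted instead of a list of data frames, each data frame can be collapsed into a row of PLAYER-POS labels. This is only a sketch: to_wide is a hypothetical helper, and lineups is a hand-built stand-in for two filtered data frames so that the snippet runs on its own.

```r
# Stand-in for two filtered lineups (hand-built for illustration only).
lineups <- list(
  data.frame(TEAM = "A",
             PLAYER = c("Will", "Roy", "Jaylon", "Dean", "Yosef"),
             POS = c("3B", "1B", "SS", "OF", "C"),
             stringsAsFactors = FALSE),
  data.frame(TEAM = "A",
             PLAYER = c("Will", "Roy", "Jaylon", "Dean", "Devan"),
             POS = c("3B", "1B", "SS", "OF", "OF"),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: one row per lineup, one column per roster slot.
to_wide <- function(dfs) {
  # Each lineup becomes a vector of "PLAYER-POS" labels; transpose so
  # that lineups are rows and slots are columns.
  slots <- t(sapply(dfs, function(d) paste(d$PLAYER, d$POS, sep = "-")))
  out <- data.frame(TEAM = sapply(dfs, function(d) d$TEAM[1]), slots,
                    stringsAsFactors = FALSE)
  names(out) <- c("TEAM", seq_len(ncol(slots)))
  out
}

wide <- to_wide(lineups)
wide
```

Under the same assumptions, to_wide(df_list$A) would give the full table for team A.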
This post, "r - Counting unique combinations that meet specific criteria", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/63200801/