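I have a dataset of 2016 primary results. It has 8 columns: state, state_abbreviation, county, fips (an id number combining state and county), party, candidate, votes, and fraction_votes. I want to create a new column called "result" that marks each candidate as "Win" or "Loss" in each county. I used dplyr to filter the data down to the 2 Democratic candidates, then added the column with this code: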
Democrat$result <- ifelse(Democrat$fraction_votes > .5, "Win","Loss")
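This is obviously not an accurate approach, because the winner does not always get more than 50% of the vote. How can I get R to compare fraction_votes (or the vote totals) within each county and return "Win" or "Loss"? Is the apply() family, a for loop, or writing a function the best way to create the new column?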
  state   state_abbreviation county  fips party    candidate       votes fraction_votes
1 Alabama AL                 Autauga 1001 Democrat Bernie Sanders    544          0.182
2 Alabama AL                 Autauga 1001 Democrat Hillary Clinton  2387          0.800
3 Alabama AL                 Baldwin 1003 Democrat Bernie Sanders   2694          0.329
4 Alabama AL                 Baldwin 1003 Democrat Hillary Clinton  5290          0.647
5 Alabama AL                 Barbour 1005 Democrat Bernie Sanders    222          0.078
6 Alabama AL                 Barbour 1005 Democrat Hillary Clinton  2567          0.906
Best answer
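I would first use the summarise function from the dplyr package to find the maximum share of the vote any candidate received in each county, then join that county maximum back onto the original dataset, and finally compute the result by comparing each candidate's share with the county maximum.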
# create a sample dataset akin to the question setup
df <- data.frame(abrev = rep("AL", 6),
                 county = c("Autauga", "Autauga", "Baldwin", "Baldwin",
                            "Barbour", "Barbour"),
                 party = rep("Democrat", 6),
                 candidate = rep(c("Bernie", "Hillary"), 3),
                 fraction_votes = c(0.18, 0.8, 0.32, 0.64, 0.07, 0.9))
# load the dplyr library
library(dplyr)
# calculate the maximum fraction of votes any candidate received in each county:
# start from the df dataset
winners <- df %>%
  # group it by county
  group_by(county) %>%
  # for each county, calculate the maximum vote fraction
  summarise(score = max(fraction_votes))
# join the original dataset with the dataset of county maxima,
# matching rows on the county column
df <- left_join(df, winners, by = c("county"))
# calculate the result column: the candidate holding the county maximum wins
df$result <- ifelse(df$fraction_votes == df$score, "Win", "Loss")
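The summarise-then-join steps can likely be collapsed into a single grouped mutate(), since max() evaluated inside a grouped mutate() is recycled across every row of its group. A minimal sketch on the same sample data (rebuilt here so the snippet runs on its own):

```r
library(dplyr)

# Same sample data as in the answer above
df <- data.frame(abrev = rep("AL", 6),
                 county = c("Autauga", "Autauga", "Baldwin", "Baldwin",
                            "Barbour", "Barbour"),
                 party = rep("Democrat", 6),
                 candidate = rep(c("Bernie", "Hillary"), 3),
                 fraction_votes = c(0.18, 0.8, 0.32, 0.64, 0.07, 0.9))

# Within each county, mark the row(s) that hold the county maximum as "Win"
df <- df %>%
  group_by(county) %>%
  mutate(result = ifelse(fraction_votes == max(fraction_votes), "Win", "Loss")) %>%
  ungroup()

print(df)
```

This avoids the intermediate winners table and the join. As with the original approach, if two candidates tie for the county maximum, both rows are marked "Win".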
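If different counties share the same name (for example, in different states), you will have to adjust the grouping and joining steps, e.g. group and join on fips, or on both state and county, but the logic stays the same.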
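On the original question of apply(), loops, or a custom function: a grouped, vectorized computation usually makes all three unnecessary. A base R sketch, assuming a hypothetical dataset in which two counties named "Washington" are told apart only by fips (the candidates "A"/"B", the fips codes, and the vote counts are all made up for illustration), uses ave() to compute each fips group's maximum aligned with the original rows:

```r
# Hypothetical data: same county name, different fips codes
df <- data.frame(state = c("Alabama", "Alabama", "Arkansas", "Arkansas"),
                 county = rep("Washington", 4),
                 fips = c(1129, 1129, 5143, 5143),
                 candidate = c("A", "B", "A", "B"),
                 votes = c(100, 300, 500, 200))

# ave() applies max() within each fips group and returns a vector of the
# same length as df$votes, so no explicit loop or apply() call is needed
county_max <- ave(df$votes, df$fips, FUN = max)
df$result <- ifelse(df$votes == county_max, "Win", "Loss")

print(df)
```

Grouping by county alone here would take 500 as the maximum for all four rows, so the Alabama winner would wrongly be marked "Loss"; grouping by fips keeps the two counties separate. Comparing raw votes or fraction_votes should give the same ranking within a single county.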
Original question on Stack Overflow ("r - Add a new variable by comparing existing variables in a data frame in R"): https://stackoverflow.com/questions/42397538/