r - Add a new variable by comparing existing variables in a data frame in R

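I have a dataset containing the results of the 2016 primary elections. The dataset contains 8 columns: State, state_abbr, county, fips (a combined state and county id number), party, candidate, votes, and fraction_votes. I want to create a new column called "result" that marks each candidate in each county as a "Win" or a "Loss". I used dplyr to filter the data down to the 2 Democratic candidates, then used this code to add the column: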

 Democrat$result <- ifelse(Democrat$fraction_votes > .5, "Win","Loss")

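This is obviously not an accurate method, since the winner does not always receive more than 50% of the vote. How can I get R to compare fraction_votes or the vote totals within each county and return "Win" or "Loss"? Is the apply() family, a for loop, or writing a function the best way to create the new column?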

    state state_abbreviation  county fips    party       candidate votes fraction_votes
1 Alabama                 AL Autauga 1001 Democrat  Bernie Sanders   544          0.182
2 Alabama                 AL Autauga 1001 Democrat Hillary Clinton  2387          0.800
3 Alabama                 AL Baldwin 1003 Democrat  Bernie Sanders  2694          0.329
4 Alabama                 AL Baldwin 1003 Democrat Hillary Clinton  5290          0.647
5 Alabama                 AL Barbour 1005 Democrat  Bernie Sanders   222          0.078
6 Alabama                 AL Barbour 1005 Democrat Hillary Clinton  2567          0.906

Best answer

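I would first use the summarise function from the dplyr package to find the largest vote fraction any candidate received in a given county, then join that per-county maximum back onto the original dataset as a new column, and finally compute the result by comparing each row with it.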

# Create a sample dataset similar to the one in the question
df <- data.frame(abrev = rep("AL", 6),
                 county = c("Autauga", "Autauga", "Baldwin", "Baldwin",
                            "Barbour", "Barbour"),
                 party = rep("Democrat", 6),
                 candidate = rep(c("Bernie", "Hillary"), 3),
                 fraction_votes = c(0.18, 0.8, 0.32, 0.64, 0.07, 0.9))

# Load the dplyr package
library(dplyr)

# Find the largest vote fraction any candidate received in each county
winners <- df %>%
        # group the rows by county
        group_by(county) %>%
        # for each county, take the maximum vote fraction
        summarise(score = max(fraction_votes))

# Join the county maximums back onto the original dataset by the county column
df <- left_join(df, winners, by = "county")

# Compute the result column: a row wins when its fraction equals the county maximum
df$result <- ifelse(df$fraction_votes == df$score, "Win", "Loss")
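The question also asks whether the apply() family or a loop is needed. As a minimal sketch of a base R alternative (not part of the original answer), ave() can return the per-county maximum aligned with the original rows, so no join is required. The data frame name votes_df and the helper vector county_max are illustrative; the values are the six rows shown in the question.

```r
# Rows taken from the question's sample output
votes_df <- data.frame(
  county = c("Autauga", "Autauga", "Baldwin", "Baldwin", "Barbour", "Barbour"),
  candidate = rep(c("Bernie Sanders", "Hillary Clinton"), 3),
  fraction_votes = c(0.182, 0.800, 0.329, 0.647, 0.078, 0.906),
  stringsAsFactors = FALSE
)

# ave() applies max() within each county and returns one value per row
county_max <- ave(votes_df$fraction_votes, votes_df$county, FUN = max)

# A row wins when its fraction equals the maximum for its county
votes_df$result <- ifelse(votes_df$fraction_votes == county_max, "Win", "Loss")

print(votes_df)
```

If two candidates tie for the maximum in a county, this comparison labels both of them "Win".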

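If there are different counties with the same name, you will have to adjust the grouping and joining parts, but the logic should be the same.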
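One way to handle counties that share a name might be to group on the fips column from the question, which combines state and county, instead of on the county name. The sketch below also uses a grouped mutate() in place of the summarise-and-join steps; that is a swapped-in variant, not the answer's own code. The rows, including the fips values and vote fractions, are made up purely for illustration, and the name same_name is hypothetical.

```r
library(dplyr)

# Made-up rows: two states that each have a county named "Washington"
same_name <- data.frame(
  state_abbreviation = c("AL", "AL", "GA", "GA"),
  county = rep("Washington", 4),
  fips = c(1129, 1129, 13303, 13303),   # illustrative ids, one per state-county pair
  candidate = rep(c("Bernie Sanders", "Hillary Clinton"), 2),
  fraction_votes = c(0.20, 0.75, 0.60, 0.38),
  stringsAsFactors = FALSE
)

same_name <- same_name %>%
  # group by the unique county id rather than the ambiguous county name
  group_by(fips) %>%
  # max() is evaluated within each group, so each row is compared to its own county
  mutate(result = ifelse(fraction_votes == max(fraction_votes), "Win", "Loss")) %>%
  ungroup()

print(same_name)
```

Grouping by county alone would treat all four rows as one county here, so only the 0.75 row would be marked "Win". Grouping by both state_abbreviation and county would have the same effect as grouping by fips.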

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/42397538/
