python pandas: using groupby and filter to drop candidate names that do not have two forecasts

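The task is to use Pandas to drop any invalid names and any candidate names that do not have two forecasts. In the DataFrame, some candidate names appear on both forecast dates and others appear on only one. I want to drop the candidates that have only one forecast date.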

I am trying to use groupby and filter to drop the candidate names that do not satisfy both conditions: ('forecast_date' == '2018-08-11') AND ('forecast_date' == '2018-11-06')

Here is my code:

election_sub=election_sub.dropna(subset=['candidate'])
election_sub.groupby('candidate')
grouped.filter(lambda x: (x['forecast_date']== '2018-08-11')&(x['forecast_date']=='2018-11-06'))

Here is the DataFrame:

Best answer

Use:

import pandas as pd

#data to DataFrame
url = 'https://raw.githubusercontent.com/fivethirtyeight/checking-our-work-data/master/us_house_elections.csv'
election_sub = pd.read_csv(url, parse_dates=['election_date','forecast_date'])

#filter out `NaN`s
election_sub=election_sub.dropna(subset=['candidate'])

#keep rows that match one OR the other datetime
df = election_sub[election_sub['forecast_date'].isin(['2018-08-11','2018-11-06'])].copy()
#get number of unique datetimes per group
s = df.groupby('candidate')['forecast_date'].nunique()
#keep only candidates with both datetimes, i.e. the AND condition
cand = s.index[s.eq(2)].unique()
print (cand)

Index(['A. Donald McEachin', 'Aaron Andrus', 'Aaron Swisher',
       'Abby Finkenauer', 'Abigail Spanberger', 'Adam B. Schiff',
       'Adam Kinzinger', 'Adam Smith', 'Adrian Smith', 'Adriano Espaillat',
       ...
       'William Lacy Clay', 'William Tanoos', 'William Timmons',
       'Willie Billups', 'Xochitl Torres Small', 'Young Kim', 'Yvette Clarke',
       'Yvette Herrell', 'Yvonne Hayes Hinson', 'Zoe Lofgren'],
      dtype='object', name='candidate', length=960)

#filter original data by candidates
df = election_sub[election_sub['candidate'].isin(cand)]
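
The same steps can be tried without downloading the CSV. The following is a minimal sketch on a made-up toy frame; the helper name `keep_candidates_with_all_dates` and the sample rows are assumptions for illustration, not part of the original answer. The target dates are wrapped in `pd.to_datetime` so that `isin` compares datetimes with datetimes instead of relying on string coercion.

```python
import pandas as pd

# Toy stand-in for the election data (all values are made up).
toy = pd.DataFrame({
    "candidate": ["Ann", "Ann", "Bob", "Cy", "Cy", "Cy"],
    "forecast_date": pd.to_datetime([
        "2018-08-11", "2018-11-06",                # Ann: both dates
        "2018-08-11",                              # Bob: only one date
        "2018-08-11", "2018-09-01", "2018-11-06",  # Cy: both, plus an extra
    ]),
})

def keep_candidates_with_all_dates(df, dates):
    """Hypothetical helper: keep all rows of candidates seen on every date in `dates`."""
    dates = pd.to_datetime(list(dates)).unique()
    # Rows that fall on one of the required dates.
    on_dates = df[df["forecast_date"].isin(dates)]
    # Number of distinct required dates seen per candidate.
    counts = on_dates.groupby("candidate")["forecast_date"].nunique()
    # Candidates that were seen on all required dates.
    keep = counts.index[counts.eq(len(dates))]
    return df[df["candidate"].isin(keep)]

result = keep_candidates_with_all_dates(toy, ["2018-08-11", "2018-11-06"])
print(sorted(result["candidate"].unique().tolist()))  # ['Ann', 'Cy']
```

Bob is dropped because he has only one of the two dates; Cy keeps all three rows, including the one on the extra date, because the final step filters the original frame by candidate.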

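Your own approach also works if you test, for each of the two conditions, whether at least one value in the group is True. Each test then returns a scalar, so combine the 2 scalars with `and` for AND: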

grouped = election_sub.groupby('candidate')
df = grouped.filter(lambda x: (x['forecast_date']== '2018-08-11').any() and (x['forecast_date']=='2018-11-06').any())

print(df)
        year office state  district special election_date forecast_date  \
0       2018  House    WY       1.0   False    2018-11-06    2018-11-06   
1       2018  House    WY       1.0   False    2018-11-06    2018-11-06   
2       2018  House    WY       1.0   False    2018-11-06    2018-11-06   
3       2018  House    WY       1.0   False    2018-11-06    2018-11-06   
4       2018  House    WY       1.0   False    2018-11-06    2018-11-06   
...      ...    ...   ...       ...     ...           ...           ...   
282688  2018  House    AK       1.0   False    2018-11-06    2018-08-01   
282689  2018  House    AK       1.0   False    2018-11-06    2018-08-01   
282690  2018  House    AK       1.0   False    2018-11-06    2018-08-01   
282691  2018  House    AK       1.0   False    2018-11-06    2018-08-01   
282692  2018  House    AK       1.0   False    2018-11-06    2018-08-01   

       forecast_type party        candidate  projected_voteshare  \
0               lite     D      Greg Hunter             33.29836   
1               lite     R       Liz Cheney             61.18835   
2             deluxe     D      Greg Hunter             31.37998   
3             deluxe     R       Liz Cheney             63.10673   
4            classic     D      Greg Hunter             31.33293   
...              ...   ...              ...                  ...   
282688          lite     R        Don Young             50.74973   
282689        deluxe     D  Alyse S. Galvin             41.49152   
282690        deluxe     R        Don Young             51.96705   
282691       classic     D  Alyse S. Galvin             44.10701   
282692       classic     R        Don Young             49.35155   

        actual_voteshare  probwin  probwin_outcome  
0                    NaN  0.00134                0  
1                    NaN  0.99866                1  
2                    NaN  0.00020                0  
3                    NaN  0.99980                1  
4                    NaN  0.00032                0  
...                  ...      ...              ...  
282688               NaN  0.76900                1  
282689               NaN  0.12776                0  
282690               NaN  0.87224                1  
282691               NaN  0.28146                0  
282692               NaN  0.71854                1  

[282240 rows x 14 columns]
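
To see why the question's lambda cannot work while the `.any() and .any()` version does, here is a small sketch on a made-up two-row group (the data is an assumption for illustration). A single row can never equal both dates at once, so the element-wise `&` is False everywhere; it also yields a Series, whereas `filter` expects one True/False per group.

```python
import pandas as pd

# One candidate's group, with a row on each of the two dates (made-up data).
group = pd.DataFrame({
    "candidate": ["Ann", "Ann"],
    "forecast_date": pd.to_datetime(["2018-08-11", "2018-11-06"]),
})

first = group["forecast_date"] == pd.Timestamp("2018-08-11")
second = group["forecast_date"] == pd.Timestamp("2018-11-06")

# Row-wise AND: no single row carries both dates, so every entry is False.
row_wise = first & second
print(row_wise.tolist())   # [False, False]

# Group-level AND: does each date appear somewhere in the group?
group_level = bool(first.any() and second.any())
print(group_level)         # True
```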

EDIT:

The two solutions perform differently:

In [41]: %%timeit
    ...: df = election_sub[election_sub['forecast_date'].isin(['2018-08-11','2018-11-06'])].copy()
    ...: #get number of unique datetimes per groups
    ...: s = df.groupby('candidate')['forecast_date'].nunique()
    ...: #filter candidates only with both datetimes, like condition AND
    ...: cand = s.index[s.eq(2)].unique()
    ...: 
    ...: #filter original data by candidates
    ...: df = election_sub[election_sub['candidate'].isin(cand)]
    ...: 
61.3 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [42]: %%timeit
    ...: grouped = election_sub.groupby('candidate')
    ...: df = grouped.filter(lambda x: (x['forecast_date']== '2018-08-11').any() and (x['forecast_date']=='2018-11-06').any())
    ...: 
1.07 s ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
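
The timings only matter if both routes select the same rows. The sketch below checks that on a small made-up frame (the data and variable names are assumptions for illustration). A plausible reason for the gap is that `filter` calls the Python lambda once per candidate, while the `isin`/`nunique` route runs as a few vectorized operations.

```python
import pandas as pd

# Made-up data: Ann and Cy have both dates, Bob has only one.
toy = pd.DataFrame({
    "candidate": ["Ann", "Bob", "Cy", "Ann", "Cy"],
    "forecast_date": pd.to_datetime(
        ["2018-08-11", "2018-08-11", "2018-11-06", "2018-11-06", "2018-08-11"]
    ),
})
dates = pd.to_datetime(["2018-08-11", "2018-11-06"])

# Route 1: isin + nunique, then select the original rows by candidate.
on_dates = toy[toy["forecast_date"].isin(dates)]
counts = on_dates.groupby("candidate")["forecast_date"].nunique()
fast = toy[toy["candidate"].isin(counts.index[counts.eq(2)])]

# Route 2: groupby.filter with one boolean per group.
slow = toy.groupby("candidate").filter(
    lambda g: bool(
        (g["forecast_date"] == dates[0]).any()
        and (g["forecast_date"] == dates[1]).any()
    )
)

# Both routes should keep the same original row labels.
same = sorted(fast.index.tolist()) == sorted(slow.index.tolist())
print(same)
```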

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/61152417/
