python - Counting unique values in columns - pandas Python

This is my dataset:

Unique_ID   No_of_Filings   Req_1   Req_2   Req_3   Req_4
 RCONF045   3               Blue    Red     White   Violet
 RCONF046   3               Blue    Red     White   Brown
 RCONF047   3               Blue    Red     White   Brown
 RCONF048   3               Black   Yellow  Green   N/A
 RCONF051   4               Black   Yellow  Green   N/A
 RCONF052   4               Black   Brown   Green   Orange

I extracted the unique values from the last 4 columns (Req_1 through Req_4) like this:

pd.unique(df1[["Req_1","Req_2","Req_3","Req_4"]].values.ravel("K"))

Out[20]:  array(['Blue', 'Black', 'Red', 'Yellow', 'Brown', 'White', 'Green',
       'Violet', nan, 'Orange'], dtype=object)
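
For anyone who wants to reproduce this, here is a minimal sketch of the frame. It assumes the N/A cells are missing values (NaN), which is what the `nan` in the output above suggests; the name `REQ_COLS` is only an illustrative helper and is not part of my original code.

```python
import numpy as np
import pandas as pd

# Illustrative helper name: the four requirement columns.
REQ_COLS = ["Req_1", "Req_2", "Req_3", "Req_4"]

# Assumption: the "N/A" cells in the table are NaN in the real data.
df1 = pd.DataFrame({
    "Unique_ID": ["RCONF045", "RCONF046", "RCONF047",
                  "RCONF048", "RCONF051", "RCONF052"],
    "No_of_Filings": [3, 3, 3, 3, 4, 4],
    "Req_1": ["Blue", "Blue", "Blue", "Black", "Black", "Black"],
    "Req_2": ["Red", "Red", "Red", "Yellow", "Yellow", "Brown"],
    "Req_3": ["White", "White", "White", "Green", "Green", "Green"],
    "Req_4": ["Violet", "Brown", "Brown", np.nan, np.nan, "Orange"],
})

# Flatten the four columns into one 1-D array and de-duplicate it.
unique_reqs = pd.unique(df1[REQ_COLS].values.ravel("K"))
print(unique_reqs)
```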

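This is the output I need. Frequency = the number of times the value appears in the last four columns (for example, Yellow appears only twice), and Number of Filings = sum(No_of_Filings if the requirement is in that row). For example, Blue is in the first three rows, so it is 3 + 3 + 3 = 9; Brown is in the second, third, and sixth rows, so it is 3 + 3 + 4 = 10.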

Requirements    Frequency   Number of Filings
   Blue            3              9
   Black           3              11
   Red             3              9
   Brown           3              10
   White           3              9
   Green           3              11
   Yellow          2              7
   N/A             2              7
   Violet          1              3
   Orange          1              4

How can I create these two columns in the new data frame above using pandas?

Thanks

Best answer

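You can do something along these lines with agg, but the data needs some reshaping first (df here is the question's df1). Here is one way to get it: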

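# Reshape to long form (one row per requirement cell), then aggregate.
agg_df = (df.fillna('N/A').set_index(['Unique_ID', 'No_of_Filings'])
          .stack()                       # one row per (ID, Req_n) cell
          .reset_index('No_of_Filings')  # filing count back as a column
          .groupby(0)                    # 0 is the default name of the stacked values
          .agg(['sum', 'size'])          # sum = Number of Filings, size = Frequency
          .reset_index())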

agg_df.columns = ['Requirements', 'Number of Filings', 'Frequency']

>>> agg_df
  Requirements  Number of Filings  Frequency
0        Black                 11          3
1         Blue                  9          3
2        Brown                 10          3
3        Green                 11          3
4          N/A                  7          2
5       Orange                  4          1
6          Red                  9          3
7       Violet                  3          1
8        White                  9          3
9       Yellow                  7          2
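
The same reshape-then-aggregate idea can also be sketched with melt and named aggregation, which some may find easier to read. This is an alternative under stated assumptions, not the code from the answer above: the frame is rebuilt by hand with NaN standing in for N/A, and the names long_df and summary are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: hand-built copy of the question's data, with NaN for "N/A".
df = pd.DataFrame({
    "Unique_ID": ["RCONF045", "RCONF046", "RCONF047",
                  "RCONF048", "RCONF051", "RCONF052"],
    "No_of_Filings": [3, 3, 3, 3, 4, 4],
    "Req_1": ["Blue", "Blue", "Blue", "Black", "Black", "Black"],
    "Req_2": ["Red", "Red", "Red", "Yellow", "Yellow", "Brown"],
    "Req_3": ["White", "White", "White", "Green", "Green", "Green"],
    "Req_4": ["Violet", "Brown", "Brown", np.nan, np.nan, "Orange"],
})

# Long form: one row per (Unique_ID, requirement) pair.
long_df = df.fillna("N/A").melt(
    id_vars=["Unique_ID", "No_of_Filings"],
    value_name="Requirements",
)

# size counts the rows per requirement; sum adds up their filing counts.
summary = (long_df.groupby("Requirements")
           .agg(Frequency=("No_of_Filings", "size"),
                Number_of_Filings=("No_of_Filings", "sum"))
           .rename(columns={"Number_of_Filings": "Number of Filings"})
           .sort_values("Frequency", ascending=False)
           .reset_index())
print(summary)
```

Note that with either approach, a requirement that appeared twice in the same row would have that row's No_of_Filings counted twice; that does not happen in this dataset.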

Regarding python - Counting unique values in columns - pandas Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51643505/
