I have the following data in Excel:
SCENARIO DATE POD AREA IDOC STATUS TYPE
AAA 02.06.2015 JKJKJKJKJKK 4210 713375 51 1
AAA 02.06.2015 JWERWERE 4210 713375 51 1
AAA 02.06.2015 JAFDFDFDFD 4210 713375 51 9
BBB 02.06.2015 AAAAAAAA 5400 713504 51 43
CCC 05.06.2015 BBBBBBBBBB 4100 756443 51 187
AAA 05.06.2015 EEEEEEEE 4100 756457 53 228
I want to achieve the following:
SCENARIO  STATUS  TYPE  COUNT(TYPE)
AAA       51      1     2
                  9     1
          53      228   1
BBB       51      43    1
CCC       51      187   1
I tried the following, but it aggregates every column, and TYPE is displayed as a float, i.e.:
SCENARIO STATUS TYPE
E01 51 1.0 23 23 23 23 23 23 23 23 2
4.0 89 89 89 89 89 89 89 89 8
13.0 21 21 21 21 21 21 21 21 2
20.0 57 57 57 57 57 57 57 57 5
29.0 5 5 5 5 5 5 5 5
I would like only a single "count" column here. This is the code I tried:
import pandas as pd

xl = pd.ExcelFile("MRD.xlsx")
df = xl.parse("Sheet3")
print(df.columns.values)
# The following gave ValueError: Cannot label index with a null key
# dfi = df.pivot('SCENARIO')
# Here I do not actually need it to count every column, just a specific one
table = df.groupby(["SCENARIO", "STATUS", "TYPE"]).agg(['count'])
writer = pd.ExcelWriter('pandas.out.xlsx', engine='xlsxwriter')
table.to_excel(writer, sheet_name='Sheet1')
writer.close()  # close() saves the workbook; save() is deprecated in newer pandas
Thanks in advance!
Best answer
Use GroupBy.count. If you do not need to count NaNs, specify the column in []:
table = df.groupby(["SCENARIO", "STATUS", "TYPE"])['TYPE'].count()
print (table)
SCENARIO  STATUS  TYPE
AAA       51      1       2
                  9       1
          53      228     1
BBB       51      43      1
CCC       51      187     1
Name: TYPE, dtype: int64
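To try this without the Excel file, here is a minimal sketch that rebuilds the sample rows from the question as an in-memory DataFrame. The inline data stands in for Sheet3 of MRD.xlsx and is an assumption for illustration only.

```python
import pandas as pd

# Sample rows copied from the question, built inline instead of read from MRD.xlsx.
df = pd.DataFrame({
    "SCENARIO": ["AAA", "AAA", "AAA", "BBB", "CCC", "AAA"],
    "DATE": ["02.06.2015"] * 4 + ["05.06.2015"] * 2,
    "POD": ["JKJKJKJKJKK", "JWERWERE", "JAFDFDFDFD",
            "AAAAAAAA", "BBBBBBBBBB", "EEEEEEEE"],
    "AREA": [4210, 4210, 4210, 5400, 4100, 4100],
    "IDOC": [713375, 713375, 713375, 713504, 756443, 756457],
    "STATUS": [51, 51, 51, 51, 51, 53],
    "TYPE": [1, 1, 9, 43, 187, 228],
})

# Selecting ["TYPE"] before count() yields a single Series
# instead of one count per remaining column.
table = df.groupby(["SCENARIO", "STATUS", "TYPE"])["TYPE"].count()
print(table)
```

Without the column selection, count() (or agg(['count'])) is applied to every non-grouping column, which is why the question's output repeated the same number across DATE, POD, AREA and IDOC.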
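Alternatively, use GroupBy.size. Specifying a column is not necessary, but the difference is that it counts every row in each group, including rows with NaN values: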
table = df.groupby(["SCENARIO", "STATUS", "TYPE"]).size()
print (table)
SCENARIO  STATUS  TYPE
AAA       51      1       2
                  9       1
          53      228     1
BBB       51      43      1
CCC       51      187     1
dtype: int64
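The difference between count and size only shows up when the counted column has missing values. The sketch below uses hypothetical data (a POD column with one NaN) rather than the question's spreadsheet, and groups by SCENARIO and STATUS only so that the NaN sits in a non-key column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing POD value.
df = pd.DataFrame({
    "SCENARIO": ["AAA", "AAA", "BBB"],
    "STATUS": [51, 51, 51],
    "POD": ["X1", np.nan, "Y1"],
})

grouped = df.groupby(["SCENARIO", "STATUS"])

non_null = grouped["POD"].count()  # counts only non-NaN POD values per group
all_rows = grouped.size()          # counts every row per group

print(non_null)
print(all_rows)
```

Here the (AAA, 51) group has a count of 1 but a size of 2. Note that rows whose grouping key itself is NaN are dropped by groupby by default, so neither method counts them. A numeric column that contains NaN is also stored as float, which is a likely reason TYPE appeared as 1.0, 4.0 and so on in the question's output.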
If you need regular columns instead of a MultiIndex Series:
table = (df.groupby(["SCENARIO", "STATUS", "TYPE"])['TYPE']
           .count()
           .reset_index(name='COUNT(TYPE)'))
print (table)
  SCENARIO  STATUS  TYPE  COUNT(TYPE)
0      AAA      51     1            2
1      AAA      51     9            1
2      AAA      53   228            1
3      BBB      51    43            1
4      CCC      51   187            1
table = (df.groupby(["SCENARIO", "STATUS", "TYPE"])
           .size()
           .reset_index(name='COUNT(TYPE)'))
print (table)
  SCENARIO  STATUS  TYPE  COUNT(TYPE)
0      AAA      51     1            2
1      AAA      51     9            1
2      AAA      53   228            1
3      BBB      51    43            1
4      CCC      51   187            1
Finally, if you do not need to write the first column (the index) to Excel:
table.to_excel(writer, sheet_name='Sheet1', index=False)
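Putting the pieces together, the following is a rough sketch of the write step, not the original poster's exact code. It uses pd.ExcelWriter as a context manager, which saves and closes the workbook on exit, so no explicit save call is needed. The helper name write_counts and the small stand-in DataFrame are hypothetical, and an Excel engine such as openpyxl or xlsxwriter is assumed to be installed when the helper is actually called.

```python
import pandas as pd


def write_counts(table: pd.DataFrame, path: str) -> None:
    """Write the flat count table to an Excel sheet without the index column."""
    # Leaving the with-block saves and closes the workbook.
    with pd.ExcelWriter(path) as writer:
        table.to_excel(writer, sheet_name="Sheet1", index=False)


# Small stand-in for the data read from the spreadsheet.
df = pd.DataFrame({
    "SCENARIO": ["AAA", "AAA", "BBB"],
    "STATUS": [51, 51, 51],
    "TYPE": [1, 1, 43],
})

# Flat table with one count column, as in the answer's reset_index variant.
table = (
    df.groupby(["SCENARIO", "STATUS", "TYPE"])
    .size()
    .reset_index(name="COUNT(TYPE)")
)
print(table)
```

Calling write_counts(table, "pandas.out.xlsx") would then produce a sheet with the four columns SCENARIO, STATUS, TYPE and COUNT(TYPE) and no leading index column.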
Regarding python - Pandas, ValueError: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47248069/