I am trying to find the month (the Month column) with the largest total in the DepDelay column.
Data:
flightID Month ArrTime ActualElapsedTime DepDelay ArrDelay
BBYYEUVY67527 1 1514.0 58.0 NA 64.0
MUPXAQFN40227 1 37.0 120.0 13 52.0
LQLYUIMN79169 1 916.0 166.0 NA -25.0
KTAMHIFO10843 1 NaN NaN 5 NaN
BOOXJTEY23623 1 NaN NaN 4 NaN
BBYYEUVY67527 2 1514.0 58.0 NA 64.0
MUPXAQFN40227 2 37.0 120.0 NA 52.0
LQLYUIMN79169 2 916.0 166.0 NA -25.0
KTAMHIFO10843 2 NaN NaN 15 NaN
BOOXJTEY23623 2 NaN NaN 4 NaN
I tried:
data = pd.read_csv('data.csv', sep='\t')
dep_delay = all_data.groupby(["Month"].DepDelay.count().max())
print(dep_delay)
Error:
AttributeError Traceback (most recent call last)
<ipython-input-14-2ea6213009d6> in <module>()
----> 1 dep_delay = all_data.groupby(["Month"].DepDelay.count().max())
2
3 print(dep_delay)
AttributeError: 'list' object has no attribute 'DepDelay'
Desired output:
Month DepDelay
1 22
Best answer
You need sum rather than count to add up the values within each group. Here is one approach using GroupBy + sum, followed by idxmax:
res = df.groupby('Month')['DepDelay'].sum().reset_index()
res = res.loc[[res['DepDelay'].idxmax()]]
print(res)
Month DepDelay
0 1 22.0
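For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the file is tab-separated as in the question and rebuilds only the Month and DepDelay columns from the sample rows; the names raw, totals, best_month and best_total are illustrative and do not come from the original post.

```python
import io

import pandas as pd

# Sample rows copied from the question, reduced to the columns needed here.
# read_csv treats the string "NA" as a missing value by default.
raw = (
    "flightID\tMonth\tDepDelay\n"
    "BBYYEUVY67527\t1\tNA\n"
    "MUPXAQFN40227\t1\t13\n"
    "LQLYUIMN79169\t1\tNA\n"
    "KTAMHIFO10843\t1\t5\n"
    "BOOXJTEY23623\t1\t4\n"
    "BBYYEUVY67527\t2\tNA\n"
    "MUPXAQFN40227\t2\tNA\n"
    "LQLYUIMN79169\t2\tNA\n"
    "KTAMHIFO10843\t2\t15\n"
    "BOOXJTEY23623\t2\t4\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Sum DepDelay per month; missing values are skipped by sum().
totals = df.groupby("Month")["DepDelay"].sum()

best_month = totals.idxmax()  # index label (month) of the largest total
best_total = totals.max()     # the largest total itself

print(best_month, best_total)
```

With the sample data, month 1 totals 13 + 5 + 4 = 22 and month 2 totals 15 + 4 = 19, so month 1 is selected.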
Alternatively, you can group and sort, then take the first row:
res = df.groupby('Month')['DepDelay'].sum()\
.sort_values(ascending=False).head(1)\
.reset_index()
print(res)
Month DepDelay
0 1 22.0
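As a side note on the AttributeError in the question: the closing parenthesis of groupby is in the wrong place, so .DepDelay is looked up on the Python list ["Month"] rather than on the grouped data. The sketch below reproduces that on a small hand-built frame (the frame and variable names are illustrative assumptions) and shows how count and sum differ once the parenthesis is moved.

```python
import pandas as pd

nan = float("nan")

# Hand-built stand-in for the question's data (Month and DepDelay only).
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "DepDelay": [nan, 13, nan, 5, 4, nan, nan, nan, 15, 4],
})

# The original call evaluated ["Month"].DepDelay first, which fails
# because a plain list has no such attribute.
try:
    ["Month"].DepDelay
except AttributeError as exc:
    message = str(exc)

# With the parenthesis closed right after the grouping key, the call works.
counts = df.groupby("Month").DepDelay.count()  # non-missing rows per month
sums = df.groupby("Month").DepDelay.sum()      # total delay per month

max_count = counts.max()
max_sum = sums.max()
print(message, max_count, max_sum)
```

Here count gives the number of non-missing delays per month (3 and 2), while sum gives the totals (22 and 19), which is what the desired output asks for.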
Source: the Stack Overflow question "python - find the largest number in a column": https://stackoverflow.com/questions/52703921/