python - Can't drop rows with empty lists while taking the mean of the other lists

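I have a time series df with 2 columns. I'm trying to drop all the empty lists from the yearly_cost column while taking the mean of the lists that contain floats, so that each day ends up with a single value. The date column has several rows for the same date, so I'm also trying to combine all rows by date. The df looks like this: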

    date        yearly_cost
0   2009-01-01  []
1   2009-01-02  [409.45,294.33,394.56]
2   2009-01-03  [403.45,175.30,323.67]
3   2009-01-01  [456.34,355.3,493.5]
4   2009-01-02  []
5   2009-01-03  [295.39, 439.23]
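To reproduce the problem, the sample frame can be built roughly like this. This is a sketch that assumes the dates are plain strings and that yearly_cost holds real Python lists rather than their string form:

```python
import pandas as pd

# Sample data from the question; assumes real lists, not strings like '[1.0, 2.0]'
df = pd.DataFrame({
    'date': ['2009-01-01', '2009-01-02', '2009-01-03',
             '2009-01-01', '2009-01-02', '2009-01-03'],
    'yearly_cost': [[], [409.45, 294.33, 394.56], [403.45, 175.30, 323.67],
                    [456.34, 355.3, 493.5], [], [295.39, 439.23]],
})

# Number of values per row; the empty lists show up as 0
print(df['yearly_cost'].str.len().tolist())
```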

Some days have more than one list, so I need to average the values from those lists together to get a single value.

I tried .dropna() and np.nanmean(), and also ts.yearly_cost = [np.mean(i) if isinstance(i, list) else i for i in ts.yearly_cost] followed by joining by date with .set_index('date').mean(axis=1).reset_index(name='Yearly_Cost'). That approach has worked in the past on time series that contained no empty lists.
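A minimal sketch of why the per-list-mean route falls short on this data, assuming yearly_cost holds real lists: np.mean([]) returns nan (with a RuntimeWarning), and even when empty lists are skipped, averaging the per-row means for each date weights every list equally rather than every value. For 2009-01-03 that gives about 334.06 instead of the expected 327.408.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2009-01-01', '2009-01-02', '2009-01-03',
             '2009-01-01', '2009-01-02', '2009-01-03'],
    'yearly_cost': [[], [409.45, 294.33, 394.56], [403.45, 175.30, 323.67],
                    [456.34, 355.3, 493.5], [], [295.39, 439.23]],
})

# np.mean on an empty list gives nan and warns "Mean of empty slice"
with warnings.catch_warnings():
    warnings.simplefilter('ignore', RuntimeWarning)
    empty_mean = np.mean([])

# Per-row mean, mapping empty lists to NaN explicitly instead of warning
row_means = df['yearly_cost'].apply(lambda v: np.mean(v) if len(v) > 0 else np.nan)

# Grouping the row means by date skips NaN, but it is a mean of means,
# so a 3-value list and a 2-value list count equally
mean_of_means = row_means.groupby(df['date']).mean()
print(mean_of_means)
```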

I'd like the final result to look like this:

    date        yearly_cost
0   2009-01-01  435.05
1   2009-01-02  366.11
2   2009-01-03  327.408
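Working backwards from the expected output, each value appears to be the mean of all non-empty values for that date pooled together, rather than the mean of the per-list means:

$$
\begin{aligned}
\text{2009-01-01} &: \frac{456.34 + 355.3 + 493.5}{3} = \frac{1305.14}{3} \approx 435.047 \\
\text{2009-01-02} &: \frac{409.45 + 294.33 + 394.56}{3} = \frac{1098.34}{3} \approx 366.113 \\
\text{2009-01-03} &: \frac{403.45 + 175.30 + 323.67 + 295.39 + 439.23}{5} = \frac{1637.04}{5} = 327.408
\end{aligned}
$$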

Any help would be greatly appreciated. Thanks!

Best answer

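If the yearly_cost column holds lists, first flatten them, then aggregate with mean. Empty lists contribute no values when flattened, so they drop out automatically: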

import ast
import pandas as pd
# only needed if the lists are stored as strings, e.g. '[409.45,294.33,394.56]'
#df['yearly_cost'] = df['yearly_cost'].apply(ast.literal_eval)

from itertools import chain

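# one row per list element; an empty list contributes no rows
df = pd.DataFrame({
    'yearly_cost' : list(chain.from_iterable(df['yearly_cost'].tolist())),
    # repeat each date once per element of its list (0 times for [])
    'date' : df['date'].values.repeat(df['yearly_cost'].str.len())
})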

df = df.groupby('date', as_index=False)['yearly_cost'].mean()
print (df)
         date  yearly_cost
0  2009-01-01   435.046667
1  2009-01-02   366.113333
2  2009-01-03   327.408000
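The commented-out ast.literal_eval line matters more than it looks. If yearly_cost held strings instead of lists (hypothetically, after a CSV round-trip), .str.len() would count characters rather than list elements and chain.from_iterable would iterate over characters, so both steps above would break. A small illustration with made-up strings:

```python
import ast

import pandas as pd

# Hypothetical input: lists stored as their string representation
raw = pd.Series(['[]', '[409.45,294.33,394.56]', '[295.39, 439.23]'])

# On strings, .str.len() counts characters, not list elements
char_counts = raw.str.len().tolist()
print(char_counts)

# ast.literal_eval safely turns each string back into a real Python list
parsed = raw.apply(ast.literal_eval)
elem_counts = parsed.str.len().tolist()
print(elem_counts)
```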

Another solution:

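# spread each list across columns (shorter lists are padded with NaN),
# then stack into one Series indexed by date
s = pd.DataFrame(df['yearly_cost'].values.tolist(), index=df['date']).stack()
# Series.mean(level=...) was removed in pandas 2.0; group on the date level instead
df = s.groupby(level=0).mean().reset_index(name='yearly_cost')
print (df)
         date  yearly_cost
0  2009-01-01   435.046667
1  2009-01-02   366.113333
2  2009-01-03   327.408000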
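On pandas 0.25 or later, DataFrame.explode offers a third route. This is a sketch, not part of the original answer, and it assumes yearly_cost holds real lists: explode puts each element on its own row and turns an empty list into NaN, which groupby().mean() skips by default.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2009-01-01', '2009-01-02', '2009-01-03',
             '2009-01-01', '2009-01-02', '2009-01-03'],
    'yearly_cost': [[], [409.45, 294.33, 394.56], [403.45, 175.30, 323.67],
                    [456.34, 355.3, 493.5], [], [295.39, 439.23]],
})

# One row per list element; empty lists become a single NaN row
exploded = df.explode('yearly_cost')
# explode leaves an object column, so cast to float before averaging
exploded['yearly_cost'] = exploded['yearly_cost'].astype(float)

# mean() skips NaN, so the former empty lists drop out of the average
result = exploded.groupby('date', as_index=False)['yearly_cost'].mean()
print(result)
```

Because every value gets its own row before grouping, this pools the values per date, matching the expected 327.408 for 2009-01-03.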

A similar question, "python - Can't drop rows with empty lists while taking the mean of the other lists", can be found on Stack Overflow: https://stackoverflow.com/questions/56294008/
