python - Applying a custom function in groupby returns NaN

Tags: python python-3.x pandas aggregate series

Given a dictionary, performance, whose values are Series like this one:

2015-02-28           NaN
2015-03-02    100.000000
2015-03-03     98.997117
2015-03-04     98.909215
2015-03-05     99.909979
2015-03-06    100.161486
2015-03-09    100.502772
2015-03-10    101.685314
2015-03-11    102.518433
2015-03-12    102.427237
2015-03-13    103.424257
2015-03-16    102.669184
2015-03-17    102.181841
2015-03-18    102.436339
2015-03-19    102.672482
2015-03-20    102.238386
2015-03-23    101.460082
...

I want to group each Series by month, but keep only the first value in each month that is not np.nan:

for perf in performance:
    performance[perf] = performance[perf].groupby(performance[perf].index.month).apply(return_first)


def return_first(array_like):
    # Return data from 1st of month, or first value that is not np.nan
    for i in range(len(array_like)):
        if np.isnan(array_like[i]):
            continue
        else:
            return(array_like[i])

However, this returns NaN values:

2015-02-28   NaN
2015-03-02   NaN
2015-03-03   NaN
2015-03-04   NaN
2015-03-05   NaN
2015-03-06   NaN
2015-03-09   NaN
2015-03-10   NaN
2015-03-11   NaN
2015-03-12   NaN
2015-03-13   NaN
2015-03-16   NaN
2015-03-17   NaN
2015-03-18   NaN
2015-03-19   NaN
2015-03-20   NaN
2015-03-23   NaN
...

when it should be:

2015-03-02   100   
...

I have no reason to suspect my index, which looks like a perfectly good pd.DatetimeIndex:

DatetimeIndex(['2015-02-28', '2015-03-02', '2015-03-03', '2015-03-04',
           '2015-03-05', '2015-03-06', '2015-03-09', '2015-03-10',
           '2015-03-11', '2015-03-12',
           ...
           '2016-02-16', '2016-02-17', '2016-02-18', '2016-02-19',
           '2016-02-22', '2016-02-23', '2016-02-24', '2016-02-25',
           '2016-02-26', '2016-02-29'],
          dtype='datetime64[ns]', length=265, freq=None)

Where am I going wrong?

Best answer

If every month has at least one non-NaN value, use first_valid_index:

print (df.b.groupby(df.index.month).apply(lambda x: x[x.first_valid_index()]))
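To make the behaviour of first_valid_index concrete, here is a minimal sketch on a small, hypothetical Series (the variable names and the all-NaN example are assumptions, not part of the original answer). It shows that the method returns the index label of the first non-NaN entry, and None when no such entry exists, which is why the one-liner above only works if every group has at least one valid value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first entry is NaN, as in the question's data.
s = pd.Series(
    [np.nan, 100.0, 98.997117],
    index=pd.to_datetime(["2015-02-28", "2015-03-02", "2015-03-03"]),
)

# Label (here a Timestamp) of the first non-NaN entry, then the value at that label.
first_label = s.first_valid_index()
first_value = s.loc[first_label]

# A Series with no valid entries has no such label, so None comes back.
all_nan = pd.Series(
    [np.nan, np.nan],
    index=pd.to_datetime(["2015-02-27", "2015-02-28"]),
)
no_label = all_nan.first_valid_index()

print(first_label, first_value, no_label)
```

Indexing with None would fail, so an all-NaN group needs the explicit check used in the more general solution.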

A more general solution, which returns NaN when all values in a month are NaN:

def f(x):
    if x.first_valid_index() is None:
        return np.nan
    else:
        return x[x.first_valid_index()]

print (df.b.groupby(df.index.month).apply(f))

2      NaN
3    100.0
Name: b, dtype: float64
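As an alternative that the answer does not mention, the built-in GroupBy.first() should give the same result here, since it picks the first non-null entry per group and yields NaN for a group with no valid values. The sketch below compares the two approaches on hypothetical data shaped like column b; treat it as an illustration under that assumption rather than a replacement for the answer's method.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the answer's column b.
b = pd.Series(
    [np.nan, 100.0, 98.997117, 98.909215],
    index=pd.to_datetime(["2015-02-25", "2015-03-02", "2015-03-03", "2015-03-04"]),
    name="b",
)

def f(x):
    # Same logic as the answer's helper: NaN if the group has no valid value.
    label = x.first_valid_index()
    return np.nan if label is None else x.loc[label]

via_apply = b.groupby(b.index.month).apply(f)   # custom function
via_first = b.groupby(b.index.month).first()    # built-in, skips NaN

print(via_apply)
print(via_first)
```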

If you want to group by year and month, use to_period:

print (df.b.groupby(df.index.to_period('M')).apply(f))
2015-02      NaN
2015-03    100.0
Freq: M, Name: b, dtype: float64
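The distinction matters when the data spans more than one year, as the index in the question does (2015-02 through 2016-02). The following sketch uses hypothetical values (110.0 and 111.0 for 2016 are invented for illustration) to show that grouping by index.month merges February 2015 with February 2016, while to_period('M') keeps them apart.

```python
import numpy as np
import pandas as pd

def f(x):
    # First non-NaN value of the group, or NaN if there is none.
    label = x.first_valid_index()
    return np.nan if label is None else x.loc[label]

# Hypothetical data covering February in two different years.
s = pd.Series(
    [np.nan, 100.0, 110.0, 111.0],
    index=pd.to_datetime(["2015-02-28", "2015-03-02", "2016-02-26", "2016-02-29"]),
)

# Month number only: both Februaries fall into the same group (key 2).
by_month = s.groupby(s.index.month).apply(f)

# Year-month period: each calendar month is its own group.
by_period = s.groupby(s.index.to_period("M")).apply(f)

print(by_month)
print(by_period)
```

With month numbers, the NaN from 2015-02-28 is skipped and the February group reports a 2016 value, which is probably not what a per-month report intends.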

Sample:

import pandas as pd
import numpy as np

df = pd.DataFrame({'b': pd.Series({ pd.Timestamp('2015-07-19 00:00:00'): 102.67248199999999,  pd.Timestamp('2015-04-05 00:00:00'):  np.nan,  pd.Timestamp('2015-02-25 00:00:00'):  np.nan,  pd.Timestamp('2015-04-09 00:00:00'): 100.50277199999999,  pd.Timestamp('2015-06-18 00:00:00'): 102.436339,  pd.Timestamp('2015-06-16 00:00:00'): 102.669184,  pd.Timestamp('2015-04-10 00:00:00'): 101.68531400000001,  pd.Timestamp('2015-05-12 00:00:00'): 102.42723700000001,  pd.Timestamp('2015-07-20 00:00:00'): 102.23838600000001,  pd.Timestamp('2015-06-17 00:00:00'):  np.nan,  pd.Timestamp('2015-08-23 00:00:00'): 101.460082,  pd.Timestamp('2015-03-03 00:00:00'): 98.997117000000003,  pd.Timestamp('2015-03-02 00:00:00'): 100.0,  pd.Timestamp('2015-05-11 00:00:00'): 102.518433,  pd.Timestamp('2015-03-04 00:00:00'): 98.909215000000003, pd.Timestamp('2015-05-13 00:00:00'): 103.424257,  pd.Timestamp('2015-04-06 00:00:00'):  np.nan})})
print (df)

                     b
2015-02-25         NaN
2015-03-02  100.000000
2015-03-03   98.997117
2015-03-04   98.909215
2015-04-05         NaN
2015-04-06         NaN
2015-04-09  100.502772
2015-04-10  101.685314
2015-05-11  102.518433
2015-05-12  102.427237
2015-05-13  103.424257
2015-06-16  102.669184
2015-06-17         NaN
2015-06-18  102.436339
2015-07-19  102.672482
2015-07-20  102.238386
2015-08-23  101.460082

def f(x):
    if x.first_valid_index() is None:
        return np.nan
    else:
        return x[x.first_valid_index()]

print (df.b.groupby(df.index.to_period('M')).apply(f))
2015-02           NaN
2015-03    100.000000
2015-04    100.502772
2015-05    102.518433
2015-06    102.669184
2015-07    102.672482
2015-08    101.460082
Freq: M, Name: b, dtype: float64

Regarding python - Applying a custom function in groupby returns NaN, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37456532/
