I have a Pandas dataframe containing np.nan (NumPy not-a-number) values:
field1
2020-12-24 NaN
2020-12-25 NaN
2020-12-26 1.0
2020-12-27 2.0
2020-12-28 NaN
2020-12-29 1.0
2020-12-30 2.0
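(The index is a datetime index.) I want a new dataframe that lists the start date of each run of consecutive np.nan values and the number of np.nan values in that run, i.e.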
field1
2020-12-24 2
2020-12-28 1
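To make the question reproducible, the input and the desired output can be rebuilt as DataFrames. This is a minimal sketch that assumes the index is a daily pd.date_range and the column is named field1 as shown; the name `expected` is a hypothetical label for the target result.

```python
import numpy as np
import pandas as pd

# Input frame from the question (daily DatetimeIndex assumed).
df = pd.DataFrame(
    {"field1": [np.nan, np.nan, 1.0, 2.0, np.nan, 1.0, 2.0]},
    index=pd.date_range("2020-12-24", periods=7, freq="D"),
)

# Desired result: one row per run of NaNs, indexed by the run's start date,
# holding the length of that run.
expected = pd.DataFrame(
    {"field1": [2, 1]},
    index=pd.to_datetime(["2020-12-24", "2020-12-28"]),
)

print(df)
print(expected)
```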
I tried this code:
prev = 1
for col_name, el in df.iterrows():
    print(el)
    if prev != np.nan and el[0] == np.nan:
        cnt = 1
    if prev == np.nan and el[0] == np.nan:
        cnt = cnt + 1
    if prev == np.nan and el[0] != np.nan:
        print(cnt)
    prev = el[0]
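But it does not work as expected, and I would also like to avoid "for" loops, because I expect them to be very slow on larger dataframes. Any help would be appreciated!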
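One likely reason the loop above misbehaves, independent of speed: NaN never compares equal to anything, itself included, so `el[0] == np.nan` is always False and `prev != np.nan` is always True. The sketch below shows this and a corrected loop that uses pd.isna() instead; `nan_runs_loop` is a hypothetical helper name, and since it is still a Python-level loop it does not address the performance concern.

```python
import numpy as np
import pandas as pd

# NaN is unequal to everything, including itself.
print(np.nan == np.nan)   # False
print(np.nan != np.nan)   # True
print(pd.isna(np.nan))    # True


def nan_runs_loop(series):
    """Hypothetical helper: loop-based version of the question's attempt.

    Uses pd.isna() instead of == / != comparisons with np.nan and
    returns a dict {start_label: run_length} for each run of NaNs.
    """
    runs = {}
    start = None
    for label, value in series.items():
        if pd.isna(value):
            if start is None:      # a new run of NaNs begins here
                start = label
                runs[start] = 0
            runs[start] += 1
        else:
            start = None           # a non-NaN value ends the current run
    return runs


s = pd.Series(
    [np.nan, np.nan, 1.0, 2.0, np.nan, 1.0, 2.0],
    index=pd.date_range("2020-12-24", periods=7, freq="D"),
    name="field1",
)
runs = nan_runs_loop(s)
print(runs)
```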
Best answer
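You can create groups by testing for non-missing values with Series.notna and taking their Series.cumsum, so that every run of consecutive NaNs shares one group id. Then filter only the NaN rows, get the length of each run with Series.map and Series.value_counts, and keep only the first row of each run by filtering out duplicates with Series.duplicated: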
m = df['field1'].notna()
s = m.cumsum()[~m]
df1 = s.map(s.value_counts())[~s.duplicated()].to_frame()
print(df1)
field1
2020-12-24 2
2020-12-28 1
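To see why this works, the intermediate values can be printed step by step. This sketch rebuilds the question's frame (assuming a daily index) and repeats the answer's pipeline; the intermediate names `groups` and `counts` are hypothetical additions for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"field1": [np.nan, np.nan, 1.0, 2.0, np.nan, 1.0, 2.0]},
    index=pd.date_range("2020-12-24", periods=7, freq="D"),
)

m = df["field1"].notna()   # True on non-missing rows
groups = m.cumsum()        # counter increases at every non-NaN, so each NaN run shares one id
s = groups[~m]             # keep only the NaN rows and their group ids
counts = s.value_counts()  # length of each NaN run, keyed by group id
# Map every NaN row to its run length, then keep only the first row of each run.
df1 = s.map(counts)[~s.duplicated()].to_frame()

print(m.tolist())       # [False, False, True, True, False, True, True]
print(groups.tolist())  # [0, 0, 1, 2, 2, 3, 4]
print(s.tolist())       # [0, 0, 2]
print(df1)
```

Because every step is a vectorized Series operation, this avoids the Python-level loop the question wanted to get rid of.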
Source: "python - Count the number of np.nan in a Pandas dataframe" on Stack Overflow: https://stackoverflow.com/questions/65559423/