python - How to check whether data is missing after two or more repeated values in Pandas and replace the missing values with the previous value?

Tags: python python-2.7 python-3.x pandas numpy

I am trying to fill missing values with the previous value, but only when that previous value is repeated. Example DF:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4       NaN
5       NaN
6       NaN
7    1853.0
8    1831.0
9       NaN
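
To reproduce the example, the frame can be built roughly as follows. This is a minimal sketch: the column name `Columns` is taken from the printout above, and the default integer index is assumed.

```python
import numpy as np
import pandas as pd

# Sample data from the printout above: three repeated 1918.0 values
# followed by three NaNs, and one trailing NaN after non-repeating values
df = pd.DataFrame({
    "Columns": [1978.0, 1918.0, 1918.0, 1918.0,
                np.nan, np.nan, np.nan, 1853.0, 1831.0, np.nan]
})
```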

For the dataframe above, the NaNs at indices 4, 5, and 6 should be replaced with 1918.0, and the NaN at index 9 should stay NaN.

Desired output 1:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4    1918.0
5    1918.0
6    1918.0
7    1853.0
8    1831.0
9       NaN

It would also be great to get, out of all the NaN values, the number of cases where this happens. That is, the sample DF has 4 NaN values, and 3 of them occur this way.

Desired output 2:

Column_Name  : Columns
Total_NaN_count : 4
NaN_values_with_previous_elements_repeating : 3

Please let me know if there is a way to get this.

Thanks

Best answer

Here is a NumPy approach that works on the underlying array data, for performance and convenience:

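import numpy as np

# Copy the column data and write it back at the end; relying on .values
# being a writable view is not safe under pandas Copy-on-Write
a = df['Columns'].to_numpy(dtype=float, copy=True)
isnan = np.isnan(a)

# Index of the most recent non-NaN entry at or before each position
last = np.maximum.accumulate(np.where(isnan, 0, np.arange(a.size)))

# NaN positions whose most recent valid value repeats the value before it
repeated = np.r_[False, a[1:] == a[:-1]]
idx = np.flatnonzero(isnan & repeated[last])

# Finally assign the previous valid value to all those places
a[idx] = a[last[idx]]
df['Columns'] = a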
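
A pandas-only variant may be easier to read, and it also yields the counts asked for in desired output 2. This is a minimal sketch rather than part of the original answer: the names `repeated`, `qualifies`, `filled`, and `summary` are illustrative, and the sample frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Columns": [1978.0, 1918.0, 1918.0, 1918.0,
                np.nan, np.nan, np.nan, 1853.0, 1831.0, np.nan]
})

s = df["Columns"]
is_nan = s.isna()

# True where a non-NaN value equals the value directly before it
repeated = s.eq(s.shift())

# Carry that flag forward across each run of NaNs: a NaN qualifies when
# the last valid value before it was itself a repeat
carried = repeated.astype(float).where(~is_nan).ffill().fillna(0).astype(bool)
qualifies = is_nan & carried

# Forward-fill only the qualifying NaNs; leave every other entry untouched
filled = s.where(~qualifies, s.ffill())

summary = {
    "Column_Name": s.name,
    "Total_NaN_count": int(is_nan.sum()),
    "NaN_values_with_previous_elements_repeating": int(qualifies.sum()),
}
print(filled)
print(summary)
```

On the sample data, `filled` replaces the NaNs at indices 4, 5, and 6 with 1918.0 and keeps the NaN at index 9, while `summary` reports 4 NaNs in total, 3 of which follow a repeated value.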

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/46452721/
