python - How do I drop rows where every column except the date column is NaN?

Tags: python pandas csv

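I am trying to remove NaN values from a csv file, but I only want to drop the rows in which every column is empty. An image of the rows I want to drop is attached below.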

File link: https://filebin.net/ou93iqiinss02l0g

[image: the rows to drop]

Basically, if columns B, C, D, E, F, G and H are all NaN, I want to drop the whole row.

I tried the code below, but it dropped everything:

import pandas as pd

df = pd.read_csv("testing.csv")
df = df.dropna(thresh = 7)

The final result should look like this:

[image: the expected result]

Data

,Open,High,Low,Close,Adj Close,Volume,Singapore
2015-10-01,2795.399902,3104.719971,2765.439941,2998.350098,2998.350098,0.0,
2015-11-01,2976.719971,3043.850098,2843.949951,2855.939941,2855.939941,0.0,
2015-12-01,2862.790039,2911.439941,2793.389893,2882.72998,2882.72998,0.0,
2016-01-01,2889.22998,2890.209961,2529.01001,2629.110107,2629.110107,0.0,
2016-02-01,2637.050049,2684.790039,2528.439941,2666.51001,2666.51001,0.0,
2016-03-01,2666.709961,2906.800049,2654.97998,2840.899902,2840.899902,0.0,
2016-04-01,2820.659912,2964.100098,2783.419922,2838.52002,2838.52002,158708700.0,
2016-05-01,2842.860107,2848.899902,2713.469971,2791.060059,2791.060059,0.0,
2016-06-01,2787.98999,2881.919922,2703.47998,2840.929932,2840.929932,0.0,
2016-07-01,2848.449951,2958.899902,2830.0,2868.689941,2868.689941,0.0,
2016-08-01,2875.590088,2898.27002,2810.8798829999996,2820.590088,2820.590088,0.0,
2016-09-01,2821.929932,2911.840088,2791.3798829999996,2869.469971,2869.469971,0.0,
2016-10-01,2879.850098,2901.72998,2783.330078,2813.8701170000004,2813.8701170000004,0.0,
2016-11-01,2814.080078,2915.419922,2760.969971,2905.169922,2905.169922,0.0,
2016-12-01,2913.649902,2980.77002,2857.909912,2880.76001,2880.76001,0.0,
2017-01-01,2887.0,3065.1298829999996,2869.659912,3046.800049,3046.800049,0.0,
2017-02-01,3045.939941,3138.969971,3030.649902,3096.610107,3096.610107,4018227800.0,
2017-03-01,3106.300049,3188.02002,3104.330078,3175.110107,3175.110107,5462555700.0,
2017-04-01,3180.27002,3189.810059,3113.899902,3175.439941,3175.439941,4292226700.0,
2017-05-01,3183.429932,3275.389893,3183.409912,3210.820068,3210.820068,5080433500.0,
2017-06-01,3214.1201170000004,3270.919922,3196.48999,3226.47998,3226.47998,4414015100.0,
2017-07-01,3228.909912,3354.709961,3196.139893,3329.52002,3329.52002,5085548600.0,
2017-08-01,3321.5,3349.090088,3244.22998,3277.26001,3277.26001,4856835500.0,
2017-09-01,3274.389893,3275.139893,3193.409912,3219.909912,3219.909912,3840282400.0,
2017-10-01,3233.949951,3392.149902,3230.810059,3374.080078,3374.080078,4261116400.0,
2017-11-01,3377.1899409999996,3449.320068,3341.300049,3433.540039,3433.540039,4789747800.0,
2017-12-01,3441.850098,3469.360107,3370.219971,3402.919922,3402.919922,3386126700.0,
2018-01-01,3406.4799799999996,3611.6899409999996,3403.8701170000004,3533.98999,3533.98999,4727173600.0,
2018-02-01,3536.929932,3574.5900880000004,3340.550049,3517.9399409999996,3517.9399409999996,6143735500.0,
2018-03-01,3493.4399409999996,3555.9799799999996,3382.780029,3427.969971,3427.969971,4963081900.0,
2018-04-01,3439.040039,3628.429932,3338.959961,3613.929932,3613.929932,4599803900.0,
2018-05-01,3624.1999509999996,3641.649902,3428.179932,3428.179932,3428.179932,5918362800.0,
2018-06-01,3423.5,3492.3400880000004,3237.77002,3268.699951,3268.699951,5500961400.0,
2018-07-01,3277.429932,3341.419922,3176.26001,3319.850098,3319.850098,5029346600.0,
2018-08-01,3331.050049,3347.97998,3187.830078,3213.47998,3213.47998,5005791600.0,
2018-09-01,3209.969971,3265.01001,3102.72998,3257.050049,3257.050049,4158150600.0,
2018-10-01,3262.429932,3272.8798829999996,2955.679932,3018.800049,3018.800049,5516696000.0,
2018-11-01,3045.679932,3132.419922,3007.310059,3117.610107,3117.610107,4457632700.0,
2018-12-01,3154.219971,3192.8798829999996,3000.449951,3068.76001,3068.76001,3627597800.0,
2019-01-01,3072.98999,3250.27002,2993.419922,3190.169922,3190.169922,4467841200.0,
2019-02-01,3194.219971,3286.080078,3174.0,3212.689941,3212.689941,3786000800.0,
2019-03-01,3210.840088,3251.719971,3156.790039,3212.8798829999996,3212.8798829999996,4128594600.0,
2019-04-01,3229.110107,3415.179932,3227.6201170000004,3400.1999509999996,3400.1999509999996,4447727600.0,
2019-05-01,3389.5200200000004,3397.179932,3110.51001,3117.76001,3117.76001,4319537800.0,
2019-06-01,3111.51001,3336.080078,3104.030029,3321.610107,3321.610107,4160448600.0,
2019-07-01,3339.580078,3386.649902,3299.889893,3300.75,3300.75,4489792100.0,
2019-08-01,3282.790039,3311.26001,3040.159912,3106.52002,3106.52002,5146051500.0,
2019-09-01,3092.25,3216.8701170000004,3074.040039,3119.98999,3119.98999,4116898900.0,
2019-10-01,3130.110107,3235.23999,3068.830078,3229.8798829999996,3229.8798829999996,4402690200.0,
2019-11-01,3227.600098,3285.719971,3182.050049,3193.919922,3193.919922,7055882400.0,
2019-12-01,3198.27002,3239.23999,3144.070068,3222.830078,3222.830078,4536740600.0,
2020-01-01,3230.47998,3283.889893,3144.100098,3153.72998,3153.72998,4951167700.0,
2020-02-01,3131.02002,3233.860107,3008.459961,3011.080078,3011.080078,5320489700.0,
2020-02-21,,,,,,,24.0
2020-02-25,,,,,,,
2020-02-28,,,,,,,22.0
2020-03-01,2988.350098,3047.790039,2208.419922,2481.22998,2481.22998,7767702900.0,
2020-03-02,,,,,,,
2020-03-03,,,,,,,
2020-03-06,,,,,,,23.0
2020-03-10,,,,,,,
2020-03-13,,,,,,,21.0
2020-03-17,,,,,,,
2020-03-20,,,,,,,24.0
2020-03-23,,,,,,,
2020-03-24,,,,,,,
2020-03-27,,,,,,,27.0
2020-03-30,,,,,,,
2020-03-31,,,,,,,
2020-04-01,2468.169922,2671.580078,2380.840088,2624.22998,2624.22998,7238328000.0,
2020-04-03,,,,,,,37.0
2020-04-06,,,,,,,
2020-04-07,,,,,,,
2020-04-10,,,,,,,73.0
2020-04-13,,,,,,,
2020-04-14,,,,,,,
2020-04-17,,,,,,,85.0
2020-04-20,,,,,,,
2020-04-21,,,,,,,
2020-04-24,,,,,,,90.0
2020-04-27,,,,,,,
2020-04-28,,,,,,,
2020-05-01,2555.669922,2611.73999,2489.939941,2510.75,2510.75,7367276100.0,90.0
2020-05-05,,,,,,,
2020-05-15,,,,,,,
2020-05-21,,,,,,,
2020-05-22,,,,,,,92.0
2020-05-25,,,,,,,
2020-05-26,,,,,,,
2020-05-30,,,,,,,
2020-06-01,2519.419922,2839.389893,2516.459961,2589.909912,2589.909912,8396435700.0,
2020-06-05,,,,,,,89.0
2020-06-08,,,,,,,
2020-06-15,,,,,,,
2020-06-16,,,,,,,
2020-06-19,,,,,,,92.0
2020-06-22,,,,,,,
2020-06-25,,,,,,,
2020-07-01,2604.080078,2707.669922,2511.02002,2529.820068,2529.820068,4876221500.0,
2020-07-03,,,,,,,
2020-07-06,,,,,,,
2020-07-07,,,,,,,90.0
2020-07-12,,,,,,,
2020-07-14,,,,,,,
2020-07-20,,,,,,,92.0
2020-07-26,,,,,,,
2020-07-27,,,,,,,
2020-07-31,,,,,,,
2020-08-01,2522.530029,2602.330078,2478.389893,2532.51001,2532.51001,6347053700.0,
2020-08-03,,,,,,,88.0
2020-08-07,,,,,,,
2020-08-10,,,,,,,
2020-08-12,,,,,,,
2020-08-14,,,,,,,90.0
2020-08-17,,,,,,,
2020-08-25,,,,,,,
2020-08-28,,,,,,,90.0
2020-08-31,,,,,,,
2020-09-01,2521.810059,2546.8701170000004,2476.820068,2490.090088,2490.090088,2000718800.0,
2020-09-11,2481.080078,2492.419922,2476.820068,2490.090088,2490.090088,0.0,

Best answer

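  • Use pandas.read_csv with parse_dates and index_col both set to the unnamed date column at position 0.
  • Call .dropna with how='all', which drops any row that is entirely NaN. The index is not considered, which is why the date column is set as the index.
  • Technically the dates do not have to be parsed as datetimes, but this is financial data, so a proper datetime format is appropriate for time series analysis and makes the dates plot correctly. The date column does have to be the index for .dropna to work this simply.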
import pandas as pd

# read the file, parsing column 0 as dates and using it as the index
df = pd.read_csv('testing.csv', parse_dates=[0], index_col=0)

# drop na
df = df.dropna(how='all')

# save file
df.to_csv('test_updated.csv', index=True)

# display(df)
                  Open        High         Low       Close   Adj Close       Volume  Singapore
2015-10-01  2795.39990  3104.71997  2765.43994  2998.35010  2998.35010  0.00000e+00        NaN
2015-11-01  2976.71997  3043.85010  2843.94995  2855.93994  2855.93994  0.00000e+00        NaN
2015-12-01  2862.79004  2911.43994  2793.38989  2882.72998  2882.72998  0.00000e+00        NaN
2016-01-01  2889.22998  2890.20996  2529.01001  2629.11011  2629.11011  0.00000e+00        NaN
2016-02-01  2637.05005  2684.79004  2528.43994  2666.51001  2666.51001  0.00000e+00        NaN
2016-03-01  2666.70996  2906.80005  2654.97998  2840.89990  2840.89990  0.00000e+00        NaN
2016-04-01  2820.65991  2964.10010  2783.41992  2838.52002  2838.52002  1.58709e+08        NaN
2016-05-01  2842.86011  2848.89990  2713.46997  2791.06006  2791.06006  0.00000e+00        NaN
2016-06-01  2787.98999  2881.91992  2703.47998  2840.92993  2840.92993  0.00000e+00        NaN
2016-07-01  2848.44995  2958.89990  2830.00000  2868.68994  2868.68994  0.00000e+00        NaN
2016-08-01  2875.59009  2898.27002  2810.87988  2820.59009  2820.59009  0.00000e+00        NaN
2016-09-01  2821.92993  2911.84009  2791.37988  2869.46997  2869.46997  0.00000e+00        NaN
2016-10-01  2879.85010  2901.72998  2783.33008  2813.87012  2813.87012  0.00000e+00        NaN
2016-11-01  2814.08008  2915.41992  2760.96997  2905.16992  2905.16992  0.00000e+00        NaN
2016-12-01  2913.64990  2980.77002  2857.90991  2880.76001  2880.76001  0.00000e+00        NaN
2017-01-01  2887.00000  3065.12988  2869.65991  3046.80005  3046.80005  0.00000e+00        NaN
2017-02-01  3045.93994  3138.96997  3030.64990  3096.61011  3096.61011  4.01823e+09        NaN
2017-03-01  3106.30005  3188.02002  3104.33008  3175.11011  3175.11011  5.46256e+09        NaN
2017-04-01  3180.27002  3189.81006  3113.89990  3175.43994  3175.43994  4.29223e+09        NaN
2017-05-01  3183.42993  3275.38989  3183.40991  3210.82007  3210.82007  5.08043e+09        NaN
2017-06-01  3214.12012  3270.91992  3196.48999  3226.47998  3226.47998  4.41402e+09        NaN
2017-07-01  3228.90991  3354.70996  3196.13989  3329.52002  3329.52002  5.08555e+09        NaN
2017-08-01  3321.50000  3349.09009  3244.22998  3277.26001  3277.26001  4.85684e+09        NaN
2017-09-01  3274.38989  3275.13989  3193.40991  3219.90991  3219.90991  3.84028e+09        NaN
2017-10-01  3233.94995  3392.14990  3230.81006  3374.08008  3374.08008  4.26112e+09        NaN
2017-11-01  3377.18994  3449.32007  3341.30005  3433.54004  3433.54004  4.78975e+09        NaN
2017-12-01  3441.85010  3469.36011  3370.21997  3402.91992  3402.91992  3.38613e+09        NaN
2018-01-01  3406.47998  3611.68994  3403.87012  3533.98999  3533.98999  4.72717e+09        NaN
2018-02-01  3536.92993  3574.59009  3340.55005  3517.93994  3517.93994  6.14374e+09        NaN
2018-03-01  3493.43994  3555.97998  3382.78003  3427.96997  3427.96997  4.96308e+09        NaN
2018-04-01  3439.04004  3628.42993  3338.95996  3613.92993  3613.92993  4.59980e+09        NaN
2018-05-01  3624.19995  3641.64990  3428.17993  3428.17993  3428.17993  5.91836e+09        NaN
2018-06-01  3423.50000  3492.34009  3237.77002  3268.69995  3268.69995  5.50096e+09        NaN
2018-07-01  3277.42993  3341.41992  3176.26001  3319.85010  3319.85010  5.02935e+09        NaN
2018-08-01  3331.05005  3347.97998  3187.83008  3213.47998  3213.47998  5.00579e+09        NaN
2018-09-01  3209.96997  3265.01001  3102.72998  3257.05005  3257.05005  4.15815e+09        NaN
2018-10-01  3262.42993  3272.87988  2955.67993  3018.80005  3018.80005  5.51670e+09        NaN
2018-11-01  3045.67993  3132.41992  3007.31006  3117.61011  3117.61011  4.45763e+09        NaN
2018-12-01  3154.21997  3192.87988  3000.44995  3068.76001  3068.76001  3.62760e+09        NaN
2019-01-01  3072.98999  3250.27002  2993.41992  3190.16992  3190.16992  4.46784e+09        NaN
2019-02-01  3194.21997  3286.08008  3174.00000  3212.68994  3212.68994  3.78600e+09        NaN
2019-03-01  3210.84009  3251.71997  3156.79004  3212.87988  3212.87988  4.12859e+09        NaN
2019-04-01  3229.11011  3415.17993  3227.62012  3400.19995  3400.19995  4.44773e+09        NaN
2019-05-01  3389.52002  3397.17993  3110.51001  3117.76001  3117.76001  4.31954e+09        NaN
2019-06-01  3111.51001  3336.08008  3104.03003  3321.61011  3321.61011  4.16045e+09        NaN
2019-07-01  3339.58008  3386.64990  3299.88989  3300.75000  3300.75000  4.48979e+09        NaN
2019-08-01  3282.79004  3311.26001  3040.15991  3106.52002  3106.52002  5.14605e+09        NaN
2019-09-01  3092.25000  3216.87012  3074.04004  3119.98999  3119.98999  4.11690e+09        NaN
2019-10-01  3130.11011  3235.23999  3068.83008  3229.87988  3229.87988  4.40269e+09        NaN
2019-11-01  3227.60010  3285.71997  3182.05005  3193.91992  3193.91992  7.05588e+09        NaN
2019-12-01  3198.27002  3239.23999  3144.07007  3222.83008  3222.83008  4.53674e+09        NaN
2020-01-01  3230.47998  3283.88989  3144.10010  3153.72998  3153.72998  4.95117e+09        NaN
2020-02-01  3131.02002  3233.86011  3008.45996  3011.08008  3011.08008  5.32049e+09        NaN
2020-02-21         NaN         NaN         NaN         NaN         NaN          NaN       24.0
2020-02-28         NaN         NaN         NaN         NaN         NaN          NaN       22.0
2020-03-01  2988.35010  3047.79004  2208.41992  2481.22998  2481.22998  7.76770e+09        NaN
2020-03-06         NaN         NaN         NaN         NaN         NaN          NaN       23.0
2020-03-13         NaN         NaN         NaN         NaN         NaN          NaN       21.0
2020-03-20         NaN         NaN         NaN         NaN         NaN          NaN       24.0
2020-03-27         NaN         NaN         NaN         NaN         NaN          NaN       27.0
2020-04-01  2468.16992  2671.58008  2380.84009  2624.22998  2624.22998  7.23833e+09        NaN
2020-04-03         NaN         NaN         NaN         NaN         NaN          NaN       37.0
2020-04-10         NaN         NaN         NaN         NaN         NaN          NaN       73.0
2020-04-17         NaN         NaN         NaN         NaN         NaN          NaN       85.0
2020-04-24         NaN         NaN         NaN         NaN         NaN          NaN       90.0
2020-05-01  2555.66992  2611.73999  2489.93994  2510.75000  2510.75000  7.36728e+09       90.0
2020-05-22         NaN         NaN         NaN         NaN         NaN          NaN       92.0
2020-06-01  2519.41992  2839.38989  2516.45996  2589.90991  2589.90991  8.39644e+09        NaN
2020-06-05         NaN         NaN         NaN         NaN         NaN          NaN       89.0
2020-06-19         NaN         NaN         NaN         NaN         NaN          NaN       92.0
2020-07-01  2604.08008  2707.66992  2511.02002  2529.82007  2529.82007  4.87622e+09        NaN
2020-07-07         NaN         NaN         NaN         NaN         NaN          NaN       90.0
2020-07-20         NaN         NaN         NaN         NaN         NaN          NaN       92.0
2020-08-01  2522.53003  2602.33008  2478.38989  2532.51001  2532.51001  6.34705e+09        NaN
2020-08-03         NaN         NaN         NaN         NaN         NaN          NaN       88.0
2020-08-14         NaN         NaN         NaN         NaN         NaN          NaN       90.0
2020-08-28         NaN         NaN         NaN         NaN         NaN          NaN       90.0
2020-09-01  2521.81006  2546.87012  2476.82007  2490.09009  2490.09009  2.00072e+09        NaN
2020-09-11  2481.08008  2492.41992  2476.82007  2490.09009  2490.09009  0.00000e+00        NaN
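
As a small self-contained check of the steps above, the sketch below runs the same dropna(how='all') call on a few rows copied from the question's data. Loading them through io.StringIO instead of testing.csv is an assumption made only so the snippet runs on its own. It also shows an alternative that is not part of the original answer: keeping the date as an ordinary column and limiting the NaN check to the other columns with the subset parameter. The names csv_text, by_index, as_column and value_cols are illustrative.

```python
import io

import pandas as pd

# A handful of rows copied from the question's data; StringIO stands in for testing.csv.
csv_text = """,Open,High,Low,Close,Adj Close,Volume,Singapore
2020-02-01,3131.02002,3233.860107,3008.459961,3011.080078,3011.080078,5320489700.0,
2020-02-21,,,,,,,24.0
2020-02-25,,,,,,,
2020-02-28,,,,,,,22.0
2020-03-01,2988.350098,3047.790039,2208.419922,2481.22998,2481.22998,7767702900.0,
2020-03-02,,,,,,,
"""

# Approach from the answer: date as index, then drop rows where every column is NaN.
by_index = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)
by_index = by_index.dropna(how="all")

# Alternative (an assumption, not in the original answer): keep the date as a
# normal column and only inspect the remaining columns via `subset`.
as_column = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])
value_cols = list(as_column.columns[1:])  # everything except the unnamed date column
as_column = as_column.dropna(how="all", subset=value_cols)

print(by_index)
print(as_column)
```

Both variants keep the rows that have either price data or a Singapore value and drop the rows that only carry a date. The subset form may be handy when the date has to stay a regular column for later processing.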

Plotting

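  • The plot uses pandas.DataFrame.plot, which uses matplotlib as its default plotting backend.
    • Note that lines are not drawn across NaN values, so dropna is applied before plotting.
  • Volume is not plotted together with the prices, because its scale (y values) is far larger.
  • 'Singapore' is plotted separately as a scatter plot: its values are low and it has few data points, so it would look odd as a line plot.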
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(9, 10))

df[['Open', 'High', 'Low', 'Close', 'Adj Close']].dropna().plot(ax=ax1)
ax2.scatter(df.index, 'Singapore', data=df, label='Singapore')
ax2.legend()
plt.show()

[image: the resulting plot]
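
The answer leaves Volume out of the price plot because of its scale. If Volume should be shown as well, one option is to give it its own axes. The following is a minimal sketch under that assumption, not part of the original answer. It uses a few rows copied from the question's data via io.StringIO, selects the non-interactive Agg backend so it runs without a display, and writes to a hypothetical file name prices_and_volume.png.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question's data; StringIO stands in for testing.csv.
csv_text = """,Open,High,Low,Close,Adj Close,Volume,Singapore
2020-02-01,3131.02002,3233.860107,3008.459961,3011.080078,3011.080078,5320489700.0,
2020-02-21,,,,,,,24.0
2020-03-01,2988.350098,3047.790039,2208.419922,2481.22998,2481.22998,7767702900.0,
2020-03-06,,,,,,,23.0
2020-04-01,2468.169922,2671.580078,2380.840088,2624.22998,2624.22998,7238328000.0,
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)

fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(9, 8))

# Prices share one scale (thousands), so they go together on the first axes.
df[["Open", "Close"]].dropna().plot(ax=ax1)

# Volume is in the billions, so it gets its own axes and its own y scale.
df["Volume"].dropna().plot(ax=ax2, label="Volume")
ax2.legend()

fig.savefig("prices_and_volume.png")  # hypothetical output file name
```

Separate axes keep each series readable on its own y scale, and dropna is applied per selection so the Singapore-only rows do not break the lines.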

Regarding "python - How do I drop rows where every column except the date column is NaN?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63922425/
