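I have a DataFrame, energy, in which some columns have missing values. The missing values are represented in the DataFrame by the string "...". I want to replace them with np.NaN.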
In [3]: import pandas as pd
In [4]: import numpy as np
In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
...: , 'ESC', '% Renewable'])
In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]:
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000
To replace these values, I tried:
In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#!/usr/bin/python3
I don't understand this error, and I don't see any other way to achieve what I want. Any ideas?
Best answer
I think you need:
energy['ES'] = energy.loc[energy['ES'] != "...", 'ES']
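The attempt in the question most likely has no effect because energy[mask]['ES'] = ... is chained indexing: the first [] returns a new object, and the assignment is applied to that temporary copy. The line above works through index alignment instead. A minimal sketch on a small made-up frame (the rows and numbers are illustrative, not the real spreadsheet), which also shows the single .loc assignment that the warning message suggests:

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the question's layout; the numbers are illustrative only.
energy = pd.DataFrame({
    "Country": ["Guam", "Iceland", "Tuvalu"],
    "ES": ["...", 100, "..."],
    "ESC": ["...", 50, "..."],
})

# Chained indexing (the attempt from the question): the boolean selection
# returns a copy, so the assignment never reaches `energy`.
# energy[energy["ES"] == "..."]["ES"] = np.nan

# Alignment approach: rows excluded by the mask are missing from the
# right-hand side, so they become NaN when the result is aligned on the index.
energy["ES"] = energy.loc[energy["ES"] != "...", "ES"]

# Direct assignment with a single .loc call, as the warning message suggests.
energy.loc[energy["ESC"] == "...", "ESC"] = np.nan

print(energy)
```

Both forms leave the numeric rows untouched; the single .loc call is arguably easier to read because the row selection and the assigned value sit in one statement.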
Another solution:
energy['ES'] = energy['ES'].mask(energy['ES'] == "...")
Or:
energy['ES'] = energy['ES'].replace({'...': np.nan})
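Since the question shows the placeholder in both ES and ESC, the same replace call can be applied to several columns at once. The sketch below does that on a made-up frame (illustrative data only) and also shows pd.to_numeric with errors="coerce". That second variant is not from the answer, and it turns every non-numeric entry into NaN, not only "...".

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the question's layout; the numbers are illustrative only.
energy = pd.DataFrame({
    "Country": ["Guam", "Iceland", "Tuvalu"],
    "ES": ["...", 100, "..."],
    "ESC": ["...", 50, "..."],
})

cols = ["ES", "ESC"]

# Same idea as the replace call above, applied to both columns in one step.
replaced = energy[cols].replace("...", np.nan)

# Alternative (an assumption, not part of the answer): coerce anything that
# cannot be parsed as a number to NaN, which also yields float columns.
coerced = energy[cols].apply(pd.to_numeric, errors="coerce")

print(replaced)
print(coerced.dtypes)
```

The coercing variant is convenient when the columns should end up numeric anyway, but it would silently hide other unexpected strings, so replace may be the safer choice when "..." is the only known placeholder.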
But the best is ayhan's comment:
you can pass na_values='...' to pd.read_excel
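A rough illustration of that option follows. To keep it runnable without the spreadsheet, it reads a small made-up CSV from memory with pd.read_csv, which accepts the same na_values parameter as pd.read_excel; the data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV stand-in for the spreadsheet, so the sketch runs without a file.
raw = io.StringIO(
    "Country,ES,ESC\n"
    "Guam,...,...\n"
    "Iceland,100,50\n"
    "Tuvalu,...,...\n"
)

# na_values tells the parser to treat "..." as missing while reading,
# so no replacement step is needed afterwards.
energy = pd.read_csv(raw, na_values="...")

print(energy)
print(energy.dtypes)
```

With the file from the question, the equivalent change would be adding na_values='...' to the existing pd.read_excel call. Because the placeholder never enters the frame as a string, the affected columns can be parsed as numbers directly.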
Source: Stack Overflow question "python - Replace missing values (given as strings) in a pandas DataFrame with np.NaN": https://stackoverflow.com/questions/41267367/