python - Replace missing values (given as strings) in a pandas DataFrame with np.NaN

Tags: python python-3.x pandas dataframe missing-data

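I have a DataFrame, energy, in which some columns have missing values. The missing values are represented in the DataFrame by the string "...". I want to replace all of these values with np.NaN.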

In [3]: import pandas as pd

In [4]: import numpy as np

In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
   ...: , 'ESC', '% Renewable'])

In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]: 
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000

To replace these values, I tried:

In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

I don't understand this error, and I don't see any other way to do what I want. Any ideas?

Best answer

I think you need:

energy['ES'] = energy.loc[energy['ES'] != "...", 'ES']
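The line above works because of index alignment: the right-hand side contains only the rows where ES is not "...", and when it is assigned back to the column, the rows that are absent get NaN. A minimal sketch of that, next to the single .loc assignment that the warning text recommends, is shown here. The three-row frame and the helper sample() are hypothetical stand-ins for the asker's spreadsheet, not data from the question.

```python
import numpy as np
import pandas as pd


def sample():
    # Hypothetical stand-in for the asker's data (values are made up).
    return pd.DataFrame({
        "Country": ["American Samoa", "Australia", "Guam"],
        "ES": ["...", 5386, "..."],
    })


# Form used in the answer: keep only the valid rows, then assign back.
# Rows missing from the right-hand side become NaN during index alignment.
a = sample()
a["ES"] = a.loc[a["ES"] != "...", "ES"]

# Form suggested by the warning: one .loc call with a row mask and a column
# label, so the assignment hits the original frame rather than a copy.
# np.nan is used here; the np.NaN alias was removed in NumPy 2.0.
b = sample()
b.loc[b["ES"] == "...", "ES"] = np.nan

print(a)
print(b)
```

The attempt in the question fails because energy[mask]['ES'] = ... first builds a filtered copy and then assigns into that copy, so the original frame is never touched.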

Another solution:

energy['ES'] = energy['ES'].mask(energy['ES'] == "...")
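As a small illustration of what mask does: it returns a new Series in which every position where the condition is True is replaced (by NaN when no replacement is given), and the original Series is left unchanged. The Series below is a hypothetical stand-in for the ES column.

```python
import pandas as pd

# Hypothetical stand-in for energy['ES'] (values are made up).
es = pd.Series(["...", 5386, "..."], name="ES")

# Where the condition is True, the value is replaced with NaN.
masked = es.mask(es == "...")

print(masked)
```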

Or:

energy['ES'] = energy['ES'].replace({'...': np.nan})
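Since the question has the placeholder in both ES and ESC, the same replace call can probably be applied to several columns at once. The sketch below assumes a small made-up frame, and adds a pd.to_numeric step as a suggestion of mine (not part of the original answer) so that the columns end up as floats rather than objects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (values are made up).
energy = pd.DataFrame({
    "Country": ["American Samoa", "Australia", "Guam"],
    "ES": ["...", 5386, "..."],
    "ESC": ["...", 231, "..."],
})

cols = ["ES", "ESC"]

# Replace the "..." placeholder with NaN in both columns at once.
energy[cols] = energy[cols].replace("...", np.nan)

# Optional extra step: convert the cleaned columns to a numeric dtype.
energy[cols] = energy[cols].apply(pd.to_numeric)

print(energy.dtypes)
```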

But the best option is ayhan's comment:

you can pass na_values='...' to pd.read_excel
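To show the effect of na_values without needing the asker's test.xls, the sketch below uses pd.read_csv on an in-memory string as a stand-in; read_excel accepts the same na_values parameter. The sample rows are made up. With the placeholder declared as missing at read time, the columns are parsed as numbers directly.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the spreadsheet contents.
csv_text = """Country,ES,ESC,% Renewable
American Samoa,...,...,0.641026
Australia,5386,231,11.81081
Guam,...,...,0.0
"""

# "..." is treated as missing while parsing, so no cleanup is needed later.
energy = pd.read_csv(io.StringIO(csv_text), na_values="...")

print(energy)
print(energy.dtypes)
```

In the asker's case this would mean adding na_values='...' to the existing pd.read_excel call. Note that in current pandas releases the skip_footer and parse_cols arguments used in the question go by the names skipfooter and usecols.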

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/41267367/
