python - Error when setting values in a pandas DataFrame: 'Series' objects are mutable, thus they cannot be hashed

Tags: python pandas jupyter

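I want to change values in a pandas DataFrame based on the condition data['Bare Nuclei'] != '?': compute the mean of the rows that satisfy it, then write that mean into the rows that contain '?'.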

import pandas as pd
import numpy as np
column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
mean = 0
n = 0
for index,row in data.iterrows():
    if row['Bare Nuclei'] != '?':
        n += 1
        mean += int(row['Bare Nuclei'])
mean = mean / n
temp = data
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

The Jupyter notebook gives me this error: 'Series' objects are mutable, thus they cannot be hashed

I would like to know how to change values in a DataFrame and why my approach is wrong. Any help is appreciated.

Best answer

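Add DataFrame.loc to the last line. Without it, temp[index, 'Bare Nuclei'] treats the tuple (index, 'Bare Nuclei') as a column label, and a label must be hashable, which a Series is not. With .loc, the boolean mask selects the rows and 'Bare Nuclei' selects the column: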

temp.loc[index,'Bare Nuclei'] = mean
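
To make the cause of the error concrete, here is a minimal sketch on a made-up three-row frame. The toy data and the replacement value 2.0 are illustrative assumptions, not values from the dataset, and the column is created with dtype=object to mimic a column that pandas read as strings.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Bare Nuclei' column.
df = pd.DataFrame({'Bare Nuclei': ['1', '?', '3']}, dtype=object)
mask = df['Bare Nuclei'] == '?'   # boolean Series, one entry per row

# A plain df[key] = value with a tuple key uses the key as a column label,
# and labels must be hashable. A Series cannot be hashed:
try:
    hash(mask)
    series_is_hashable = True
except TypeError:
    series_is_hashable = False
print(series_is_hashable)

# .loc accepts a row selector and a column selector separately,
# so the boolean mask picks the rows to overwrite.
df.loc[mask, 'Bare Nuclei'] = 2.0
print(df)
```

The exact wording of the exception may differ between pandas versions, but the underlying reason is the same: the Series ends up inside a key that pandas tries to hash.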

However, it is best to avoid loops in pandas because they are slow. A better solution is to replace '?' with NaN and then call fillna with the mean:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
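
As a rough illustration of why the commented-out pd.to_numeric variant is "more general", the sketch below uses a made-up column that contains a second non-numeric token ('n/a') besides '?'. The values are assumptions chosen for the example; replace('?', np.nan).astype(float) would fail on 'n/a', whereas errors='coerce' turns every unparseable entry into NaN.

```python
import pandas as pd

# Hypothetical column with two different kinds of non-numeric placeholders.
raw = pd.Series(['1', '?', '3', 'n/a', '8'], dtype=object)

# errors='coerce' converts anything that cannot be parsed as a number to NaN.
numeric = pd.to_numeric(raw, errors='coerce')

# mean() skips NaN by default, so it is computed from 1, 3 and 8 only.
filled = numeric.fillna(numeric.mean())
print(filled.tolist())
```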

Alternative solution:

mask = data['Bare Nuclei'] == '?'
data['Bare Nuclei'] = data['Bare Nuclei'].mask(mask).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
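
A further option, not part of the original answer, is to let read_csv treat '?' as missing while loading, via its na_values parameter. The sketch below assumes a tiny inline CSV and shortened column names purely for illustration; with the real file you would pass the URL and the full column_names list instead.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt in the same comma-separated layout.
csv_text = "1000025,5,1\n1000026,3,?\n1000027,8,4\n"

# na_values='?' makes read_csv parse '?' as NaN, so the column loads as float.
df = pd.read_csv(io.StringIO(csv_text),
                 names=['Sample code number', 'Clump Thickness', 'Bare Nuclei'],
                 na_values='?')

# Fill the missing entries with the mean of the remaining values.
df['Bare Nuclei'] = df['Bare Nuclei'].fillna(df['Bare Nuclei'].mean())
print(df['Bare Nuclei'].tolist())
```

This avoids the separate conversion step, at the cost of treating '?' as missing in every column of the file.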

Verifying the solution:

column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
#print (data.head())
#get index values by condition
L = data.index[data['Bare Nuclei'] == '?'].tolist()
print (L)
[23, 40, 139, 145, 158, 164, 235, 249, 275, 292, 294, 297, 315, 321, 411, 617]

#get mean of values converted to numeric
print (data['Bare Nuclei'].replace('?', np.nan).astype(float).mean())
3.5446559297218156

print (data.loc[L, 'Bare Nuclei'])
23     ?
40     ?
139    ?
145    ?
158    ?
164    ?
235    ?
249    ?
275    ?
292    ?
294    ?
297    ?
315    ?
321    ?
411    ?
617    ?
Name: Bare Nuclei, dtype: object

#convert to numeric - replace `?` to NaN and cast to float
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
#replace NaNs by means
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
#verify replacing
print (data.loc[L, 'Bare Nuclei'])
23     3.544656
40     3.544656
139    3.544656
145    3.544656
158    3.544656
164    3.544656
235    3.544656
249    3.544656
275    3.544656
292    3.544656
294    3.544656
297    3.544656
315    3.544656
321    3.544656
411    3.544656
617    3.544656
Name: Bare Nuclei, dtype: float64

Regarding "python - Error when setting values in a pandas DataFrame: 'Series' objects are mutable, thus they cannot be hashed", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49356798/
