python - df.iloc not assigning values in a for loop? (Pandas)

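I'm fairly new to pandas, and I have a dataset of about 250,000 rows stored in JSON. One of my columns contains a very long, possibly unique string in each cell, and I have to filter out some substrings before the data is usable. For some reason, each value is accessed and filtered correctly (meaning the correct value ends up in my working variable), but when it comes to the df.iloc[x]['notes'] assignment, the values are not written back to the DataFrame. I've read about the problems with chained indexing and assignment in pandas, and I thought I could get around them by using .iloc, but it isn't working for me.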

Here is an example:

Suppose this is my DataFrame and some filtering code:

import pandas as pd 

#Listing the things I want to filter out
greeting = ['Hello,', 'Hi']
goodbye = ['Thank you', 'Goodbye']

df = pd.DataFrame({'ID':[123, 456, 789], 'Group':['A', 'B', 'C'],\
'notes':['Hello, this is John', 'Thank you for your help',\
'This is a message.']})

#Doing the actual filtering
for x in range(0, len(df['notes'])):

    note = df.iloc[x]['notes']

    for y in greeting:
        if y in note:
            note = note.replace(y, '')

    for z in goodbye:
        if z in note:
            note = note.replace(z, '')

    # The variable note is correctly filtered here,
    # but then it doesn't assign and leaves the df unchanged
    # at the previous index, so the error is probably beyond this point
    df.iloc[x]['notes'] = note

df.to_json('final_data.json', orient = 'records')

The other thing I tried in place of .iloc was df.at[x, 'notes'] = note, but that seems to have the same problem.

So in the final output, instead of getting something like:

[{'ID':1, 'Group': "A", 'notes':' this is John'}.. etc.]

I get:

[{'ID':1, 'Group': "A", 'notes':'Hello, this is John'}.. etc.] (completely unchanged)

What is going on here? Is there some unpredictable assignment behavior at work, and can I fix it somehow?

Best answer

Why not:

df['notes'] = df['notes'].str.replace('|'.join(greeting + goodbye), '', regex=True)
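
A note on the one-liner above: the joined string is meant to be used as a regular expression, with | separating the alternatives. Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, and any phrase containing regex metacharacters (such as . or ?) would need escaping. The following is a minimal sketch under those assumptions; the use of re.escape is an addition that is not part of the original answer.

```python
import re
import pandas as pd

greeting = ['Hello,', 'Hi']
goodbye = ['Thank you', 'Goodbye']

df = pd.DataFrame({'ID': [123, 456, 789],
                   'Group': ['A', 'B', 'C'],
                   'notes': ['Hello, this is John',
                             'Thank you for your help',
                             'This is a message.']})

# Escape each phrase so it is matched literally, then join with "|" (regex OR)
pattern = '|'.join(re.escape(p) for p in greeting + goodbye)

# regex=True is passed explicitly because the default differs across pandas versions
df['notes'] = df['notes'].str.replace(pattern, '', regex=True)

print(df['notes'].tolist())
```
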

Now:

df.to_json('final_data.json', orient = 'records')

will give you the desired JSON file.

For example:

[{"Group":"A","ID":123,"notes":" this is John"},{"Group":"B","ID":456,"notes":" for your help"},{"Group":"C","ID":789,"notes":"This is a message."}]
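
As for why the loop in the question leaves the DataFrame unchanged: df.iloc[x]['notes'] = note is itself chained indexing. df.iloc[x] first builds a row Series, and because the columns here have mixed dtypes that row is a copy, so the assignment lands on a temporary object and never reaches df. If a loop is still wanted, a single-step indexer avoids this. The sketch below assumes the same sample data as the question; notes_pos is a helper name introduced here for illustration.

```python
import pandas as pd

greeting = ['Hello,', 'Hi']
goodbye = ['Thank you', 'Goodbye']

df = pd.DataFrame({'ID': [123, 456, 789],
                   'Group': ['A', 'B', 'C'],
                   'notes': ['Hello, this is John',
                             'Thank you for your help',
                             'This is a message.']})

# .iloc only accepts integer positions, so look up the column's position once
notes_pos = df.columns.get_loc('notes')

for x in range(len(df)):
    note = df.iloc[x, notes_pos]          # single-step read: row and column together
    for phrase in greeting + goodbye:
        note = note.replace(phrase, '')
    df.iloc[x, notes_pos] = note          # single-step write: modifies df itself

print(df['notes'].tolist())
```

Label-based forms such as df.at[x, 'notes'] or df.loc[x, 'notes'] are single-step as well, but they look up index labels rather than positions, so they only line up with range(len(df)) when the index is the default 0..n-1. For 250,000 rows, the vectorized str.replace shown in the answer is likely to be much faster than any loop.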

Regarding "python - df.iloc not assigning values in a for loop? (Pandas)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56535373/
