python - Reassigning a pandas column via a dictionary has no effect on the original DataFrame?

I have a huge pandas DataFrame that looks like this (example):

import pandas as pd

df = pd.DataFrame({"col1":{0:"There ARE NO ERRORS!!!", 1:"EVERYTHING is failing", 2:"There ARE NO ERRORS!!!"}, "col2":{0:"WE HAVE SOME ERRORS", 1:"EVERYTHING is failing", 2:"System shutdown!"}})

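I have a function called cleanMessage that strips punctuation and returns a lowercase string. For example, cleanMessage("There may be some errors, I don't know!!") returns "there may be some errors i dont know".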
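The question does not show cleanMessage itself. A minimal sketch of a function with the described behavior might look like this, assuming "punctuation" means the ASCII characters listed in Python's string.punctuation (the implementation is hypothetical, not the asker's):

```python
import string

# Hypothetical stand-in for the asker's cleanMessage: the original is not
# shown, so this only mirrors the described behavior.
# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def cleanMessage(message: str) -> str:
    """Remove ASCII punctuation, then lowercase the result."""
    return message.translate(_PUNCT_TABLE).lower()


print(cleanMessage("There may be some errors, I don't know!!"))
```

Note that the apostrophe counts as punctuation here, so "don't" becomes "dont".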

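I'm trying to replace every message in col1 with whatever cleanMessage returns for that message (basically, cleaning up these message columns). pd.DataFrame.iterrows works fine for me, but it's a bit slow. I tried to map the new values onto the keys in the original df, like this: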

message_set = set(df["col1"])
message_dict = dict((original, cleanMessage(original)) for original in message_set)
df = df.replace("col1", message_dict)

So the original df would look like:

>>> df
    col1                      col2
0   "There ARE NO ERRORS!!!"  "WE HAVE SOME ERRORS"
1   "EVERYTHING is failing"   "EVERYTHING is failing"
2   "There ARE NO ERRORS!!!"  "System shutdown!"

The "after" df should look like this:

>>> df
    col1                      col2
0   "there are no errors"     "WE HAVE SOME ERRORS"
1   "everything is failing"   "EVERYTHING is failing"
2   "there are no errors"     "System shutdown!"

Am I missing something in the replace part of my code?
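One likely cause: the first positional parameter of DataFrame.replace is to_replace, so df.replace("col1", message_dict) searches for the literal string "col1" in the cells rather than targeting the column named col1. pandas also accepts a nested dict of the form {column: {old_value: new_value}} as to_replace. The sketch below uses that form with a hypothetical cleanMessage (strip ASCII punctuation, lowercase) in place of the asker's function:

```python
import string

import pandas as pd


# Hypothetical cleanMessage: strips ASCII punctuation and lowercases.
def cleanMessage(message):
    return message.translate(str.maketrans("", "", string.punctuation)).lower()


df = pd.DataFrame({
    "col1": ["There ARE NO ERRORS!!!", "EVERYTHING is failing", "There ARE NO ERRORS!!!"],
    "col2": ["WE HAVE SOME ERRORS", "EVERYTHING is failing", "System shutdown!"],
})

# Clean each distinct message only once, as in the question.
message_dict = {original: cleanMessage(original) for original in set(df["col1"])}

# {"col1": {...}} restricts the replacement to col1; replace() returns a
# new DataFrame, so the result has to be assigned back.
df = df.replace({"col1": message_dict})
print(df)
```

col2 is left untouched because only col1 appears as a key of the outer dict.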

Edit:

For future readers, this is the code I needed to use:

df["col1"] = df["col1"].map(message_dict)
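For reference, a self-contained version of that fix might look like the following; cleanMessage here is again a hypothetical stand-in (strip ASCII punctuation, lowercase), not the asker's code:

```python
import string

import pandas as pd


# Hypothetical cleanMessage: strips ASCII punctuation and lowercases.
def cleanMessage(message):
    return message.translate(str.maketrans("", "", string.punctuation)).lower()


df = pd.DataFrame({
    "col1": ["There ARE NO ERRORS!!!", "EVERYTHING is failing", "There ARE NO ERRORS!!!"],
    "col2": ["WE HAVE SOME ERRORS", "EVERYTHING is failing", "System shutdown!"],
})

# One cleanMessage call per distinct message.
message_dict = {original: cleanMessage(original) for original in set(df["col1"])}

# Series.map looks each value up in the dict. Values missing from the dict
# would become NaN, but every value of col1 is a key here.
df["col1"] = df["col1"].map(message_dict)
print(df)
```

Passing the function directly, df["col1"].map(cleanMessage), also works, but it calls cleanMessage once per row instead of once per distinct message. That matters when the column has many repeated messages.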

Best answer

replace works well with regex: consider moving the logic of cleanMessage() into chained replace() calls.

df["col2"] = df["col1"].replace(...).replace(...)
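A sketch of that idea follows. It assumes "punctuation" can be approximated by the regex [^\w\s] (any character that is neither a word character nor whitespace), so no Python function has to be called per message. Unlike the snippet above, it writes the result back to col1, which matches the question's expected output:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["There ARE NO ERRORS!!!", "EVERYTHING is failing", "There ARE NO ERRORS!!!"],
    "col2": ["WE HAVE SOME ERRORS", "EVERYTHING is failing", "System shutdown!"],
})

# With regex=True, replace() substitutes matching substrings inside each
# string. Here it deletes punctuation, and .str.lower() lowercases the
# whole column in one vectorized step.
df["col1"] = df["col1"].replace(r"[^\w\s]", "", regex=True).str.lower()
print(df)
```

This pattern is not identical to string.punctuation. For example, \w matches the underscore, so "_" is kept, while non-ASCII punctuation is removed.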

The original question is on Stack Overflow: https://stackoverflow.com/questions/38490479/
