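I am trying to add two new columns to an existing pandas DataFrame. I have implemented it with a Python function that uses multiple if/else statements, but I don't think that is the best approach. Could I achieve the same thing with a dictionary or some other method?
I am using the following code to add the new columns: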
import pandas as pd

df = pd.DataFrame({"col_1": [1234567, 45677890, 673214, 6709, 98765, '', 876543]})

def func(col_1):
    col_1 = str(col_1)
    if col_1 == "":
        return "NA", ""
    elif col_1[0:3] == '123':
        return "some_text_1 ", " other_text_1"
    elif col_1[0:3] == '456':
        return "some_text_2 ", "other_text_2"
    elif col_1[0:2] == '67':
        return "some_text_3 ", "other_text_3"
    elif col_1[0:1] == '9':
        return "some_text_4 ", "other_text_4"
    else:
        return "Other", "Other"

df["col_2"], df["col_3"] = zip(*df["col_1"].map(func))
print(df)
      col_1         col_2          col_3
0   1234567  some_text_1    other_text_1
1  45677890  some_text_2    other_text_2
2    673214  some_text_3    other_text_3
3      6709  some_text_3    other_text_3
4     98765  some_text_4    other_text_4
5                      NA
6    876543         Other          Other
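So what I want to know is: since I have multiple if/else statements, what is the best way to achieve the same result? Should I use a dictionary or some other method? Any pointers would be appreciated.
Best answer
Your approach may be slow because it is not vectorized. Here is an alternative: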
temp = df['col_1'].astype(str)
df = df.assign(col_2='Other', col_3='Other')
df.loc[temp.str[0] == '9', ['col_2', 'col_3']] = ('some_text_4 ', 'other_text_4')
df.loc[temp.str[0:2] == '67', ['col_2', 'col_3']] = ('some_text_3 ', 'other_text_3')
df.loc[temp.str[0:3] == '456', ['col_2', 'col_3']] = ('some_text_2 ', 'other_text_2')
df.loc[temp.str[0:3] == '123', ['col_2', 'col_3']] = ('some_text_1 ', 'other_text_1')
df.loc[temp == "", ['col_2', 'col_3']] = ("NA", "")
>>> df
      col_1         col_2         col_3
0   1234567  some_text_1   other_text_1
1  45677890  some_text_2   other_text_2
2    673214  some_text_3   other_text_3
3      6709  some_text_3   other_text_3
4     98765  some_text_4   other_text_4
5                      NA
6    876543         Other         Other
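The idea is to reverse the order of the if/else branches so that the lowest-priority rule is applied first. Later assignments take precedence and can overwrite the ones above them.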
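Since the question asks specifically about a dictionary, here is a minimal sketch of one way to drive the same logic from a rule table. It is not part of the answer above: the `rules` dict and the use of `numpy.select` are my own assumptions, and the strings are written without the trailing spaces that appear in the original code. `numpy.select` picks the first condition that matches, so the rules can stay in the same order as the if/elif chain instead of being reversed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [1234567, 45677890, 673214, 6709, 98765, "", 876543]})

# Hypothetical rule table: prefix -> (col_2 value, col_3 value).
# Dicts preserve insertion order, so earlier entries win, like the if/elif chain.
# Assumption: the trailing spaces in the original strings were unintentional.
rules = {
    "123": ("some_text_1", "other_text_1"),
    "456": ("some_text_2", "other_text_2"),
    "67": ("some_text_3", "other_text_3"),
    "9": ("some_text_4", "other_text_4"),
}

temp = df["col_1"].astype(str)

# One boolean mask per rule; the empty-string case is checked first.
conditions = [(temp == "").to_numpy()] + [
    temp.str.startswith(prefix).to_numpy() for prefix in rules
]
col_2_choices = ["NA"] + [values[0] for values in rules.values()]
col_3_choices = [""] + [values[1] for values in rules.values()]

# np.select takes the choice of the first matching condition, else the default.
df["col_2"] = np.select(conditions, col_2_choices, default="Other")
df["col_3"] = np.select(conditions, col_3_choices, default="Other")
```

Adding a new prefix then only means adding one entry to `rules`. Whether this is faster than the `.loc` version depends on the data, so it is worth timing both on your real DataFrame.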
Source: a similar question on Stack Overflow about an efficient way to add dynamic columns to a pandas df using a dictionary: https://stackoverflow.com/questions/45821664/