Given the following DataFrame:
import pandas as pd

df_test = pd.DataFrame(
    [[1, "BURGLARY"], [2, "PETIT LARCENY"], [3, "DANGEROUS DRUGS"],
     [4, "LOITERING FOR DRUG PURPOSES"], [5, "DANGEROUS WEAPONS"]],
    columns=['id', 'ofns_desc']
)
I want to add a new column that simplifies the descriptions in the ofns_desc column. I did the following:
THEFT = ["BURGLARY", "PETIT LARCENY"]
df_test.loc[df_test.ofns_desc.isin(THEFT), 'category'] = "THEFT"
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
df_test.loc[df_test.ofns_desc.isin(DRUGS), 'category'] = "DRUGS"
So far, the code above works.
But when I try to create an "OTHER" value for the category column, every value in the category column gets overwritten:
ALL_CAT = [THEFT, DRUGS]
df_test.loc[~df_test.ofns_desc.isin(ALL_CAT), 'category'] = "OTHER"
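What am I doing wrong?
Best answer
The problem is that you are testing against a nested list. isin compares each value in the column with the elements of the list, and here those elements are the two inner lists rather than the strings themselves, so every value fails the test. You need to join the lists with + instead of wrapping them in []. Change: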
ALL_CAT = [THEFT, DRUGS]
to:
ALL_CAT = THEFT + DRUGS
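To make the difference concrete, here is a minimal sketch that rebuilds the example frame and applies the three assignments with the joined list. The names nested and flat are only illustrative and are not part of the original code.

```python
import pandas as pd

df_test = pd.DataFrame(
    [[1, "BURGLARY"], [2, "PETIT LARCENY"], [3, "DANGEROUS DRUGS"],
     [4, "LOITERING FOR DRUG PURPOSES"], [5, "DANGEROUS WEAPONS"]],
    columns=["id", "ofns_desc"],
)

THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]

# Illustrative names: [] builds a list of two lists, + builds one list of four strings.
nested = [THEFT, DRUGS]
flat = THEFT + DRUGS
print(len(nested), len(flat))  # 2 4

df_test.loc[df_test.ofns_desc.isin(THEFT), "category"] = "THEFT"
df_test.loc[df_test.ofns_desc.isin(DRUGS), "category"] = "DRUGS"
# Only rows whose description is in neither group are labelled OTHER.
df_test.loc[~df_test.ofns_desc.isin(flat), "category"] = "OTHER"
print(df_test)
```

With the flat list, only the DANGEROUS WEAPONS row ends up as OTHER, and the THEFT and DRUGS labels are left in place.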
Another idea is to build a dictionary and use Series.map, then replace the missing values with Series.fillna:
THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
d = {"THEFT":THEFT, 'DRUGS':DRUGS}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'BURGLARY': 'THEFT', 'PETIT LARCENY': 'THEFT',
'DANGEROUS DRUGS': 'DRUGS', 'LOITERING FOR DRUG PURPOSES': 'DRUGS'}
df_test['category'] = df_test['ofns_desc'].map(d1).fillna("OTHER")
print (df_test)
   id                    ofns_desc  category
0   1                     BURGLARY     THEFT
1   2                PETIT LARCENY     THEFT
2   3              DANGEROUS DRUGS     DRUGS
3   4  LOITERING FOR DRUG PURPOSES     DRUGS
4   5            DANGEROUS WEAPONS     OTHER
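The reason fillna is needed is that Series.map leaves a missing value wherever a description has no key in the dictionary. The short sketch below shows that intermediate step on a three-row Series; the names lookup, mapped and category are assumptions chosen for illustration only.

```python
import pandas as pd

ofns_desc = pd.Series(["BURGLARY", "DANGEROUS DRUGS", "DANGEROUS WEAPONS"])

# Hypothetical, shortened lookup: description -> category.
lookup = {"BURGLARY": "THEFT", "DANGEROUS DRUGS": "DRUGS"}

# Descriptions without a key in the dict come back as missing values.
mapped = ofns_desc.map(lookup)
print(mapped.isna().tolist())  # [False, False, True]

# fillna turns those missing values into the catch-all label.
category = mapped.fillna("OTHER")
print(category.tolist())  # ['THEFT', 'DRUGS', 'OTHER']
```

Compared with repeated .loc assignments, this does the whole labelling in a single pass and needs no separate list of all known descriptions.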
Regarding "python - Pandas: How do I create a column given a condition based on another column?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64890256/