I have this DataFrame:
import pandas as pd

df = pd.DataFrame({'a' : ('road','road','road','highway','house','house'),
                   'b' : ('11','23','15','32','17','21')})
which gives:
df
         a   b
0     road  11
1     road  23
2     road  15
3  highway  32
4    house  17
5    house  21
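I want to create a new field that takes the value 1 if the row's value in column a is duplicated, and 0 otherwise.
Here I select the rows whose value in a is duplicated: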
mask = df['a'].duplicated(keep = False)
df[mask]
which gives:
       a   b
0   road  11
1   road  23
2   road  15
4  house  17
5  house  21
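For reference, the `keep` argument of `duplicated` decides which occurrences get flagged. The following is a small sketch on a standalone Series holding the same values as column a (the variable names `s`, `first`, `last` and `every` are only illustrative and not part of the question):

```python
import pandas as pd

s = pd.Series(['road', 'road', 'road', 'highway', 'house', 'house'])

# keep='first' (the default): the first occurrence of each value is not flagged
first = s.duplicated(keep='first').tolist()
# keep='last': the last occurrence of each value is not flagged
last = s.duplicated(keep='last').tolist()
# keep=False: every occurrence of a repeated value is flagged
every = s.duplicated(keep=False).tolist()

print(first)  # [False, True, True, False, False, True]
print(last)   # [True, True, False, False, True, False]
print(every)  # [True, True, True, False, True, True]
```

Only `keep=False` marks all members of a repeated group, which is why it fits the goal described here.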
Desired result:
         a   b  c
0     road  11  1
1     road  23  1
2     road  15  1
3  highway  32  0
4    house  17  1
5    house  21  1
Best answer
You can assign the result of df['a'].duplicated(keep = False) to a new column, for example:
df['c'] = df['a'].duplicated(keep = False)
As a result we get:
>>> df
         a   b      c
0     road  11   True
1     road  23   True
2     road  15   True
3  highway  32  False
4    house  17   True
5    house  21   True
Or, if you want integers:
df['c'] = df['a'].duplicated(keep = False).astype(int)
which produces the expected result:
>>> df
         a   b  c
0     road  11  1
1     road  23  1
2     road  15  1
3  highway  32  0
4    house  17  1
5    house  21  1
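Put together, the approach above can be run end to end as the sketch below. The second column, `c_alt`, is not part of the answer: it is a hypothetical cross-check that counts the rows per value of a with `groupby(...).transform('size')` and flags groups larger than one. On this data it should agree with the `duplicated` version.

```python
import pandas as pd

df = pd.DataFrame({'a': ['road', 'road', 'road', 'highway', 'house', 'house'],
                   'b': ['11', '23', '15', '32', '17', '21']})

# Approach from the answer: flag every row whose 'a' value occurs more than once
df['c'] = df['a'].duplicated(keep=False).astype(int)

# Alternative for comparison (an assumption, not from the answer):
# broadcast each group's row count back to its rows, then test for count > 1
df['c_alt'] = (df.groupby('a')['a'].transform('size') > 1).astype(int)

print(df)
```

The `duplicated` form is shorter and avoids the grouping step; the `groupby` form may be handy if the group size itself is also needed.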
Regarding python - creating a new field set to 1 for duplicates and 0 for non-duplicates, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53041721/