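I want to drop duplicate rows in a DataFrame with respect to column 'a' using the argument take_last=True, unless a certain condition is met. For example, if I have the following DataFrame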
a | b | c
1 | S | Blue
2 | M | Black
2 | L | Blue
1 | L | Green
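I want to drop duplicate rows with respect to column 'a', with the general rule take_last=True, unless some condition holds, say c = 'Blue', in which case I want the argument to be take_last=False.
That way I would get this as the resulting df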
a | b | c
1 | L | Green
2 | M | Black
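For reference, here is a minimal sketch of what plain de-duplication gives on this data. It assumes a recent pandas version, where the old take_last=True argument is spelled keep='last'; the variable names df and plain_last are illustrative only.

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    "a": [1, 2, 2, 1],
    "b": ["S", "M", "L", "L"],
    "c": ["Blue", "Black", "Blue", "Green"],
})

# keep="last" is the current spelling of take_last=True (assumes a recent pandas)
plain_last = df.drop_duplicates(subset="a", keep="last")
print(plain_last)
```

For a = 2 this keeps the row 2 | L | Blue rather than the desired 2 | M | Black, which is why the rule needs the extra condition on c.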
Best answer
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1],
                   'b': ['S', 'M', 'L', 'L'],
                   'c': ['Blue', 'Black', 'Blue', 'Green']})
print(df)
# a b c
#0 1 S Blue
#1 2 M Black
#2 2 L Blue
#3 1 L Green
# get the first row of each group, sort by 'a' and reset the index (dropping the old one)
df1 = df.groupby('a').head(1).sort_values('a').reset_index(drop=True)
# get the last row of each group, sort by 'a' and reset the index (dropping the old one)
df2 = df.groupby('a').tail(1).sort_values('a').reset_index(drop=True)
print(df1)
# a b c
#0 1 S Blue
#1 2 M Black
print(df2)
# a b c
#0 1 L Green
#1 2 L Blue
# if the value in column c of df1 is 'Blue', replace that row with the row from df2
# (both frames share the same index, so the assignment aligns row by row)
df1.loc[df1['c'].isin(['Blue'])] = df2
print(df1)
# a b c
#0 1 L Green
#1 2 M Black
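The question's rule can also be read literally: keep the last row of each group, and fall back to the first row only when that last row has c == 'Blue'. The following is a hedged sketch of that reading, not the answer's method (the answer starts from the first rows and swaps in the last row where the first is 'Blue'). On the sample data both give the same result. The names first, last, use_first and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 2, 1],
    "b": ["S", "M", "L", "L"],
    "c": ["Blue", "Black", "Blue", "Green"],
})

# First and last row per value of 'a', indexed by 'a' so the two frames line up
first = df.drop_duplicates(subset="a", keep="first").set_index("a").sort_index()
last = df.drop_duplicates(subset="a", keep="last").set_index("a").sort_index()

# Assumption: "the condition" means the last row of the group has c == 'Blue'
use_first = last["c"].eq("Blue")

# Take the first row where the condition holds, otherwise the last row
result = (
    pd.concat([first[use_first], last[~use_first]])
    .sort_index()
    .reset_index()
)
print(result)
```

Indexing both frames by 'a' avoids relying on positional indexes matching, at the cost of assuming each frame has exactly one row per value of 'a', which drop_duplicates guarantees here.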
Regarding python - dropping duplicate rows in a pandas DataFrame based on a condition, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32995577/