I have a DataFrame of movies that looks like this:
Index  Movie                   Rotten Tomato Score  Director           Ranking
1      Batman Rises            87                   Christopher Nolan  1
2      Interstellar            73                   Christopher Nolan  2
3      Legally Blonde          71                   Robert Luketic     1
4      The Ugly Truth          14                   Robert Luketic     2
5      (500) Days of Summer    85                   Marc Webb          1
6      The Amazing Spider-Man  71                   Marc Webb          2
7      Wide Awake              45                   M N Shyamalan      1
8      The Last Airbender      5                    M N Shyamalan      2
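I created a Ranking column that orders the movies by their Rotten Tomatoes percentage. The ranking is computed within each director's group.
What I want to do is use the Ranking column to drop a director's lower-ranked movies whenever their top-ranked movie scores above a certain threshold (50%). For example, for Marc Webb the only movie I want to show is (500) Days of Summer, but for M N Shyamalan I want to show both movies. The ideal table looks like this: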
Index  Movie                   Rotten Tomato Score  Director           Ranking
1      Batman Rises            87                   Christopher Nolan  1
3      Legally Blonde          71                   Robert Luketic     1
5      (500) Days of Summer    85                   Marc Webb          1
7      Wide Awake              45                   M N Shyamalan      1
8      The Last Airbender      5                    M N Shyamalan      2
I tried:
movie_names = movie_names.groupby('Movie').filter(lambda g: (g.score <= 0.5).any())
However, this removed both of M N Shyamalan's movies.
Does anyone know how to do this? Any help would be greatly appreciated!
Best answer
Code
# Is the movie's score above 50?
m = df['Rotten Tomato Score'] > 50
# Does the director have at least one movie with a score above 50?
cond1 = m.groupby(df['Director']).transform('any')
# Flag every movie except each director's highest-scoring one
# (the result is aligned back to df by index label, not by position)
cond2 = df.sort_values('Rotten Tomato Score').duplicated('Director', keep='last')
# Drop the rows where both cond1 and cond2 hold
df[~(cond1 & cond2)]
Result
   Index  Movie                 Rotten Tomato Score  Director           Ranking
0  1      Batman Rises          87                   Christopher Nolan  1
2  3      Legally Blonde        71                   Robert Luketic     1
4  5      (500) Days of Summer  85                   Marc Webb          1
6  7      Wide Awake            45                   M N Shyamalan      1
7  8      The Last Airbender    5                    M N Shyamalan      2
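For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, applies the same two masks as the answer, and then tries a variant that uses the existing Ranking column directly, as the question asked. The names THRESHOLD, director_has_hit, not_best, top_score and result_by_rank are just illustrative, and the variant assumes Ranking == 1 always marks each director's highest-scoring movie and that scores are on a 0-100 scale.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5, 6, 7, 8],
    "Movie": [
        "Batman Rises", "Interstellar", "Legally Blonde", "The Ugly Truth",
        "(500) Days of Summer", "The Amazing Spider-Man",
        "Wide Awake", "The Last Airbender",
    ],
    "Rotten Tomato Score": [87, 73, 71, 14, 85, 71, 45, 5],
    "Director": [
        "Christopher Nolan", "Christopher Nolan",
        "Robert Luketic", "Robert Luketic",
        "Marc Webb", "Marc Webb",
        "M N Shyamalan", "M N Shyamalan",
    ],
    "Ranking": [1, 2, 1, 2, 1, 2, 1, 2],
})

THRESHOLD = 50  # assumption: scores are on a 0-100 scale

# Approach from the answer: two boolean masks combined with &.
above = df["Rotten Tomato Score"] > THRESHOLD
# True for every row of a director who has at least one movie above the threshold.
director_has_hit = above.groupby(df["Director"]).transform("any")
# True for every movie except each director's highest-scoring one.
not_best = df.sort_values("Rotten Tomato Score").duplicated("Director", keep="last")
result = df[~(director_has_hit & not_best)]

# Variant (assumes Ranking == 1 is each director's top-scoring movie):
# keep the top-ranked row, plus every row of a director whose best score
# does not exceed the threshold.
top_score = df.groupby("Director")["Rotten Tomato Score"].transform("max")
result_by_rank = df[(df["Ranking"] == 1) | (top_score <= THRESHOLD)]

print(result)
print(result_by_rank.equals(result))
```

On this sample both selections keep the same five rows. The Ranking-based variant avoids the sort, but it is only as reliable as the Ranking column itself, so the answer's version may be the safer choice if the ranking could be stale or contain ties.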
Regarding "python - Delete group rows using a condition based on another row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75952370/