I am new to pandas and have run into a problem.
I read a table from an HTML page and set my column headers from the table on the site.
df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)
My DataFrame now has the matching headers, but some rows are identical to the header, as in the following example.
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 253
2 John Tavares, C NYI 82 38 48 86 5 46 1.05 278
...
10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95 264
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88 268
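I know pandas can drop duplicate rows, but is it possible to drop rows that duplicate the header, or that match one specific row?
I hope you can help me!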
Best answer
You can use boolean indexing to keep only the rows whose PLAYER value is not the header text:
df = df[df.PLAYER != 'PLAYER']
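The filter above can be tried without fetching the page. The following is a minimal sketch on a small hand-built DataFrame; the three sample rows are copied from the question, while the reduced column set and the name `cleaned` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the scraped table: the header text
# reappears as an ordinary data row (index 2).
df = pd.DataFrame(
    {
        "RK": ["1", "2", "RK", "14"],
        "PLAYER": ["Jamie Benn, LW", "John Tavares, C", "PLAYER", "Steven Stamkos, C"],
        "PTS": ["87", "86", "PTS", "72"],
    }
)

# The comparison builds a boolean Series; indexing with it keeps
# only the rows where PLAYER is not the literal header text.
cleaned = df[df["PLAYER"] != "PLAYER"]
print(cleaned)
```

The original index labels are kept, so the result skips label 2. Calling reset_index(drop=True) afterwards renumbers the rows if that matters.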
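If you also need to remove the rows that have PP in the PLAYER column, build a mask with isin and invert it, as in the second snippet below.
Note: I added [0] to the end of the read_html call because it returns a list of DataFrames, and you need to select the first item of that list: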
import pandas as pd
# read_html returns a list of DataFrames; [0] selects the first table
df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header=1)[0]
print(df)
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G \
0 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06
1 2 John Tavares, C NYI 82 38 48 86 5 46 1.05
2 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09
3 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00
4 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99
5 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95
6 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08
7 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97
8 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93
9 10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95
10 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
11 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
12 NaN Nick Foligno, LW CBJ 79 31 42 73 16 50 0.92
13 NaN Claude Giroux, C PHI 81 25 48 73 -3 36 0.90
14 NaN Henrik Sedin, C VAN 82 18 55 73 11 22 0.89
15 14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88
...
...
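# True for rows whose PLAYER value is the repeated header text or 'PP'
mask = df['PLAYER'].isin(['PLAYER', 'PP'])
# ~ inverts the mask, so only the real player rows are kept
print(df[~mask])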
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG \
0 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 253
1 2 John Tavares, C NYI 82 38 48 86 5 46 1.05 278
2 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09 237
3 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00 395
4 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99 221
5 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95 153
6 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08 280
7 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97 158
8 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93 226
9 10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95 264
12 NaN Nick Foligno, LW CBJ 79 31 42 73 16 50 0.92 182
13 NaN Claude Giroux, C PHI 81 25 48 73 -3 36 0.90 279
14 NaN Henrik Sedin, C VAN 82 18 55 73 11 22 0.89 101
15 14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88 268
16 NaN Tyler Johnson, C TB 77 29 43 72 33 24 0.94 203
17 16 Ryan Johansen, C CBJ 82 26 45 71 -6 40 0.87 202
18 17 Joe Pavelski, C SJ 82 37 33 70 12 29 0.85 261
19 NaN Evgeni Malkin, C PIT 69 28 42 70 -2 60 1.01 212
20 NaN Ryan Getzlaf, C ANA 77 25 45 70 15 62 0.91 191
21 20 Rick Nash, LW NYR 79 42 27 69 29 36 0.87 304
...
...
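The question also asks about dropping rows that duplicate the whole header, not only one column. One possible sketch, under the assumption that a repeated header row matches the column labels in every cell, is shown below. The sample data and the names `is_header`, `is_junk` and `cleaned` are hypothetical and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: row 2 repeats the header, row 3 is a sub-header line.
df = pd.DataFrame(
    [
        ["1", "Jamie Benn, LW", "DAL"],
        ["2", "John Tavares, C", "NYI"],
        ["RK", "PLAYER", "TEAM"],
        [None, "PP", "SH"],
        ["14", "Steven Stamkos, C", "TB"],
    ],
    columns=["RK", "PLAYER", "TEAM"],
)

# Compare every cell (as a string) with its own column label;
# a row counts as a repeated header only if all cells match.
is_header = df.astype(str).eq(list(df.columns), axis="columns").all(axis=1)

# Rows that are not full header copies still need an explicit value list.
is_junk = df["PLAYER"].isin(["PP"])

# Drop both kinds of rows and renumber the index.
cleaned = df[~(is_header | is_junk)].reset_index(drop=True)
print(cleaned)
```

Comparing all columns avoids hard-coding a column name for the header check, at the cost of missing rows that only partly repeat the header (such as the PP line), which is why the isin mask is combined with it here.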
Source: the Stack Overflow question "python - Remove rows in pandas that match your header": https://stackoverflow.com/questions/40238962/