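In my DataFrame I have an "away_lineup" column that contains groups of 5 strings (player names), and a "play_length" column that holds a duration value in each row. I know np.unique can detect unique string values and np.sum adds up the values in a column, but how can I use a function like np.unique to detect each unique string and sum the "play_length" values over the rows in which that string appears?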
   away_lineup                                                              play_length
0  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons  0:00:05
1  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons  0:00:10
2  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons  0:00:20
3  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons  0:00:07
4  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons  0:00:25
5  Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, JJ Redick    0:00:14
My desired output looks like this:
player            play_length
Dario Saric       0:01:21
Robert Covington  0:01:21
Joel Embiid       0:01:21
Markelle Fultz    0:01:21
Ben Simmons       0:01:07
JJ Redick         0:00:14
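The unique names are extracted from "away_lineup" and stored in a new "player" column, and for each player the "play_length" values of the rows in which that player appears are added together.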
Best answer
You can use explode and groupby like this:
import pandas as pd
## create dummy data
arr = [("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons", "00:00:05"),
("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons", "00:00:10"),
("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons", "00:00:20"),
("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons", "00:00:07"),
("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons", "00:00:25"),
("Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, JJ Redick", "00:00:14"),]
df = pd.DataFrame(arr, columns=["Player", "Play Time"])
df["Play Time"] = pd.to_timedelta(df["Play Time"])
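## Solution
# Split on ", " (comma plus space) so the names carry no leading whitespace
df["Player"] = df["Player"].str.split(", ")
df.explode("Player").groupby("Player").sum()
Output (newer pandas versions may print each value with a leading "0 days")
                 Play Time
Player
Ben Simmons       00:01:07
Dario Saric       00:01:21
JJ Redick         00:00:14
Joel Embiid       00:01:21
Markelle Fultz    00:01:21
Robert Covington  00:01:21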
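For reference, here is a minimal sketch of the same explode-and-groupby idea applied to the column names from the question ("away_lineup" and "play_length"). The helper name `total_time_per_player` and its parameters are made up for illustration and are not part of pandas. It assumes the durations are strings that `pd.to_timedelta` can parse, and it strips whitespace around each name, so it does not matter whether the separator is "," or ", ".

```python
import pandas as pd


def total_time_per_player(df, lineup_col="away_lineup", time_col="play_length"):
    """Sum `time_col` per individual name in the comma-separated `lineup_col`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = df[[lineup_col, time_col]].copy()
    # Convert duration strings such as "0:00:05" into Timedelta values.
    out[time_col] = pd.to_timedelta(out[time_col])
    # Turn each lineup string into a list of names, then one row per name.
    out[lineup_col] = out[lineup_col].str.split(",")
    out = out.explode(lineup_col).reset_index(drop=True)
    # Remove the whitespace left around each name by the split.
    out["player"] = out[lineup_col].str.strip()
    # sort=False keeps players in order of first appearance.
    return out.groupby("player", sort=False)[time_col].sum()


lineup_a = "Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, Ben Simmons"
lineup_b = "Dario Saric, Robert Covington, Joel Embiid, Markelle Fultz, JJ Redick"
df = pd.DataFrame({
    "away_lineup": [lineup_a] * 5 + [lineup_b],
    "play_length": ["0:00:05", "0:00:10", "0:00:20", "0:00:07", "0:00:25", "0:00:14"],
})

totals = total_time_per_player(df)
print(totals)
```

With the six sample rows from the question, Ben Simmons comes out at 67 seconds (0:01:07), JJ Redick at 14 seconds (0:00:14), and the other four players at 81 seconds (0:01:21), in the same order as the desired output. If an alphabetically sorted result is preferred, drop `sort=False`.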
Regarding "python - Grouping unique strings into a column and performing a function on separate column values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58108633/