Python: create new columns based on a string pattern

I have the following DataFrame:

| Fecha      | Partido                 | Equipo  |  xG  |  xGA |
|------------|-------------------------|---------|------|------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 |

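I want to create three new columns from the Partido column: the first team should go into a new column named Home, the second team into a column named Away, and the score into a column named Score.

Expected output: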

| Fecha      | Partido                 | Equipo  |  xG  |  xGA | Home    | Away       | Score |
|------------|-------------------------|---------|------|------|---------|------------|-------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 | América | Cruz Azul  | 0:0   |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 | Leon    | América    | 2:0   |

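I tried splitting on a delimiter, but it does not work because some team names consist of two words.

Best answer

This is straightforward with str.extract and a regular expression: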

# Lazy groups (+?) keep surrounding whitespace out of the captured team names
regex = r'([^-]+?)\s*-\s*([^-]+?)\s+(\d+:\d+)'
df[['Home', 'Away', 'Score']] = df['Partido'].str.extract(regex)

Output:

        Fecha                  Partido   Equipo    xG   xGA      Home       Away Score
0  2022-05-01  América - Cruz Azul 0:0  América  1.53  0.45  América   Cruz Azul   0:0
1  2022-05-01       Leon - América 2:0  América  1.70  0.35     Leon     América   2:0
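For a quick check, here is a minimal, self-contained sketch of the approach above. It rebuilds the sample data by hand, and the pattern uses lazy groups (`+?`) so the captured names carry no surrounding whitespace. Treat it as one possible variant: it assumes team names never contain a hyphen and that the score always has the form `digits:digits`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Fecha": ["2022-05-01", "2022-05-01"],
    "Partido": ["América - Cruz Azul 0:0", "Leon - América 2:0"],
    "Equipo": ["América", "América"],
    "xG": [1.53, 1.70],
    "xGA": [0.45, 0.35],
})

# Three capture groups: home team, away team, score.
# Assumption: team names contain no hyphen.
pattern = r"([^-]+?)\s*-\s*([^-]+?)\s+(\d+:\d+)"

# str.extract returns one column per capture group
df[["Home", "Away", "Score"]] = df["Partido"].str.extract(pattern)

print(df[["Home", "Away", "Score"]])
```

Because the pattern anchors on the hyphen and on the trailing score rather than on whitespace, two-word names such as "Cruz Azul" stay in one piece.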


If you do not want to modify the original DataFrame, you can also use named capturing groups to set the column names directly:

# Lazy groups (+?) keep surrounding whitespace out of the captured team names
regex = r'(?P<Home>[^-]+?)\s*-\s*(?P<Away>[^-]+?)\s+(?P<Score>\d+:\d+)'
df2 = df['Partido'].str.extract(regex)

#        Home       Away Score
# 0  América   Cruz Azul   0:0
# 1     Leon     América   2:0

# OR
df2 = df.join(df['Partido'].str.extract(regex))

# same as the first output
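One thing worth checking with either variant: `str.extract` yields NaN in every extracted column for rows that do not match the pattern. The sketch below uses named groups and adds a hypothetical malformed row ("Pumas vs Toluca", not part of the original data) to show how such rows could be spotted.

```python
import pandas as pd

partidos = pd.Series(
    [
        "América - Cruz Azul 0:0",
        "Leon - América 2:0",
        "Pumas vs Toluca",  # hypothetical malformed row, for illustration only
    ],
    name="Partido",
)

# Named groups become the column names of the extracted frame
pattern = r"(?P<Home>[^-]+?)\s*-\s*(?P<Away>[^-]+?)\s+(?P<Score>\d+:\d+)"
parts = partidos.str.extract(pattern)

# Rows that did not match have NaN in all extracted columns
unmatched = partidos[parts["Score"].isna()]
print(unmatched.tolist())
```

Inspecting `unmatched` before joining the result back can help catch fixtures written in a different format.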

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/72169421/
