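python - Splitting a string into a pandas DataFrame

Tags: python pandas jupyter-notebook

I have a string exactly like the one below, and my goal is to split it into a DataFrame, but I can't get it to work. I searched Stack Overflow and found nothing.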

'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'

Is there a reproducible way to convert this into a DataFrame, so that I can apply it to other strings as well?

My target DataFrame looks like this:

Position    name                Average
Goalkeeper  Manuel              4.17017132535
Defender    Diego               4.14973163459
Defender    Giorgio             4.10115207373
Defender    Thiago              3.93318274318
Defender    Andrea              3.85132973289
Midfielder  Arjen               4.80556193806
Midfielder  Alexander           4.51037598508
Midfielder  Franck              4.48063714064
Midfielder  David               3.76028050109
Forward     Cristiano           7.87909462636
Forward     Zlatan              6.85401665065

I'm new to pandas, so any help would be appreciated.

Best answer

Here is one way.

import pandas as pd

mystr = 'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'

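# Split on any whitespace; every line, header included, yields exactly 4 tokens
lst = mystr.split()
# Group the flat token list into rows of 4 tokens each
data = [lst[pos:pos+4] for pos in range(0, len(lst), 4)]

# The first group holds the header tokens; the remaining groups are data rows
df = pd.DataFrame(data[1:], columns=data[0])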

print(df)

#       Position    Players      Average           Form
# 0   Goalkeeper     Manuel        Neuer  4.17017132535
# 1     Defender      Diego        Godin  4.14973163459
# 2     Defender    Giorgio    Chiellini  4.10115207373
# 3     Defender     Thiago        Silva  3.93318274318
# 4     Defender     Andrea     Barzagli  3.85132973289
# 5   Midfielder      Arjen       Robben  4.80556193806
# 6   Midfielder  Alexander        Meier  4.51037598508
# 7   Midfielder     Franck       Ribery  4.48063714064
# 8   Midfielder      David        Silva  3.76028050109
# 9      Forward  Cristiano      Ronaldo  7.87909462636
# 10     Forward     Zlatan  Ibrahimovic  6.85401665065
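The header line also splits into four tokens (Position, Players, Average, Form), so the labels in the output are shifted by one: the column labelled "Average" holds last names and "Form" holds the numbers. A minimal sketch of relabelling, assuming every player name has exactly two words; the column names `name` and `last_name` and the variable `target` are illustrative choices, not part of the original answer:

```python
import pandas as pd

mystr = 'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'

tokens = mystr.split()
rows = [tokens[i:i + 4] for i in range(0, len(tokens), 4)]

# Skip the header tokens and supply column names that match the row layout
# (assumed names: 'name' for the first name, 'last_name' for the surname)
df = pd.DataFrame(rows[1:], columns=['Position', 'name', 'last_name', 'Average'])

# The numbers are still strings after split(); convert them to floats
df['Average'] = df['Average'].astype(float)

# Keep only the columns shown in the question's target frame
target = df[['Position', 'name', 'Average']]
print(target)
```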

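This approach is not perfect in these cases:

  1. Spaces in column names, as above ("Average Form" is split into two header tokens). In that case you will need to redefine the column names.
  2. Player names with a varying number of words. Every name in the data provided has exactly two words, so grouping by four tokens works here, but a one-word or three-word name would misalign the rows.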
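Both caveats can be sidestepped by splitting each line on runs of two or more spaces instead of on every space. The following is a sketch rather than part of the original answer: it assumes fields are always separated by at least two spaces and never contain two consecutive spaces themselves, which holds for the string in the question. `parse_table` is a hypothetical helper name.

```python
import re
import pandas as pd

mystr = 'Position             Players   Average Form\nGoalkeeper        Manuel Neuer  4.17017132535\n  Defender         Diego Godin  4.14973163459\n  Defender   Giorgio Chiellini  4.10115207373\n  Defender        Thiago Silva  3.93318274318\n  Defender     Andrea Barzagli  3.85132973289\nMidfielder        Arjen Robben  4.80556193806\nMidfielder     Alexander Meier  4.51037598508\nMidfielder       Franck Ribery  4.48063714064\nMidfielder         David Silva  3.76028050109\n   Forward   Cristiano Ronaldo  7.87909462636\n   Forward  Zlatan Ibrahimovic  6.85401665065'


def parse_table(text):
    """Hypothetical helper: split each non-empty line on 2+ whitespace characters."""
    rows = [
        re.split(r'\s{2,}', line.strip())  # strip() removes the leading padding
        for line in text.splitlines()
        if line.strip()
    ]
    # First row is the header: ['Position', 'Players', 'Average Form']
    return pd.DataFrame(rows[1:], columns=rows[0])


df = parse_table(mystr)
df['Average Form'] = pd.to_numeric(df['Average Form'])

# Build the frame from the question: first name only, shorter column label
df['name'] = df['Players'].str.split().str[0]
result = df[['Position', 'name', 'Average Form']].rename(
    columns={'Average Form': 'Average'}
)
print(result)
```

Because the full name stays in a single `Players` field, a name with one or three words no longer shifts the remaining columns.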

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/49895238/
