python - A more efficient way to manipulate a pandas DataFrame: filtering and melting

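I have a dataset that I need to parse and reshape from wide to long. Each row represents one person, and multiple columns represent instances of a measurement (UK Biobank format):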

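import pandas as pd

# Initialize the data as a dict of lists
data = {'id': ['1', '2', '3', '4'],
        '3-0.0': [20, 21, 19, 18],
        '3-1.0': [10, 11, 29, 12],
        '3-2.0': [5, 6, 7, 8]}
# Create the DataFrame
df = pd.DataFrame(data)
# Note: set_index returns a new DataFrame; the result is not assigned here,
# so df keeps its default integer index and 'id' stays a regular column
df.set_index('id')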

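3-0.0, 3-1.0, and 3-2.0 are three different measurements of the same event for a given person. What I want is multiple rows per person, with one column indicating the instance of the event (0, 1, or 2) and another column holding the associated value.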

My inefficient approach is below, and I know it can be done better. I am new to Python, so I am looking for a more efficient way to code this:

# Parse out each instance (raw strings avoid invalid-escape warnings)
i0 = df.filter(regex=r"-0\.")
i1 = df.filter(regex=r"-1\.")
i2 = df.filter(regex=r"-2\.")

# Turn the index into a column and melt each DataFrame
i0 = i0.reset_index()
i0 = pd.melt(i0, id_vars="index", ignore_index=True).dropna().drop(columns=['variable']).assign(instance='0')

i1 = i1.reset_index()
i1 = pd.melt(i1, id_vars="index", ignore_index=True).dropna().drop(columns=['variable']).assign(instance='1')

i2 = i2.reset_index()
i2 = pd.melt(i2, id_vars="index", ignore_index=True).dropna().drop(columns=['variable']).assign(instance='2')

# Concatenate back together
fin = pd.concat([i0, i1, i2])

# The final dataset looks like this
id  measure  instance
1   20       0
1   10       1
1   5        2
2   21       0
2   11       1
2   6        2
3   19       0
3   29       1
3   7        2
4   18       0
4   12       1
4   8        2
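
For reference, the target layout above can be sketched with a single melt rather than one melt per instance. This is only a sketch: it assumes the column names follow a "field-instance.array" pattern such as 3-0.0, and the names long_df and variable are illustrative choices, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "3-0.0": [20, 21, 19, 18],
    "3-1.0": [10, 11, 29, 12],
    "3-2.0": [5, 6, 7, 8],
})

# Melt every non-id column into (variable, measure) pairs in one call.
long_df = df.melt(id_vars="id", var_name="variable", value_name="measure")

# Assumption: the digits between "-" and "." are the instance number.
long_df["instance"] = (
    long_df["variable"].str.extract(r"-(\d+)\.", expand=False).astype(int)
)

fin = (
    long_df.drop(columns="variable")
    .dropna(subset=["measure"])
    .sort_values(["id", "instance"], ignore_index=True)
)
print(fin)
```

Because the instance is parsed from the column name, the same code keeps working if more instances are added, with no extra filter call for each one.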

Bonus points if you can also handle multiple measurement columns formatted like this: '3-0.0', '3-1.0', '3-2.0', '4-0.0', '4-1.0', '4-2.0', ...

Best answer

Given:

  id  3-0.0  3-1.0  3-2.0  4-0.0  4-1.0  4-2.0
0  1     20     10      5     10      5     20
1  2     21     11      6     11      6     21
2  3     19     29      7     29      7     19
3  4     18     12      8     12      8     18
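
To reproduce the frame shown above, it could be built as follows. The values are copied from the table; storing id as strings is an assumption carried over from the question's sample data.

```python
import pandas as pd

# Values copied from the table above; id is kept as strings, as in the question.
df = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "3-0.0": [20, 21, 19, 18],
    "3-1.0": [10, 11, 29, 12],
    "3-2.0": [5, 6, 7, 8],
    "4-0.0": [10, 11, 29, 12],
    "4-1.0": [5, 6, 7, 8],
    "4-2.0": [20, 21, 19, 18],
})
print(df)
```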

Doing:

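# Stub names are the measurement field IDs before the dash, e.g. '3' and '4'
field_ids = df.filter(regex=r'\d-').columns.str.split('-').str[0].unique()
out = pd.wide_to_long(df, stubnames=field_ids, i='id', j='instance', sep='-', suffix='.*')
# The parsed suffixes are numeric (0.0, 1.0, 2.0); convert the instance level to int
out = out.rename(int, level=1)
print(out.sort_index())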

Output:

              3   4
id instance
1  0         20  10
   1         10   5
   2          5  20
2  0         21  11
   1         11   6
   2          6  21
3  0         19  29
   1         29   7
   2          7  19
4  0         18  12
   1         12   8
   2          8  18
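
For comparison, the same result can be sketched with melt, str.extract, and pivot instead of wide_to_long. This is an alternative under stated assumptions, not the answer's method: it assumes every measurement column matches "field-instance.array", and the names column, value, field, and parts are illustrative. Passing a list as the pivot index needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "3-0.0": [20, 21, 19, 18],
    "3-1.0": [10, 11, 29, 12],
    "3-2.0": [5, 6, 7, 8],
    "4-0.0": [10, 11, 29, 12],
    "4-1.0": [5, 6, 7, 8],
    "4-2.0": [20, 21, 19, 18],
})

long_df = df.melt(id_vars="id", var_name="column", value_name="value")

# Split names such as "3-0.0" into a field ID ("3") and an instance ("0").
# Assumption: the trailing ".0" is an array index and can be ignored here.
parts = long_df["column"].str.extract(r"^(?P<field>\d+)-(?P<instance>\d+)\.")
long_df = long_df.join(parts).astype({"instance": int})

# One row per (id, instance), one column per field.
out = long_df.pivot(
    index=["id", "instance"], columns="field", values="value"
).sort_index()
out.columns.name = None
print(out)
```

The explicit regular expression makes the naming assumption visible, which may help when some columns do not follow the pattern.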

Regarding python - A more efficient way to manipulate a pandas DataFrame: filtering and melting, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/72878029/
