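I have a wide dataset like this that I want to convert to long format. The choice column records which option was selected: if it is choice2, the second option was chosen, so in the long format the row for the second option should be 1 and every other option 0. The college, hsg2 and coml5 columns should be repeated on every row.
Data: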
respondent choice college hsg2 coml5 type1 type2 type3 type4 type5 type6 fuel1 fuel2 fuel3 fuel4 fuel5 fuel6
1 choice1 0 0 0 van regcar van stwagon van truck cng cng electric electric gasoline gasoline
2 choice2 1 1 1 regcar van regcar stwagon regcar truck methanol methanol cng cng gasoline gasoline
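To reproduce the example, the two sample rows can be loaded into a DataFrame. A minimal sketch, assuming pandas is installed and the rows are whitespace-separated exactly as shown above:

```python
import io

import pandas as pd

# The sample rows from the question, whitespace-separated.
raw = """respondent choice college hsg2 coml5 type1 type2 type3 type4 type5 type6 fuel1 fuel2 fuel3 fuel4 fuel5 fuel6
1 choice1 0 0 0 van regcar van stwagon van truck cng cng electric electric gasoline gasoline
2 choice2 1 1 1 regcar van regcar stwagon regcar truck methanol methanol cng cng gasoline gasoline"""

# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.shape)  # (2, 17)
```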
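This is what I want to achieve. I couldn't find any way to convert the choice column correctly. Any help is appreciated. I have looked through most of the wide-to-long (and long-to-wide) questions but couldn't solve my problem.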
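Best answer
Use wide_to_long first, then extract the number from the choice column and compare it with a counter computed per group of the identifier columns: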
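import pandas as pd

# Melt type1..type6 and fuel1..fuel6 into one row per option; the suffix goes to 'tmp'
df = pd.wide_to_long(df,
                     stubnames=['type','fuel'],
                     i=['respondent','choice','college','hsg2','coml5'],
                     j='tmp').reset_index().drop(columns='tmp')
# 'choice2' -> 2
df['choice'] = df['choice'].str.extract(r'(\d+)', expand=False).astype(int)
# 1-based counter within each respondent's rows
df['order'] = df.groupby(['respondent','choice','college','hsg2','coml5']).cumcount().add(1)
# 1 for the chosen option, 0 otherwise
df['new'] = df['choice'].eq(df['order']).astype(int)
print(df)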
respondent choice college hsg2 coml5 type fuel order new
0 1 1 0 0 0 van cng 1 1
1 1 1 0 0 0 regcar cng 2 0
2 1 1 0 0 0 van electric 3 0
3 1 1 0 0 0 stwagon electric 4 0
4 1 1 0 0 0 van gasoline 5 0
5 1 1 0 0 0 truck gasoline 6 0
6 2 2 1 1 1 regcar methanol 1 0
7 2 2 1 1 1 van methanol 2 1
8 2 2 1 1 1 regcar cng 3 0
9 2 2 1 1 1 stwagon cng 4 0
10 2 2 1 1 1 regcar gasoline 5 0
11 2 2 1 1 1 truck gasoline 6 0
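For a quick check, here is a self-contained, runnable version of the first approach. It is a sketch assuming a recent pandas and the sample data from the question; the result is stored under the illustrative name long_df. Because order comes from cumcount, it relies on wide_to_long emitting each respondent's rows in suffix order.

```python
import io

import pandas as pd

raw = """respondent choice college hsg2 coml5 type1 type2 type3 type4 type5 type6 fuel1 fuel2 fuel3 fuel4 fuel5 fuel6
1 choice1 0 0 0 van regcar van stwagon van truck cng cng electric electric gasoline gasoline
2 choice2 1 1 1 regcar van regcar stwagon regcar truck methanol methanol cng cng gasoline gasoline"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

ids = ["respondent", "choice", "college", "hsg2", "coml5"]

# One row per option; the numeric suffix lands in the helper column 'tmp'.
long_df = (
    pd.wide_to_long(df, stubnames=["type", "fuel"], i=ids, j="tmp")
    .reset_index()
    .drop(columns="tmp")
)

# "choice2" -> 2; expand=False returns a Series instead of a DataFrame.
long_df["choice"] = long_df["choice"].str.extract(r"(\d+)", expand=False).astype(int)

# 1-based position of each row within its respondent's group.
long_df["order"] = long_df.groupby(ids).cumcount().add(1)

# 1 where the row position matches the chosen option, else 0.
long_df["new"] = long_df["choice"].eq(long_df["order"]).astype(int)
print(long_df)
```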
If order should instead come from the numeric suffixes of the type and fuel column names:
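import pandas as pd

# Keep the column-name suffix (1..6) as the 'order' column
df = pd.wide_to_long(df,
                     stubnames=['type','fuel'],
                     i=['respondent','choice','college','hsg2','coml5'],
                     j='order').reset_index()
# 'choice2' -> 2
df['choice'] = df['choice'].str.extract(r'(\d+)', expand=False).astype(int)
# 1 for the chosen option, 0 otherwise
df['new'] = df['choice'].eq(df['order']).astype(int)
print(df)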
respondent choice college hsg2 coml5 order type fuel new
0 1 1 0 0 0 1 van cng 1
1 1 1 0 0 0 2 regcar cng 0
2 1 1 0 0 0 3 van electric 0
3 1 1 0 0 0 4 stwagon electric 0
4 1 1 0 0 0 5 van gasoline 0
5 1 1 0 0 0 6 truck gasoline 0
6 2 2 1 1 1 1 regcar methanol 0
7 2 2 1 1 1 2 van methanol 1
8 2 2 1 1 1 3 regcar cng 0
9 2 2 1 1 1 4 stwagon cng 0
10 2 2 1 1 1 5 regcar gasoline 0
11 2 2 1 1 1 6 truck gasoline 0
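A self-contained sketch of the second approach, again assuming the sample data from the question. Since order is taken from the column-name suffix (wide_to_long's default suffix pattern yields integers here) rather than from row position, the flag does not depend on row order. The final sort_values call is an addition for readability, not part of the original answer.

```python
import io

import pandas as pd

raw = """respondent choice college hsg2 coml5 type1 type2 type3 type4 type5 type6 fuel1 fuel2 fuel3 fuel4 fuel5 fuel6
1 choice1 0 0 0 van regcar van stwagon van truck cng cng electric electric gasoline gasoline
2 choice2 1 1 1 regcar van regcar stwagon regcar truck methanol methanol cng cng gasoline gasoline"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

ids = ["respondent", "choice", "college", "hsg2", "coml5"]

# The suffix of type1..type6 / fuel1..fuel6 becomes the integer column 'order'.
long_df = pd.wide_to_long(df, stubnames=["type", "fuel"], i=ids, j="order").reset_index()

long_df["choice"] = long_df["choice"].str.extract(r"(\d+)", expand=False).astype(int)
long_df["new"] = long_df["choice"].eq(long_df["order"]).astype(int)

# Sort explicitly so the layout does not depend on wide_to_long's row order.
long_df = long_df.sort_values(["respondent", "order"], ignore_index=True)
print(long_df)
```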
Regarding "python - Custom Pandas wide to long", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70969521/