我有以下数据框:
import pandas as pd
df = pd.DataFrame.from_dict({'data_point_key': {28: 'a',
30: 'b',
31: 'c',
32: 'd',
33: 'e',
34: 'f',
55: 'a',
56: 'b',
57: 'c',
58: 'd',
59: 'e',
60: 'f',
61: 'a',
63: 'b',
64: 'c',
65: 'd',
66: 'e',
67: 'f',
88: 'a',
89: 'b',
90: 'c',
91: 'd',
92: 'e',
93: 'f'},
'val': {28: '8',
30: 'A',
31: '8',
32: 'kuku',
33: 'lulu',
34: 'DE',
55: '8',
56: 'B',
57: '8',
58: 'kuku',
59: 'lulu',
60: 'DE',
61: '8',
63: 'A',
64: '8',
65: 'kuku',
66: 'lulu',
67: 'DE',
88: '8',
89: 'B',
90: '8',
91: 'kuku',
92: 'lulu',
93: 'DE'},
'request_id': {28: 4,
30: 4,
31: 4,
32: 4,
33: 4,
34: 4,
55: 4,
56: 4,
57: 4,
58: 4,
59: 4,
60: 4,
61: 7,
63: 7,
64: 7,
65: 7,
66: 7,
67: 7,
88: 7,
89: 7,
90: 7,
91: 7,
92: 7,
93: 7}})
我正在尝试执行以下操作:
pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=','.join)
并且由于 aggfunc 而得到以下重复项:
如何用逗号避免这些重复项?
最佳答案
在我看来,你可以在旋转之前删除重复项似乎是最干净的方法:
df = df.drop_duplicates(['request_id','data_point_key','val'])
df1 = pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=','.join)
print (df1)
data_point_key a b c d e f
request_id
4 8 A,B 8 kuku lulu DE
7 8 A,B 8 kuku lulu DE
替代方法是更改 aggfunc
- 如果顺序很重要:
df1 = (df.pivot_table(index='request_id',
columns='data_point_key',
values='val',
aggfunc=lambda x: ','.join(dict.fromkeys(x).keys())))
print (df1)
data_point_key a b c d e f
request_id
4 8 A,B 8 kuku lulu DE
7 8 A,B 8 kuku lulu DE
或者如果顺序不重要:
df1 = pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=lambda x: ','.join(set(x)))
print (df1)
data_point_key a b c d e f
request_id
4 8 B,A 8 kuku lulu DE
7 8 B,A 8 kuku lulu DE
关于python-3.x - 没有聚合函数的数据透视/交叉表重复问题?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/69481508/