python - Pandas error: Index contains duplicate entries, cannot reshape

My question may look like a duplicate, because I found several other questions with the same error, such as:

Pandas: grouping a column on a value and creating new column headings

Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"

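I tried all the solutions offered in those posts, but none of them worked. I believe the error may be caused by the format of my dataset, which holds strings instead of numbers and may contain duplicate entries. Here is a sample of my dataset: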

protocol_no  activity  description
1586212      walk      twice a day
1586212      drive     5 km
1586212      drive     At least 30 min
1586212      sleep     NaN
1586212      eat       1500 calories
2547852      walk      NaN
2547852      drive     NaN
2547852      eat       3200 calories
2547852      eat       Avoid pasta
2547852      sleep     At least 10 hours

The output I want to achieve is:

protocol_no  walk         drive  sleep              eat
1586212      twice a day  5 km   NaN                1500 calories
2547852      NaN          NaN    At least 10 hours  3200 calories

I tried both pivot and pivot_table, with code like the following:

df.pivot(index="protocol_no", columns="activity", values="description")

But I still get this error:

ValueError: Index contains duplicate entries, cannot reshape

I don't know what is going wrong, so any help would be appreciated!

Edit:

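I noticed that my data does contain duplicate entries, as the error says and as users @DYZ and @SeaBean pointed out. So I edited the dataset example and posted the answer that works for my dataset. Hope it helps someone.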

Best answer

If you get a duplicate index error when using .pivot(), try .pivot_table() with aggfunc='first' (or something similar):

df.pivot_table(index="protocol_no", columns="activity", values="description", aggfunc='first')

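This is a common situation when the column you set as index has duplicate values. More precisely, .pivot() fails when the same (index, columns) pair appears in more than one row, because it cannot decide which value belongs in that cell. Using aggfunc='first' (or sometimes aggfunc='sum', depending on the data) will most likely solve the problem. Note that 'first' keeps only the first non-null value of each pair and drops the rest.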

Result:

activity    drive            eat              sleep         walk
protocol_no                                                     
1586212      5 km  1500 calories                NaN  twice a day
2547852       NaN  3200 calories  At least 10 hours          NaN
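If it is unclear which rows trigger the error, one way to check is to look for repeated (protocol_no, activity) pairs before reshaping. The snippet below is only a sketch: the DataFrame is re-typed by hand from the sample in the question, and the names keys, dupes and pivot_error are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows re-typed from the question (column names taken from the pivot call).
df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 5,
    "activity": ["walk", "drive", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "eat", "sleep"],
    "description": ["twice a day", "5 km", "At least 30 min", np.nan,
                    "1500 calories", np.nan, np.nan, "3200 calories",
                    "Avoid pasta", "At least 10 hours"],
})

keys = ["protocol_no", "activity"]

# keep=False marks every row of a repeated pair, not only the later repeats.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# Plain pivot() refuses to reshape while such pairs exist.
pivot_error = None
try:
    df.pivot(index="protocol_no", columns="activity", values="description")
except ValueError as exc:
    pivot_error = exc
    print(exc)
```

With this sample, the rows listed in dupes are the two drive rows of 1586212 and the two eat rows of 2547852, which are exactly the cells where an aggregation has to pick or combine values.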

Edit

Based on your latest edit about the duplicate entries, you can adapt the solution above by changing the aggfunc function, as follows:

df.pivot_table(index="protocol_no", columns="activity", values="description", aggfunc=lambda x: ' '.join(x.dropna()))

Here, we change aggfunc from 'first' to lambda x: ' '.join(x.dropna()). It produces your desired output without adding several extra lines of code.

Result:

activity                    drive                        eat              sleep         walk
protocol_no                                                                                 
1586212      5 km At least 30 min              1500 calories                     twice a day
2547852                            3200 calories Avoid pasta  At least 10 hours
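For anyone who wants to reproduce this end to end, here is a minimal sketch that runs both variants on the sample data. The DataFrame is re-typed by hand from the question, join_non_null is a hypothetical named stand-in for the lambda, and the last step (turning empty strings back into NaN) is an optional extra that is not part of the answer itself.

```python
import numpy as np
import pandas as pd

# Sample rows re-typed from the question.
df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 5,
    "activity": ["walk", "drive", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "eat", "sleep"],
    "description": ["twice a day", "5 km", "At least 30 min", np.nan,
                    "1500 calories", np.nan, np.nan, "3200 calories",
                    "Avoid pasta", "At least 10 hours"],
})


def join_non_null(values):
    # Hypothetical helper: concatenate the non-missing strings of one group.
    return " ".join(values.dropna())


# Variant 1: keep only the first non-null description per pair.
wide_first = df.pivot_table(index="protocol_no", columns="activity",
                            values="description", aggfunc="first")

# Variant 2: keep every description, joined with a space.
wide_join = df.pivot_table(index="protocol_no", columns="activity",
                           values="description", aggfunc=join_non_null)

# Pairs that hold only NaN come back as empty strings in variant 2;
# replace them if NaN is preferred in the final table.
wide_nan = wide_join.replace("", np.nan)

print(wide_first)
print(wide_nan)
```

The choice between the two is a data question: 'first' is fine when repeats are accidental, while joining is safer when each repeated row carries information you want to keep.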

Regarding python - Pandas error: Index contains duplicate entries, cannot reshape, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69484287/
