python-3.x - Pandas iteration is very slow

I need some help replacing iterrows when iterating over a Pandas DataFrame. I have a DataFrame like this:

| cust_no | channel  | month1 | month2 |
|   1     | radio    | 0.7    | 0.4    |
|   1     | fb       | 0.1    | 0.5    |
|   1     | tv       | 0.2    | 0.1    |
|   2     | fb       | 0.5    | 0.25   |
|   2     | radio    | 0.4    | 0.25   |
|   2     | tv       | 0.1    | 0.5    |

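For each cust_no, I need the channel with the maximum value in each month, joined into a single string and stored in a new column of the same DataFrame. For example, from the DataFrame above:

For customer 1, radio has the maximum value in month 1 but fb has the maximum value in month 2, so I need the string: radio>fb

For customer 2, fb has the maximum value in month 1 but tv has the maximum value in month 2, so I need the string: fb>tv

Any help is appreciated. Thank you. Performance really matters here.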

Best answer

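Set channel as the index with DataFrame.set_index, then use DataFrameGroupBy.idxmax to get the channel with the maximum value per month in each group, and finally combine the results with apply + join: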

df1 = (df.set_index('channel')
         .groupby('cust_no')[['month1','month2']]
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv

If the DataFrame has no other columns besides these, the selection of the month1 and month2 columns can be dropped:

df1 = (df.set_index('channel')
         .groupby('cust_no')
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv
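
The question asks for the string as a new column of the same DataFrame, while df1 above holds one row per customer. A minimal sketch of one way to attach it back, assuming the sample data from the question; the names top and new are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "cust_no": [1, 1, 1, 2, 2, 2],
    "channel": ["radio", "fb", "tv", "fb", "radio", "tv"],
    "month1": [0.7, 0.1, 0.2, 0.5, 0.4, 0.1],
    "month2": [0.4, 0.5, 0.1, 0.25, 0.25, 0.5],
})

# One string per customer: the top channel of each month, joined with '>'
top = (df.set_index("channel")
         .groupby("cust_no")[["month1", "month2"]]
         .idxmax()
         .apply(">".join, axis=1))

# Broadcast the per-customer string to every row of that customer
df["new"] = df["cust_no"].map(top)
print(df)
```

With this data, every row for customer 1 gets radio>fb and every row for customer 2 gets fb>tv. Series.map looks up each cust_no in the index of top, so no explicit Python-level loop over rows is needed.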

A similar question about slow Pandas iteration can be found on Stack Overflow: https://stackoverflow.com/questions/55646795/
