python - How to categorize repeating patterns?

Tags: python jupyter-notebook

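In my DataFrame, I have a field that shows the status of an ordered product. It can be "New", "Cancelled", "Filled", or "Partial". I summarized the status pattern recorded for each order (OrderID) and counted the different patterns that occur. However, this produced more than 1385 distinct patterns. I now want to condense these patterns into bins. For example, if the order statuses are: New, New, Cancelled, New, Filled, this would be condensed to: New, Cancelled, New, Filled.

This would go into the same bin as the following pattern: New, New, New, Cancelled, Cancelled, New, New, Filled.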

This is what the raw data looks like:

After grouping by each OrderID:

To see which OrderStatus patterns exist in the data, I applied the following code:

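def status_transition_with_timestamp(each_grouped_df):
    # Order the rows of one order chronologically, then join its statuses into one string.
    sorted_df = each_grouped_df.sort_values('timestamp', ascending=True)
    concatenated_transition = ','.join(sorted_df['ostatus'])
    return concatenated_transition

# df_grouped is assumed to be df.groupby('OrderID').
# apply passes each order's full sub-frame, so 'timestamp' is available for sorting.
result = df_grouped.apply(status_transition_with_timestamp)

# Count how many orders share each pattern.
result.value_counts()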

Result: output of counts

Best answer

To remove consecutive duplicates, use itertools.groupby:

from itertools import groupby
# Split each pattern, keep one key per run of equal statuses, and join it back together.
df['ostatus'] = df['ostatus'].apply(lambda s: ','.join(key for key, _ in groupby(s.split(','))))

You will then have the unique sequences, on which you can perform the aggregation.
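As a minimal sketch of that aggregation step, the snippet below collapses the two patterns from the question and counts the resulting bins. The order IDs, the third pattern, and the helper name collapse_runs are made up for illustration, and it assumes each order's statuses are already joined into one comma-separated string.

```python
from itertools import groupby

import pandas as pd

# Hypothetical data: one comma-joined status string per order.
# The first two patterns come from the question; the IDs are made up.
patterns = pd.Series(
    [
        "New,New,Cancelled,New,Filled",
        "New,New,New,Cancelled,Cancelled,New,New,Filled",
        "New,Filled",
    ],
    index=[101, 102, 103],
    name="ostatus",
)


def collapse_runs(pattern: str) -> str:
    """Drop consecutive repeats, e.g. 'New,New,Filled' -> 'New,Filled'."""
    return ",".join(key for key, _ in groupby(pattern.split(",")))


collapsed = patterns.apply(collapse_runs)

# Count how many orders fall into each collapsed pattern (bin).
bin_counts = collapsed.value_counts()
print(bin_counts)
```

Both patterns from the question end up as New,Cancelled,New,Filled, so they are counted in the same bin.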

Example:

import pandas as pd

df = pd.DataFrame({'Status': ['New,New,Cancelled', 'New,Cancelled', 'Cancelled,New,Cancelled,New']})
df
#                        Status
#0            New,New,Cancelled
#1                New,Cancelled
#2  Cancelled,New,Cancelled,New

df['Status'] = df['Status'].apply(lambda s: ','.join(key for key, _ in groupby(s.split(','))))
df
#                        Status
#0                New,Cancelled
#1                New,Cancelled
#2  Cancelled,New,Cancelled,New
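If the statuses are still stored one row per event rather than as joined strings, the same collapsing can probably be done in pandas alone by comparing each status with the previous one in the same order. This is only a sketch on made-up data: the column name OrderID is an assumption, while timestamp and ostatus are taken from the question's code.

```python
import pandas as pd

# Hypothetical raw rows, one per status event.
# 'OrderID' is an assumed column name; 'timestamp' and 'ostatus' follow the question's code.
raw = pd.DataFrame(
    {
        "OrderID": [1, 1, 1, 1, 2, 2, 2],
        "timestamp": [1, 2, 3, 4, 1, 2, 3],
        "ostatus": ["New", "New", "Cancelled", "Filled", "New", "Cancelled", "Cancelled"],
    }
)

raw = raw.sort_values(["OrderID", "timestamp"])

# Previous status within the same order (missing for the first row of each order).
previous = raw.groupby("OrderID")["ostatus"].shift()

# Keep a row only when its status differs from the one before it.
changes = raw[raw["ostatus"] != previous]

# Join the remaining statuses of each order into one pattern string.
collapsed = changes.groupby("OrderID")["ostatus"].agg(",".join)
print(collapsed)
```

This avoids building the long strings first, which may matter when orders have many status rows. The itertools.groupby version above is the simpler choice once the strings already exist.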

Regarding python - How to categorize repeating patterns?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56147052/
