python-3.x - Stacked bar chart with matplotlib

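I have a DataFrame extracted from WhatsApp with the following columns: Date&Time, msg, name, msg_len. Date&Time is a DateTime object representing when the message was sent, msg is the actual message, name is the person who sent it, and msg_len is the length of the message. I am trying to build a stacked bar chart from this DataFrame: the X axis shows the date (e.g. 2019-02), the Y axis shows either the mean length or the number of messages sent in that month, and each bar is split by person. So far my function looks like this: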

def BarPlotMonth(Data):
    """
    This function plots a barplot for the number of messages sent for each month and the mean length of the messages for each month
    """

    fig,axes = plt.subplots(2,1,
            figsize=(18,10),
            sharex = True)


    GroupedByMonth = Data.groupby(Data['Date&Time'].dt.strftime('%Y-%m'))['msg_len']

    Mean = GroupedByMonth.mean()
    Count = GroupedByMonth.count()
    Std = GroupedByMonth.std()

    axes[0].bar(Count.index, Count, color = 'lightblue')
    axes[0].set_title('Number of text per month')
    axes[0].set_ylabel('Count')

    axes[1].bar(Mean.index, Mean, color = 'lightblue', yerr = Std)
    axes[1].set_title('Mean length of a message per month')
    axes[1].set_ylabel('Mean length')
    axes[1].set_xlabel('Year-Month')

    plt.xticks(rotation=45)
    axes[1].legend()

    plt.savefig('WhatsApp_conversations.png')
    plt.show()

But I cannot split each bar by person. How can I solve this?

Best answer

You need to restructure the DataFrame a little so that you can use df.plot(kind='bar', stacked=True).

group_by_month_per_user = df.groupby(
    [
        df['Date&Time'].dt.strftime('%Y-%m'),
        'name'
    ]   
)[['msg_len']].mean().unstack()  # aggregate only the numeric column; the text column msg cannot be averaged

group_by_month_per_user

This produces a table with the following structure.

             msg_len                                 
name           alice        bob   giuseppe     martin
Date&Time                                            
2019-01    48.870968  42.315789  56.391304  49.586207
2019-02    51.099174  48.777778  56.173913  51.895652
2019-03    52.336364  49.626168  47.021898  46.626263

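Note that the columns are a MultiIndex with msg_len as the top level above every column. We need to drop that level to keep the legend tidy, which can be done simply by selecting the whole msg_len column. The resulting DataFrame can then be passed to .plot.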

group_by_month_per_user['msg_len'].plot(kind='bar', stacked=True, legend=True)

This produces the following plot.
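The question asked for two panels (message count and mean length), so here is a minimal sketch of how the same reshaping might be applied to both. The function name bar_plot_month_stacked and the small demo frame are hypothetical, made up for illustration. The sketch selects the msg_len Series before aggregating, which should give flat columns after unstack, so no MultiIndex level has to be dropped.

```python
import pandas as pd
import matplotlib.pyplot as plt


def bar_plot_month_stacked(data):
    """Stacked bars per person: message count (top) and mean length (bottom)."""
    # Year-month key, e.g. '2019-02' (the name 'month' is only a label)
    month = data['Date&Time'].dt.strftime('%Y-%m').rename('month')
    # Selecting the msg_len Series first keeps the columns flat after unstack
    grouped = data.groupby([month, 'name'])['msg_len']
    count = grouped.count().unstack(fill_value=0)
    mean = grouped.mean().unstack()

    fig, axes = plt.subplots(2, 1, figsize=(18, 10), sharex=True)
    count.plot(kind='bar', stacked=True, ax=axes[0],
               title='Number of messages per month')
    mean.plot(kind='bar', stacked=True, ax=axes[1], legend=False,
              title='Mean message length per month')
    axes[0].set_ylabel('Count')
    axes[1].set_ylabel('Mean length')
    axes[1].set_xlabel('Year-Month')
    axes[1].tick_params(axis='x', rotation=45)
    return fig, count, mean


# Hypothetical demo data with the same column names as in the question
demo = pd.DataFrame({
    'Date&Time': pd.to_datetime(['2019-01-05', '2019-01-20', '2019-01-21',
                                 '2019-02-03', '2019-02-10', '2019-02-11']),
    'name': ['alice', 'bob', 'alice', 'alice', 'bob', 'bob'],
    'msg_len': [10, 20, 30, 40, 50, 70],
})
fig, count, mean = bar_plot_month_stacked(demo)
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure. The error bars from the question are left out here, since a single yerr per month does not map onto per-person segments in an obvious way.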

The following code was used to generate the random dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from random import randint, choice
import string

ts = datetime.now()
data = []
names = ['bob', 'alice', 'martin', 'giuseppe']

for n in range(1000):
    msg_len = randint(0, 100)
    row = [
        ts - timedelta(days=randint(-30,30)),
        ''.join(choice(string.ascii_lowercase) for _ in range(msg_len)),  # choice is imported directly from random
        choice(names),
        msg_len
    ]

    data.append(row)

df = pd.DataFrame(data, columns = ['Date&Time', 'msg', 'name', 'msg_len'])
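
If you would rather stay with plain axes.bar calls, as in the question's function, stacking can also be done by hand through the bottom argument of Axes.bar. The following is only a sketch: the helper stacked_bar and the small counts table are hypothetical, and the table is assumed to have one row per month and one column per person.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def stacked_bar(ax, table):
    """Draw one bar segment per column of `table`, stacked via `bottom`."""
    x = np.arange(len(table.index))
    bottom = np.zeros(len(table.index))
    for person in table.columns:
        # Treat a missing month/person combination as a zero-height segment
        heights = table[person].fillna(0).to_numpy(dtype=float)
        ax.bar(x, heights, bottom=bottom, label=person)
        # The next person's segment starts where this one ends
        bottom = bottom + heights
    ax.set_xticks(x)
    ax.set_xticklabels(table.index, rotation=45)
    ax.legend(title='name')
    return bottom  # total bar height per month


# Hypothetical per-month message counts, one column per person
counts = pd.DataFrame({'alice': [2, 1], 'bob': [1, 2]},
                      index=['2019-01', '2019-02'])
fig, ax = plt.subplots()
totals = stacked_bar(ax, counts)
```

The DataFrame.plot route in the answer does this bookkeeping for you. The manual loop is mainly useful when you need control over each segment, for example its color or hatch.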

The original question, "python-3.x - Stacked bar chart with matplotlib", can be found on Stack Overflow: https://stackoverflow.com/questions/54847424/
