python - How can I plot the values of multiple factors from one column across multiple dates in pandas + matplotlib?

Tags: python pandas matplotlib

I have a DataFrame that looks like this:

df = pd.DataFrame({'animal': {Timestamp('2014-11-12 00:00:00'): 'dog',
  Timestamp('2014-11-13 00:00:00'): 'rabbit',
  Timestamp('2014-11-14 00:00:00'): 'rabbit',
  Timestamp('2014-11-15 00:00:00'): 'rabbit',
  Timestamp('2014-11-16 00:00:00'): 'rabbit',
  Timestamp('2014-11-17 00:00:00'): 'rabbit',
  Timestamp('2014-11-18 00:00:00'): 'dog',
  Timestamp('2014-11-19 00:00:00'): 'rabbit',
  Timestamp('2014-11-20 00:00:00'): 'dog',
  Timestamp('2014-11-21 00:00:00'): 'dog',
  Timestamp('2014-12-01 00:00:00'): 'rabbit',
  Timestamp('2014-12-02 00:00:00'): 'dog',
  Timestamp('2014-12-03 00:00:00'): 'dog',
  Timestamp('2014-12-04 00:00:00'): 'rabbit',
  Timestamp('2014-12-05 00:00:00'): 'rabbit',
  Timestamp('2014-12-06 00:00:00'): 'dog',
  Timestamp('2014-12-07 00:00:00'): 'dog',
  Timestamp('2014-12-08 00:00:00'): 'rabbit',
  Timestamp('2014-12-09 00:00:00'): 'rabbit',
  Timestamp('2014-12-10 00:00:00'): 'rabbit',
  Timestamp('2014-12-11 00:00:00'): 'rabbit',
  Timestamp('2014-12-12 00:00:00'): 'rabbit',
  Timestamp('2014-12-13 00:00:00'): 'rabbit',
  Timestamp('2014-12-14 00:00:00'): 'rabbit',
  Timestamp('2014-12-15 00:00:00'): 'dog',
  Timestamp('2014-12-16 00:00:00'): 'dog',
  Timestamp('2014-12-17 00:00:00'): 'dog',
  Timestamp('2014-12-18 00:00:00'): 'rabbit',
  Timestamp('2014-12-19 00:00:00'): 'rabbit',
  Timestamp('2014-12-20 00:00:00'): 'dog'},
 'count': {Timestamp('2014-11-12 00:00:00'): 6136,
  Timestamp('2014-11-13 00:00:00'): 14620,
  Timestamp('2014-11-14 00:00:00'): 16437,
  Timestamp('2014-11-15 00:00:00'): 17273,
  Timestamp('2014-11-16 00:00:00'): 15302,
  Timestamp('2014-11-17 00:00:00'): 15180,
  Timestamp('2014-11-18 00:00:00'): 7177,
  Timestamp('2014-11-19 00:00:00'): 16193,
  Timestamp('2014-11-20 00:00:00'): 8226,
  Timestamp('2014-11-21 00:00:00'): 9741,
  Timestamp('2014-12-01 00:00:00'): 26237,
  Timestamp('2014-12-02 00:00:00'): 12146,
  Timestamp('2014-12-03 00:00:00'): 12910,
  Timestamp('2014-12-04 00:00:00'): 25820,
  Timestamp('2014-12-05 00:00:00'): 29323,
  Timestamp('2014-12-06 00:00:00'): 17294,
  Timestamp('2014-12-07 00:00:00'): 15219,
  Timestamp('2014-12-08 00:00:00'): 26174,
  Timestamp('2014-12-09 00:00:00'): 27112,
  Timestamp('2014-12-10 00:00:00'): 27131,
  Timestamp('2014-12-11 00:00:00'): 28268,
  Timestamp('2014-12-12 00:00:00'): 34059,
  Timestamp('2014-12-13 00:00:00'): 39162,
  Timestamp('2014-12-14 00:00:00'): 38314,
  Timestamp('2014-12-15 00:00:00'): 19807,
  Timestamp('2014-12-16 00:00:00'): 20606,
  Timestamp('2014-12-17 00:00:00'): 21552,
  Timestamp('2014-12-18 00:00:00'): 36499,
  Timestamp('2014-12-19 00:00:00'): 42163,
  Timestamp('2014-12-20 00:00:00'): 30301},
 'day': {Timestamp('2014-11-12 00:00:00'): 12,
  Timestamp('2014-11-13 00:00:00'): 13,
  Timestamp('2014-11-14 00:00:00'): 14,
  Timestamp('2014-11-15 00:00:00'): 15,
  Timestamp('2014-11-16 00:00:00'): 16,
  Timestamp('2014-11-17 00:00:00'): 17,
  Timestamp('2014-11-18 00:00:00'): 18,
  Timestamp('2014-11-19 00:00:00'): 19,
  Timestamp('2014-11-20 00:00:00'): 20,
  Timestamp('2014-11-21 00:00:00'): 21,
  Timestamp('2014-12-01 00:00:00'): 1,
  Timestamp('2014-12-02 00:00:00'): 2,
  Timestamp('2014-12-03 00:00:00'): 3,
  Timestamp('2014-12-04 00:00:00'): 4,
  Timestamp('2014-12-05 00:00:00'): 5,
  Timestamp('2014-12-06 00:00:00'): 6,
  Timestamp('2014-12-07 00:00:00'): 7,
  Timestamp('2014-12-08 00:00:00'): 8,
  Timestamp('2014-12-09 00:00:00'): 9,
  Timestamp('2014-12-10 00:00:00'): 10,
  Timestamp('2014-12-11 00:00:00'): 11,
  Timestamp('2014-12-12 00:00:00'): 12,
  Timestamp('2014-12-13 00:00:00'): 13,
  Timestamp('2014-12-14 00:00:00'): 14,
  Timestamp('2014-12-15 00:00:00'): 15,
  Timestamp('2014-12-16 00:00:00'): 16,
  Timestamp('2014-12-17 00:00:00'): 17,
  Timestamp('2014-12-18 00:00:00'): 18,
  Timestamp('2014-12-19 00:00:00'): 19,
  Timestamp('2014-12-20 00:00:00'): 20},
 'month': {Timestamp('2014-11-12 00:00:00'): 11,
  Timestamp('2014-11-13 00:00:00'): 11,
  Timestamp('2014-11-14 00:00:00'): 11,
  Timestamp('2014-11-15 00:00:00'): 11,
  Timestamp('2014-11-16 00:00:00'): 11,
  Timestamp('2014-11-17 00:00:00'): 11,
  Timestamp('2014-11-18 00:00:00'): 11,
  Timestamp('2014-11-19 00:00:00'): 11,
  Timestamp('2014-11-20 00:00:00'): 11,
  Timestamp('2014-11-21 00:00:00'): 11,
  Timestamp('2014-12-01 00:00:00'): 12,
  Timestamp('2014-12-02 00:00:00'): 12,
  Timestamp('2014-12-03 00:00:00'): 12,
  Timestamp('2014-12-04 00:00:00'): 12,
  Timestamp('2014-12-05 00:00:00'): 12,
  Timestamp('2014-12-06 00:00:00'): 12,
  Timestamp('2014-12-07 00:00:00'): 12,
  Timestamp('2014-12-08 00:00:00'): 12,
  Timestamp('2014-12-09 00:00:00'): 12,
  Timestamp('2014-12-10 00:00:00'): 12,
  Timestamp('2014-12-11 00:00:00'): 12,
  Timestamp('2014-12-12 00:00:00'): 12,
  Timestamp('2014-12-13 00:00:00'): 12,
  Timestamp('2014-12-14 00:00:00'): 12,
  Timestamp('2014-12-15 00:00:00'): 12,
  Timestamp('2014-12-16 00:00:00'): 12,
  Timestamp('2014-12-17 00:00:00'): 12,
  Timestamp('2014-12-18 00:00:00'): 12,
  Timestamp('2014-12-19 00:00:00'): 12,
  Timestamp('2014-12-20 00:00:00'): 12}}

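I am trying to plot a line chart of the counts for the two animals over 7 days; essentially, my goal is to show each animal's time series on the same chart.

Here is my code: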

df['date'] = pd.to_datetime(df['date'], dayfirst=True, infer_datetime_format = True)
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')

grouped = df.groupby('animal')
for key, group in grouped:
    data = group.groupby(lambda x: x.day)
    data['count'].plot(label=key)


plt.legend()

plt.show()

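What I want is something like this, showing the counts for both animals: [image]

Instead, the closest I have gotten is: [image]

I feel like I am missing an obvious building block here, but I can't quite figure it out.

Edit: I couldn't work out how to sort by month and day at the same time, so I have appended some more data to the DataFrame.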

Best answer

Create a day column that stores the day of the month for each row:

df['day'] = df.index.day

Because we want the days in order along the x-axis, sort the rows by that column as well:

df = df.sort_values(by='day')
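As a minimal sketch of these two steps, the snippet below uses a hypothetical three-row frame (the name `sample` and its values are made up for illustration) whose dates are deliberately out of order. A line plot joins points in row order, which is why the sort comes before plotting.

```python
import pandas as pd

# Hypothetical three-row sample with the dates deliberately out of order.
sample = pd.DataFrame(
    {"count": [30, 10, 20]},
    index=pd.to_datetime(["2014-12-27", "2014-12-25", "2014-12-26"]),
)

# DatetimeIndex.day gives the day of the month for each row.
sample["day"] = sample.index.day

# A line plot joins points in row order, so sort by the x values first.
sample = sample.sort_values(by="day")

print(sample)
```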

Then you can group by animal and plot each subgroup:

# Group by a plain column name: in pandas 2.x a one-element list makes each key a 1-tuple.
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)

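Note that group.plot calls DataFrame.plot, which lets you specify which columns to use for the x- and y-axes. In contrast, group['count'].plot calls Series.plot, which assumes that the x-axis is the index and the y-axis is the values of the Series.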
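The difference can be checked directly by inspecting the x data of the plotted lines. This is only a sketch: the frame `sample`, its integer index, and the choice of the headless `Agg` backend are assumptions made so the snippet runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample whose integer index differs from the 'day' column.
sample = pd.DataFrame(
    {"day": [25, 26, 27], "count": [10, 20, 30]},
    index=[100, 200, 300],
)

fig, (ax_df, ax_series) = plt.subplots(1, 2)

# DataFrame.plot: x and y are chosen by column name.
sample.plot(x="day", y="count", ax=ax_df)

# Series.plot: x comes from the index, y from the values.
sample["count"].plot(ax=ax_series)

x_from_dataframe = list(ax_df.get_lines()[0].get_xdata())
x_from_series = list(ax_series.get_lines()[0].get_xdata())
print(x_from_dataframe)  # the 'day' values
print(x_from_series)     # the index values
```

To view the figure interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.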


import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp


df = pd.DataFrame({'animal': {12: 'dog', 44: 'dog', 47: 'dog', 69: 'rabbit', 76: 'rabbit', 84: 'dog', 122: 'rabbit', 162: 'rabbit', 177: 'rabbit', 190: 'rabbit', 217: 'dog', 219: 'dog', 220: 'dog', 226: 'rabbit'},
 'count': {12: 34573, 44: 30676, 47: 41821, 69: 56880, 76: 73172, 84: 30581, 122: 52895, 162: 58430, 177: 57132, 190: 53903, 217: 32001, 219: 35776, 220: 31095, 226: 53809},
 'date': {12: Timestamp('2014-12-29 00:00:00'), 44: Timestamp('2014-12-28 00:00:00'), 47: Timestamp('2014-12-31 00:00:00'), 69: Timestamp('2014-12-29 00:00:00'), 76: Timestamp('2014-12-31 00:00:00'), 84: Timestamp('2014-12-26 00:00:00'), 122: Timestamp('2014-12-25 00:00:00'), 162: Timestamp('2014-12-30 00:00:00'), 177: Timestamp('2014-12-27 00:00:00'), 190: Timestamp('2014-12-28 00:00:00'), 217: Timestamp('2014-12-27 00:00:00'), 219: Timestamp('2014-12-30 00:00:00'), 220: Timestamp('2014-12-25 00:00:00'), 226: Timestamp('2014-12-26 00:00:00')}})

df['animal'] = df['animal'].astype('category')
df = df.set_index('date')


df['day'] = df.index.day
df = df.sort_values(by='day')
# Group by a plain column name so each key is a usable legend label.
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)

plt.legend(loc='best')

plt.show()



For the revised question, if you want the full dates along the x-axis, then using Series.plot is probably the simplest approach (as you did in your original code):

import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'animal': ['dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'dog'], 'count': [6136, 14620, 16437, 17273, 15302, 15180, 7177, 16193, 8226, 9741, 26237, 12146, 12910, 25820, 29323, 17294, 15219, 26174, 27112, 27131, 28268, 34059, 39162, 38314, 19807, 20606, 21552, 36499, 42163, 30301], 'date': [Timestamp('2014-11-12 00:00:00'), Timestamp('2014-11-13 00:00:00'), Timestamp('2014-11-14 00:00:00'), Timestamp('2014-11-15 00:00:00'), Timestamp('2014-11-16 00:00:00'), Timestamp('2014-11-17 00:00:00'), Timestamp('2014-11-18 00:00:00'), Timestamp('2014-11-19 00:00:00'), Timestamp('2014-11-20 00:00:00'), Timestamp('2014-11-21 00:00:00'), Timestamp('2014-12-01 00:00:00'), Timestamp('2014-12-02 00:00:00'), Timestamp('2014-12-03 00:00:00'), Timestamp('2014-12-04 00:00:00'), Timestamp('2014-12-05 00:00:00'), Timestamp('2014-12-06 00:00:00'), Timestamp('2014-12-07 00:00:00'), Timestamp('2014-12-08 00:00:00'), Timestamp('2014-12-09 00:00:00'), Timestamp('2014-12-10 00:00:00'), Timestamp('2014-12-11 00:00:00'), Timestamp('2014-12-12 00:00:00'), Timestamp('2014-12-13 00:00:00'), Timestamp('2014-12-14 00:00:00'), Timestamp('2014-12-15 00:00:00'), Timestamp('2014-12-16 00:00:00'), Timestamp('2014-12-17 00:00:00'), Timestamp('2014-12-18 00:00:00'), Timestamp('2014-12-19 00:00:00'), Timestamp('2014-12-20 00:00:00')]})
df = df.set_index('date')

# Group by a plain column name so each key is a usable legend label.
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group['count'].plot(label=key, ax=ax)

plt.legend(loc='best')

plt.show()
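As an alternative to the groupby loop (not part of the original answer), the data can be reshaped with DataFrame.pivot so that each animal becomes its own column, and DataFrame.plot then draws one line per column. The sketch below uses four rows taken from the first example's data; the names `sample` and `wide` and the `Agg` backend are assumptions for illustration. It relies on each (date, animal) pair appearing at most once, because pivot raises a ValueError on duplicates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the first example: every date has both animals.
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-12-25", "2014-12-25", "2014-12-26", "2014-12-26"]
        ),
        "animal": ["dog", "rabbit", "dog", "rabbit"],
        "count": [31095, 52895, 30581, 53809],
    }
)

# Reshape from long to wide: one row per date, one column per animal.
wide = sample.pivot(index="date", columns="animal", values="count")

# DataFrame.plot draws one line per column and names them in the legend.
ax = wide.plot()
ax.set_ylabel("count")

print(wide)
```

When a date has a row for only one animal, as in the longer data set in the question, the missing combinations become NaN in the wide table and typically appear as gaps in the lines, so the groupby loop is likely the safer choice for that data.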


Regarding "python - How can I plot the values of multiple factors from one column across multiple dates in pandas + matplotlib?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34830921/
