I have a data frame that looks like this:
df = pd.DataFrame({'animal': {Timestamp('2014-11-12 00:00:00'): 'dog',
Timestamp('2014-11-13 00:00:00'): 'rabbit',
Timestamp('2014-11-14 00:00:00'): 'rabbit',
Timestamp('2014-11-15 00:00:00'): 'rabbit',
Timestamp('2014-11-16 00:00:00'): 'rabbit',
Timestamp('2014-11-17 00:00:00'): 'rabbit',
Timestamp('2014-11-18 00:00:00'): 'dog',
Timestamp('2014-11-19 00:00:00'): 'rabbit',
Timestamp('2014-11-20 00:00:00'): 'dog',
Timestamp('2014-11-21 00:00:00'): 'dog',
Timestamp('2014-12-01 00:00:00'): 'rabbit',
Timestamp('2014-12-02 00:00:00'): 'dog',
Timestamp('2014-12-03 00:00:00'): 'dog',
Timestamp('2014-12-04 00:00:00'): 'rabbit',
Timestamp('2014-12-05 00:00:00'): 'rabbit',
Timestamp('2014-12-06 00:00:00'): 'dog',
Timestamp('2014-12-07 00:00:00'): 'dog',
Timestamp('2014-12-08 00:00:00'): 'rabbit',
Timestamp('2014-12-09 00:00:00'): 'rabbit',
Timestamp('2014-12-10 00:00:00'): 'rabbit',
Timestamp('2014-12-11 00:00:00'): 'rabbit',
Timestamp('2014-12-12 00:00:00'): 'rabbit',
Timestamp('2014-12-13 00:00:00'): 'rabbit',
Timestamp('2014-12-14 00:00:00'): 'rabbit',
Timestamp('2014-12-15 00:00:00'): 'dog',
Timestamp('2014-12-16 00:00:00'): 'dog',
Timestamp('2014-12-17 00:00:00'): 'dog',
Timestamp('2014-12-18 00:00:00'): 'rabbit',
Timestamp('2014-12-19 00:00:00'): 'rabbit',
Timestamp('2014-12-20 00:00:00'): 'dog'},
'count': {Timestamp('2014-11-12 00:00:00'): 6136,
Timestamp('2014-11-13 00:00:00'): 14620,
Timestamp('2014-11-14 00:00:00'): 16437,
Timestamp('2014-11-15 00:00:00'): 17273,
Timestamp('2014-11-16 00:00:00'): 15302,
Timestamp('2014-11-17 00:00:00'): 15180,
Timestamp('2014-11-18 00:00:00'): 7177,
Timestamp('2014-11-19 00:00:00'): 16193,
Timestamp('2014-11-20 00:00:00'): 8226,
Timestamp('2014-11-21 00:00:00'): 9741,
Timestamp('2014-12-01 00:00:00'): 26237,
Timestamp('2014-12-02 00:00:00'): 12146,
Timestamp('2014-12-03 00:00:00'): 12910,
Timestamp('2014-12-04 00:00:00'): 25820,
Timestamp('2014-12-05 00:00:00'): 29323,
Timestamp('2014-12-06 00:00:00'): 17294,
Timestamp('2014-12-07 00:00:00'): 15219,
Timestamp('2014-12-08 00:00:00'): 26174,
Timestamp('2014-12-09 00:00:00'): 27112,
Timestamp('2014-12-10 00:00:00'): 27131,
Timestamp('2014-12-11 00:00:00'): 28268,
Timestamp('2014-12-12 00:00:00'): 34059,
Timestamp('2014-12-13 00:00:00'): 39162,
Timestamp('2014-12-14 00:00:00'): 38314,
Timestamp('2014-12-15 00:00:00'): 19807,
Timestamp('2014-12-16 00:00:00'): 20606,
Timestamp('2014-12-17 00:00:00'): 21552,
Timestamp('2014-12-18 00:00:00'): 36499,
Timestamp('2014-12-19 00:00:00'): 42163,
Timestamp('2014-12-20 00:00:00'): 30301},
'day': {Timestamp('2014-11-12 00:00:00'): 12,
Timestamp('2014-11-13 00:00:00'): 13,
Timestamp('2014-11-14 00:00:00'): 14,
Timestamp('2014-11-15 00:00:00'): 15,
Timestamp('2014-11-16 00:00:00'): 16,
Timestamp('2014-11-17 00:00:00'): 17,
Timestamp('2014-11-18 00:00:00'): 18,
Timestamp('2014-11-19 00:00:00'): 19,
Timestamp('2014-11-20 00:00:00'): 20,
Timestamp('2014-11-21 00:00:00'): 21,
Timestamp('2014-12-01 00:00:00'): 1,
Timestamp('2014-12-02 00:00:00'): 2,
Timestamp('2014-12-03 00:00:00'): 3,
Timestamp('2014-12-04 00:00:00'): 4,
Timestamp('2014-12-05 00:00:00'): 5,
Timestamp('2014-12-06 00:00:00'): 6,
Timestamp('2014-12-07 00:00:00'): 7,
Timestamp('2014-12-08 00:00:00'): 8,
Timestamp('2014-12-09 00:00:00'): 9,
Timestamp('2014-12-10 00:00:00'): 10,
Timestamp('2014-12-11 00:00:00'): 11,
Timestamp('2014-12-12 00:00:00'): 12,
Timestamp('2014-12-13 00:00:00'): 13,
Timestamp('2014-12-14 00:00:00'): 14,
Timestamp('2014-12-15 00:00:00'): 15,
Timestamp('2014-12-16 00:00:00'): 16,
Timestamp('2014-12-17 00:00:00'): 17,
Timestamp('2014-12-18 00:00:00'): 18,
Timestamp('2014-12-19 00:00:00'): 19,
Timestamp('2014-12-20 00:00:00'): 20},
'month': {Timestamp('2014-11-12 00:00:00'): 11,
Timestamp('2014-11-13 00:00:00'): 11,
Timestamp('2014-11-14 00:00:00'): 11,
Timestamp('2014-11-15 00:00:00'): 11,
Timestamp('2014-11-16 00:00:00'): 11,
Timestamp('2014-11-17 00:00:00'): 11,
Timestamp('2014-11-18 00:00:00'): 11,
Timestamp('2014-11-19 00:00:00'): 11,
Timestamp('2014-11-20 00:00:00'): 11,
Timestamp('2014-11-21 00:00:00'): 11,
Timestamp('2014-12-01 00:00:00'): 12,
Timestamp('2014-12-02 00:00:00'): 12,
Timestamp('2014-12-03 00:00:00'): 12,
Timestamp('2014-12-04 00:00:00'): 12,
Timestamp('2014-12-05 00:00:00'): 12,
Timestamp('2014-12-06 00:00:00'): 12,
Timestamp('2014-12-07 00:00:00'): 12,
Timestamp('2014-12-08 00:00:00'): 12,
Timestamp('2014-12-09 00:00:00'): 12,
Timestamp('2014-12-10 00:00:00'): 12,
Timestamp('2014-12-11 00:00:00'): 12,
Timestamp('2014-12-12 00:00:00'): 12,
Timestamp('2014-12-13 00:00:00'): 12,
Timestamp('2014-12-14 00:00:00'): 12,
Timestamp('2014-12-15 00:00:00'): 12,
Timestamp('2014-12-16 00:00:00'): 12,
Timestamp('2014-12-17 00:00:00'): 12,
Timestamp('2014-12-18 00:00:00'): 12,
Timestamp('2014-12-19 00:00:00'): 12,
Timestamp('2014-12-20 00:00:00'): 12}})
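I am trying to plot a line chart of the count for the two animals across 7 days; essentially, my goal is to have the time series for each animal show up on the same chart.
Here is my code: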
df['date'] = pd.to_datetime(df['date'], dayfirst=True, infer_datetime_format = True)
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
grouped = df.groupby('animal')
for key, group in grouped:
    data = group.groupby(lambda x: x.day)
    data['count'].plot(label=key)
plt.legend()
plt.show()
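I feel like I'm missing an obvious building block here, but I can't quite figure out what it is.
Edit: I couldn't work out how to sort by both month and day, so I have appended some more data to the data frame.
Best answer
Create a day column that holds the day of the month for each row: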
df['day'] = df.index.day
Since we want the days to appear in sorted order along the x-axis, sort the data frame by that column as well:
df = df.sort_values(by='day')
Then you can group by animal and plot each subgroup:
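# Group by the column name itself; with a one-element list,
# newer pandas versions yield tuple keys such as ('dog',).
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)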
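Note that group.plot calls DataFrame.plot, which lets you specify the columns to use for the x- and y-axes. In contrast, group['count'].plot calls Series.plot, which assumes that the x-axis is the index and the y-axis is the values of the Series.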
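To make the difference concrete, here is a minimal sketch on a tiny made-up frame. The name toy and its values are illustrative assumptions, not data from the question, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the index holds dates, 'day' holds the day of the month.
toy = pd.DataFrame(
    {"day": [1, 2, 3], "count": [10, 20, 15]},
    index=pd.to_datetime(["2014-12-01", "2014-12-02", "2014-12-03"]),
)

fig, (ax_df, ax_s) = plt.subplots(1, 2)

# DataFrame.plot: x and y are chosen explicitly by column name.
toy.plot(x="day", y="count", ax=ax_df)

# Series.plot: x comes from the index (the dates), y from the values.
toy["count"].plot(ax=ax_s)

# Inspect what ended up on each axes.
x_from_column = list(ax_df.get_lines()[0].get_xdata())
y_from_series = list(ax_s.get_lines()[0].get_ydata())
```

In the left panel the x values are the entries of the day column; in the right panel the x-axis is built from the date index.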
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'animal': {12: 'dog', 44: 'dog', 47: 'dog', 69: 'rabbit', 76: 'rabbit', 84: 'dog', 122: 'rabbit', 162: 'rabbit', 177: 'rabbit', 190: 'rabbit', 217: 'dog', 219: 'dog', 220: 'dog', 226: 'rabbit'},
'count': {12: 34573, 44: 30676, 47: 41821, 69: 56880, 76: 73172, 84: 30581, 122: 52895, 162: 58430, 177: 57132, 190: 53903, 217: 32001, 219: 35776, 220: 31095, 226: 53809},
'date': {12: Timestamp('2014-12-29 00:00:00'), 44: Timestamp('2014-12-28 00:00:00'), 47: Timestamp('2014-12-31 00:00:00'), 69: Timestamp('2014-12-29 00:00:00'), 76: Timestamp('2014-12-31 00:00:00'), 84: Timestamp('2014-12-26 00:00:00'), 122: Timestamp('2014-12-25 00:00:00'), 162: Timestamp('2014-12-30 00:00:00'), 177: Timestamp('2014-12-27 00:00:00'), 190: Timestamp('2014-12-28 00:00:00'), 217: Timestamp('2014-12-27 00:00:00'), 219: Timestamp('2014-12-30 00:00:00'), 220: Timestamp('2014-12-25 00:00:00'), 226: Timestamp('2014-12-26 00:00:00')}})
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
df['day'] = df.index.day
df = df.sort_values(by='day')
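# Group by the column name itself; with a one-element list,
# newer pandas versions yield tuple keys such as ('dog',).
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)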
plt.legend(loc='best')
plt.show()
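The edit to the question asks about sorting by both month and day. One possible approach, sketched here on a few made-up rows rather than the question's data, is to pass a list of column names to sort_values. The name toy is an assumption for illustration.

```python
import pandas as pd

# Made-up rows spanning two months (not the question's data).
toy = pd.DataFrame(
    {
        "month": [12, 11, 12, 11],
        "day": [1, 13, 2, 12],
        "count": [26237, 14620, 12146, 6136],
    }
)

# Sorting by 'day' alone interleaves the two months.
by_day = toy.sort_values(by="day")

# Passing a list sorts by month first, then by day within each month.
by_month_day = toy.sort_values(by=["month", "day"])

print(by_month_day)
```

When the dates are already the index, calling sort_index() on the frame would give the same chronological order without needing the helper columns.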
For the revised question, if you want the full dates along the x-axis, then using Series.plot (as you did in your original code) is probably the easiest approach:
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'animal': ['dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'dog'], 'count': [6136, 14620, 16437, 17273, 15302, 15180, 7177, 16193, 8226, 9741, 26237, 12146, 12910, 25820, 29323, 17294, 15219, 26174, 27112, 27131, 28268, 34059, 39162, 38314, 19807, 20606, 21552, 36499, 42163, 30301], 'date': [Timestamp('2014-11-12 00:00:00'), Timestamp('2014-11-13 00:00:00'), Timestamp('2014-11-14 00:00:00'), Timestamp('2014-11-15 00:00:00'), Timestamp('2014-11-16 00:00:00'), Timestamp('2014-11-17 00:00:00'), Timestamp('2014-11-18 00:00:00'), Timestamp('2014-11-19 00:00:00'), Timestamp('2014-11-20 00:00:00'), Timestamp('2014-11-21 00:00:00'), Timestamp('2014-12-01 00:00:00'), Timestamp('2014-12-02 00:00:00'), Timestamp('2014-12-03 00:00:00'), Timestamp('2014-12-04 00:00:00'), Timestamp('2014-12-05 00:00:00'), Timestamp('2014-12-06 00:00:00'), Timestamp('2014-12-07 00:00:00'), Timestamp('2014-12-08 00:00:00'), Timestamp('2014-12-09 00:00:00'), Timestamp('2014-12-10 00:00:00'), Timestamp('2014-12-11 00:00:00'), Timestamp('2014-12-12 00:00:00'), Timestamp('2014-12-13 00:00:00'), Timestamp('2014-12-14 00:00:00'), Timestamp('2014-12-15 00:00:00'), Timestamp('2014-12-16 00:00:00'), Timestamp('2014-12-17 00:00:00'), Timestamp('2014-12-18 00:00:00'), Timestamp('2014-12-19 00:00:00'), Timestamp('2014-12-20 00:00:00')]})
df = df.set_index('date')
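# Group by the column name itself; with a one-element list,
# newer pandas versions yield tuple keys such as ('dog',).
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    # Series.plot uses the date index for the x-axis.
    group['count'].plot(label=key, ax=ax)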
plt.legend(loc='best')
plt.show()
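As an alternative sketch, which is not part of the original answer, a similar chart could be drawn without an explicit loop by pivoting the animals into columns and calling DataFrame.plot once. The frame toy below is made up for illustration, and pivot assumes each (date, animal) pair occurs at most once.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per date and animal.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-12-01", "2014-12-01", "2014-12-02", "2014-12-02"]
        ),
        "animal": ["dog", "rabbit", "dog", "rabbit"],
        "count": [5, 9, 7, 11],
    }
)

# Reshape to wide format: one column per animal, indexed by date.
wide = toy.pivot(index="date", columns="animal", values="count")

fig, ax = plt.subplots()
wide.plot(ax=ax)  # one line per column, labelled with the column name

labels = [line.get_label() for line in ax.get_lines()]
```

With data like that in the question, where each date has a row for only one animal, the wide frame would contain NaN entries and the lines would show gaps at those dates, so the loop shown above may be the better fit there.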
Regarding "python - How to plot the values of multiple factors in one column across multiple dates in pandas + matplotlib?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34830921/