I'm trying to solve what should be a very simple problem, but I'm stuck. I have a simple DataFrame with a DatetimeIndex, as shown below:
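import numpy as np
import pandas as pd

# 63 consecutive days; closed=None was the default and is omitted here
df = pd.DataFrame(
    index=pd.date_range(start='2017-01-01', end='2017-03-04'),
    data=np.arange(63),
    columns=['val'],
).rename_axis(index='date')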
In [179]: df
Out[179]:
val
date
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-01-04 3
2017-01-05 4
... ...
2017-02-28 58
2017-03-01 59
2017-03-02 60
2017-03-03 61
2017-03-04 62
[63 rows x 1 columns]
I want to aggregate the values by week, half-month, month, and so on. So I tried:
In [180]: df.to_period('W').groupby('date').sum()
Out[180]:
val
date
2016-12-26/2017-01-01 0
2017-01-02/2017-01-08 28
2017-01-09/2017-01-15 77
2017-01-16/2017-01-22 126
2017-01-23/2017-01-29 175
2017-01-30/2017-02-05 224
2017-02-06/2017-02-12 273
2017-02-13/2017-02-19 322
2017-02-20/2017-02-26 371
2017-02-27/2017-03-05 357
This works for offset aliases such as Y, M, D, W, T, S, L, U, and N, but it fails for SM, SMS, and other aliases listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
It raises a ValueError:
In [181]: df.to_period('SMS').groupby('date').sum()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies._period_str_to_code()

KeyError: 'SMS-15'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-181-6779559a0596> in <module>
----> 1 df.to_period('SMS').groupby('date').sum()

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/frame.py in to_period(self, freq, axis, copy)
   8350         axis = self._get_axis_number(axis)
   8351         if axis == 0:
-> 8352             new_data.set_axis(1, self.index.to_period(freq=freq))
   8353         elif axis == 1:
   8354             new_data.set_axis(0, self.columns.to_period(freq=freq))

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/accessor.py in f(self, *args, **kwargs)
     91     def _create_delegator_method(name):
     92         def f(self, *args, **kwargs):
---> 93             return self._delegate_method(name, *args, **kwargs)
     94
     95         f.__name__ = name

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in _delegate_method(self, name, *args, **kwargs)
    811
    812     def _delegate_method(self, name, *args, **kwargs):
--> 813         result = operator.methodcaller(name, *args, **kwargs)(self._data)
    814         if name not in self._raw_methods:
    815             result = Index(result, name=self.name)

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in to_period(self, freq)
   1280             freq = get_period_alias(freq)
   1281
-> 1282         return PeriodArray._from_datetime64(self._data, freq, tz=self.tz)
   1283
   1284     def to_perioddelta(self, freq):

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/period.py in _from_datetime64(cls, data, freq, tz)
    273         PeriodArray[freq]
    274         """
--> 275         data, freq = dt64arr_to_periodarr(data, freq, tz)
    276         return cls(data, freq=freq)
    277

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/period.py in dt64arr_to_periodarr(data, freq, tz)
    914         data = data._values
    915
--> 916     base, mult = libfrequencies.get_freq_code(freq)
    917     return libperiod.dt64arr_to_periodarr(data.view("i8"), base, tz), freq
    918

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies._period_str_to_code()

ValueError: Invalid frequency: SMS-15
I'm using Python 3.6.5 and pandas 0.25.1.
Best answer
Use DataFrame.resample here:
print (df.resample('W').sum())
val
date
2017-01-01 0
2017-01-08 28
2017-01-15 77
2017-01-22 126
2017-01-29 175
2017-02-05 224
2017-02-12 273
2017-02-19 322
2017-02-26 371
2017-03-05 357
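As a quick sanity check, the weekly totals from resample should match those from the to_period('W') approach in the question; only the labels differ (a timestamp for the last day of the week instead of a period). A minimal sketch, assuming the same 63-row frame; the names by_period and by_resample are illustrative only:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: 63 daily rows, values 0..62.
idx = pd.date_range("2017-01-01", "2017-03-04", name="date")
df = pd.DataFrame({"val": np.arange(len(idx))}, index=idx)

# Weekly sums via periods (the question's approach) and via resample.
by_period = df.to_period("W").groupby(level="date").sum()
by_resample = df.resample("W").sum()

# The totals agree row by row.
same_totals = by_period["val"].tolist() == by_resample["val"].tolist()

# Each weekly period starts on Monday; resample labels the same week by its Sunday.
week_ends = by_period.index.start_time + pd.Timedelta(days=6)
same_weeks = bool((week_ends == by_resample.index).all())

print(same_totals, same_weeks)
```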
print (df.resample('SM').sum())
val
date
2016-12-31 91
2017-01-15 344
2017-01-31 555
2017-02-15 663
2017-02-28 300
print (df.resample('SMS').sum())
val
date
2017-01-01 91
2017-01-15 374
2017-02-01 525
2017-02-15 721
2017-03-01 242
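Some string aliases have been renamed in later pandas releases, so a variant that passes an offset object instead of a string may be more tolerant of version differences. The sketch below assumes pd.offsets.SemiMonthBegin (anchored on the 1st and 15th, like 'SMS') and sets closed and label explicitly so the result does not rely on defaults; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: 63 daily rows, values 0..62.
idx = pd.date_range("2017-01-01", "2017-03-04", name="date")
df = pd.DataFrame({"val": np.arange(len(idx))}, index=idx)

# Offset object equivalent to the 'SMS' alias (bins start on the 1st and 15th).
semi_month_start = pd.offsets.SemiMonthBegin()

# Each bin includes its start date and is labelled by it.
semi_monthly = df.resample(semi_month_start, closed="left", label="left").sum()
print(semi_monthly)
```

Under these assumptions the sums should agree with the 'SMS' output shown above.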
An alternative using groupby with Grouper:
print (df.groupby(pd.Grouper(freq='W')).sum())
print (df.groupby(pd.Grouper(freq='SM')).sum())
print (df.groupby(pd.Grouper(freq='SMS')).sum())
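To illustrate that the Grouper form is interchangeable with resample, the following sketch compares both on the same frame. It assumes an offset object (pd.offsets.SemiMonthBegin) for the semi-month case rather than a string alias, and the dictionary name results_match is hypothetical:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: 63 daily rows, values 0..62.
idx = pd.date_range("2017-01-01", "2017-03-04", name="date")
df = pd.DataFrame({"val": np.arange(len(idx))}, index=idx)

# Frequencies to compare: weekly (string alias) and semi-month start (offset object).
freqs = {"weekly": "W", "semi_month_start": pd.offsets.SemiMonthBegin()}

results_match = {}
for name, freq in freqs.items():
    via_grouper = df.groupby(pd.Grouper(freq=freq)).sum()
    via_resample = df.resample(freq).sum()
    # DataFrame.equals compares both the index labels and the values.
    results_match[name] = via_grouper.equals(via_resample)

print(results_match)
```

Grouper is mainly useful when the time grouping has to be combined with other keys in a single groupby call; for a plain time-based aggregation, resample reads more directly.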
Source: "python - DateTimeIndex.to_period raises a ValueError exception for many offset aliases" on Stack Overflow: https://stackoverflow.com/questions/58064824/