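I have a list of data read from MongoDB. A subset of the data can be found in this gist. I am creating a DataFrame from this list, using the Date field to create a DatetimeIndex. The dates were originally recorded in my local timezone, but in Mongo they have no timezone information attached, so I corrected for DST as suggested here.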
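from datetime import datetime
from dateutil import tz
import pandas as pd
from pandas import DataFrame
# data is the list from the gist
dates = [x['Date'] for x in data]
idx = pd.DatetimeIndex(dates, freq='D')
# the stored dates carry no timezone; treat them as UTC, then convert to local time
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz='Europe/Dublin')
# truncate each timestamp to local midnight
idx = idx.normalize()
frame = DataFrame(data, index=idx)
frame = frame.drop('Date', axis=1)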
Everything seems fine, and my frame looks like this:
Events ID
2008-03-31 00:00:00+01:00 0.0 116927302
2008-03-30 00:00:00+00:00 2401.0 116927302
2008-03-31 00:00:00+01:00 0.0 116927307
2008-03-30 00:00:00+00:00 0.0 116927307
2008-03-31 00:00:00+01:00 0.0 121126919
2008-03-30 00:00:00+00:00 1019.0 121126919
2008-03-30 00:00:00+00:00 0.0 121126922
2008-03-31 00:00:00+01:00 0.0 121126922
2008-03-30 00:00:00+00:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 121127133
2008-03-31 00:00:00+01:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677370
2008-03-30 00:00:00+00:00 0.0 131677416
2008-03-31 00:00:00+01:00 0.0 131677416
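To make the index construction above easier to follow, here is a minimal sketch of the same three steps on two made-up timestamps. The values are an assumption (they are not taken from the gist); they were chosen as the UTC instants of local midnight in Dublin on either side of the DST change.

```python
import pandas as pd

# Hypothetical naive timestamps, assumed to be stored as UTC instants.
naive = pd.DatetimeIndex(["2008-03-30 00:00:00", "2008-03-30 23:00:00"])

as_utc = naive.tz_localize("UTC")            # attach UTC, no clock shift
local = as_utc.tz_convert("Europe/Dublin")   # same instants, Dublin wall clock
midnight = local.normalize()                 # truncate each local time to 00:00

# ISO strings make the UTC offset of each entry visible.
iso = [ts.isoformat() for ts in midnight]
print(iso)
```

The two entries end up with different UTC offsets, which matches the mix of +00:00 and +01:00 rows in the frame printed above.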
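Now I want to create a MultiIndex from both the original DatetimeIndex and the ID column, as shown here. However, when I try this, I get an error that was not raised when the DatetimeIndex was first created: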
frame.set_index([frame.ID, idx])
NonExistentTimeError: 2008-03-30 01:00:00
If I just call frame.set_index(idx) without a MultiIndex, no error is raised.
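For context on the error message itself: in Europe/Dublin the clocks moved forward on 2008-03-30, so the wall-clock time 01:00 never occurred on that day. The sketch below only illustrates that gap; it does not reproduce the set_index failure, and the helper name try_localize is made up for this example. The exception is caught broadly because its class may differ between pandas versions.

```python
import pandas as pd

def try_localize(wall_clock, zone="Europe/Dublin"):
    """Return the localized Timestamp, or the name of the exception raised."""
    try:
        return pd.Timestamp(wall_clock).tz_localize(zone)
    except Exception as exc:  # exception class may vary by pandas version
        return type(exc).__name__

missing = try_localize("2008-03-30 01:00:00")   # inside the spring-forward gap
existing = try_localize("2008-03-30 02:00:00")  # first wall-clock time after the gap

print(missing)
print(existing)
```

So an error naming 2008-03-30 01:00:00 suggests that somewhere a naive value of 01:00 was being localized to the Dublin timezone, even though every value in the normalized index is a valid local midnight.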
Versions
- python 2.7.11
- Pandas 0.18.0
Best answer
You first need sort_index, then append the column ID to the index:
frame = frame.sort_index()
frame.set_index('ID', append=True, inplace=True)
print (frame)
Events
ID
2008-03-30 00:00:00+00:00 168445814 0.0
168445633 0.0
168445653 0.0
245514429 0.0
168445739 0.0
168445810 0.0
332955940 0.0
168445875 0.0
168445628 0.0
217596128 1779.0
177336685 0.0
180799848 0.0
215797757 0.0
180800351 1657.0
183192871 0.0
...
...
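The same two calls can be tried on a small stand-in frame. This is a sketch under assumptions: the four rows below are modeled on the frame printed in the question rather than loaded from the gist, and the index is built by localizing naive midnights directly.

```python
import pandas as pd

# Stand-in data modeled on the question's output (not the real gist data).
idx = pd.to_datetime(
    ["2008-03-31", "2008-03-30", "2008-03-31", "2008-03-30"]
).tz_localize("Europe/Dublin")
frame = pd.DataFrame(
    {"Events": [0.0, 2401.0, 0.0, 0.0],
     "ID": [116927302, 116927302, 116927307, 116927307]},
    index=idx,
)

# Sort by date first, then move the ID column into the index as a second level.
result = frame.sort_index().set_index("ID", append=True)
print(result)
```

Passing the column label with append=True keeps the existing DatetimeIndex as the first level, so the index never has to be rebuilt from a list of arrays.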
If you need the levels in the other order, use DataFrame.swaplevel:
frame = frame.sort_index()
frame.set_index('ID', append=True, inplace=True)
frame = frame.swaplevel(0,1)
print (frame)
Events
ID
168445814 2008-03-30 00:00:00+00:00 0.0
168445633 2008-03-30 00:00:00+00:00 0.0
168445653 2008-03-30 00:00:00+00:00 0.0
245514429 2008-03-30 00:00:00+00:00 0.0
168445739 2008-03-30 00:00:00+00:00 0.0
168445810 2008-03-30 00:00:00+00:00 0.0
332955940 2008-03-30 00:00:00+00:00 0.0
168445875 2008-03-30 00:00:00+00:00 0.0
168445628 2008-03-30 00:00:00+00:00 0.0
217596128 2008-03-30 00:00:00+00:00 1779.0
177336685 2008-03-30 00:00:00+00:00 0.0
180799848 2008-03-30 00:00:00+00:00 0.0
215797757 2008-03-30 00:00:00+00:00 0.0
180800351 2008-03-30 00:00:00+00:00 1657.0
183192871 2008-03-30 00:00:00+00:00 0.0
186439064 2008-03-30 00:00:00+00:00 0.0
199856024 2008-03-30 00:00:00+00:00 0.0
...
...
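As a quick check of what swaplevel changes, here is a sketch on made-up rows (assumed values, not the gist data). Only the order of the index levels changes; the rows and the Events column stay as they were.

```python
import pandas as pd

# Stand-in data (hypothetical), with a tz-aware DatetimeIndex.
idx = pd.to_datetime(["2008-03-30", "2008-03-30", "2008-03-31"]).tz_localize(
    "Europe/Dublin"
)
frame = pd.DataFrame(
    {"Events": [0.0, 1779.0, 0.0], "ID": [168445814, 217596128, 168445814]},
    index=idx,
)

by_date = frame.sort_index().set_index("ID", append=True)  # levels: (date, ID)
by_id = by_date.swaplevel(0, 1)                            # levels: (ID, date)

print(by_id.index.names)
```

Note that swaplevel does not re-sort the rows, so a further sort_index call would be needed if the frame should be ordered by ID.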
If you need to copy the column to the index while keeping it as a column, use set_index(frame.ID, ...:
frame = frame.sort_index()
frame.set_index(frame.ID, append=True, inplace=True)
frame = frame.swaplevel(0,1)
print (frame)
Events ID
ID
168445814 2008-03-30 00:00:00+00:00 0.0 168445814
168445633 2008-03-30 00:00:00+00:00 0.0 168445633
168445653 2008-03-30 00:00:00+00:00 0.0 168445653
245514429 2008-03-30 00:00:00+00:00 0.0 245514429
168445739 2008-03-30 00:00:00+00:00 0.0 168445739
168445810 2008-03-30 00:00:00+00:00 0.0 168445810
332955940 2008-03-30 00:00:00+00:00 0.0 332955940
168445875 2008-03-30 00:00:00+00:00 0.0 168445875
168445628 2008-03-30 00:00:00+00:00 0.0 168445628
217596128 2008-03-30 00:00:00+00:00 1779.0 217596128
177336685 2008-03-30 00:00:00+00:00 0.0 177336685
180799848 2008-03-30 00:00:00+00:00 0.0 180799848
215797757 2008-03-30 00:00:00+00:00 0.0 215797757
180800351 2008-03-30 00:00:00+00:00 1657.0 180800351
183192871 2008-03-30 00:00:00+00:00 0.0 183192871
186439064 2008-03-30 00:00:00+00:00 0.0 186439064
...
...
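The difference between passing the column label and passing the Series can be seen on a tiny frame. This is a sketch with made-up values; drop=False is shown only as an alternative spelling that, as far as I know, gives the same result.

```python
import pandas as pd

# Stand-in data (hypothetical values).
idx = pd.to_datetime(["2008-03-30", "2008-03-31"]).tz_localize("Europe/Dublin")
frame = pd.DataFrame({"Events": [1657.0, 0.0], "ID": [180800351, 180800351]},
                     index=idx)

# Passing the Series copies its values into the index and leaves the column alone.
kept = frame.set_index(frame["ID"], append=True)

# Passing the label with drop=False is an alternative way to keep the column.
kept_alt = frame.set_index("ID", append=True, drop=False)

print(kept)
```

Keeping ID as both an index level and a column can make label-based operations such as groupby("ID") ambiguous, so it may be worth keeping only one of the two unless both are really needed.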
Regarding python - Pandas DatetimeIndex NonExistentTimeError only when creating a MultiIndex, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38502474/