python-2.7 - How to quickly select rows between two dates in a pandas DataFrame

Tags: python-2.7 pandas

I would like to know the fastest way to select the rows whose index falls between two dates. For example:

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2018-01-01', '2030-01-02', freq='BM')
>>> df = pd.DataFrame(np.zeros((len(index), 1)), index=index)
>>> df.head()
              0
2018-01-31  0.0
2018-02-28  0.0
2018-03-30  0.0
2018-04-30  0.0
2018-05-31  0.0

One way to select all rows between two dates, for example 2018-05-30 and 2027-07-03, is:

>>> df.loc[(df.index >= '2018-05-30') & (df.index <= '2027-07-03')]

In my application the values 2018-05-30 and 2027-07-03 are not known in advance. What is the fastest way to make this selection?

Best answer

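You can use truncate (a label slice with loc and the boolean mask from the question are shown alongside it for comparison):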

print (df.truncate(before='2018-05-30', after='2027-07-03'))

print (df.loc['2018-05-30':'2027-07-03'])

print (df.loc[(df.index >= '2018-05-30') & (df.index <= '2027-07-03')])
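As a quick sanity check, the three selections above can be compared against each other. This is only a minimal sketch: it rebuilds the frame from the question, passes a `pd.offsets.BMonthEnd()` object instead of the `'BM'` alias (an assumption made here so that it also runs on newer pandas releases), and converts the bounds to `pd.Timestamp` once, since they are only known at run time.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; the offset object replaces the 'BM' alias
# (an assumption of this sketch, not part of the original answer).
index = pd.date_range('2018-01-01', '2030-01-02', freq=pd.offsets.BMonthEnd())
df = pd.DataFrame(np.zeros((len(index), 1)), index=index)

# Bounds that are only known at run time, converted once to Timestamps.
start, end = pd.Timestamp('2018-05-30'), pd.Timestamp('2027-07-03')

by_truncate = df.truncate(before=start, after=end)         # inclusive on both ends
by_slice = df.loc[start:end]                               # label slice, inclusive
by_mask = df.loc[(df.index >= start) & (df.index <= end)]  # boolean mask

# All three should contain exactly the same rows.
print(by_truncate.equals(by_slice) and by_slice.equals(by_mask))
print(by_mask.index[0].date(), by_mask.index[-1].date(), len(by_mask))
```

On this frame all three select the same rows, from 2018-05-31 through 2027-06-30, so the choice between them comes down to speed and readability.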

Timings:

In [366]: %timeit (df.loc['2018-05-30':'2027-07-03'])
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.43 ms per loop

In [367]: %timeit (df.loc[(df.index >= '2018-05-30') & (df.index <= '2027-07-03')])
The slowest run took 4.97 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 502 µs per loop

In [368]: %timeit (df.truncate(before='2018-05-30', after='2027-07-03'))
The slowest run took 4.98 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 450 µs per loop

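If you change the condition slightly so that the last value is excluded (if it exists), that is, change <= to <, searchsorted with iloc is faster still: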

In [372]: %timeit (df.loc[(df.index >= '2018-05-31') & (df.index < '2027-05-31')])
The slowest run took 4.81 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 520 µs per loop

In [373]: %timeit (df.iloc[df.index.searchsorted('2018-05-31'): df.index.searchsorted('2027-05-31')])
10000 loops, best of 3: 136 µs per loop
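The searchsorted call above excludes the upper bound. If the end date should be included, as in the original question, one possible approach is to pass `side='right'` for the upper bound. The helper below is a hypothetical sketch (the name `select_between` and its `include_end` parameter are not from the answer), and it assumes the index is sorted in ascending order, which searchsorted requires.

```python
import numpy as np
import pandas as pd

def select_between(df, start, end, include_end=True):
    """Hypothetical helper: rows of df whose (sorted) index lies in [start, end].

    With include_end=False the interval is half-open: [start, end).
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Position of the first index value >= start.
    lo = df.index.searchsorted(start, side='left')
    # side='right' lands just past any value equal to end, so end is kept;
    # side='left' stops before it, so end is dropped.
    hi = df.index.searchsorted(end, side='right' if include_end else 'left')
    return df.iloc[lo:hi]

# Frame from the question, built with an offset object instead of 'BM'
# (an assumption so the sketch also runs on newer pandas).
index = pd.date_range('2018-01-01', '2030-01-02', freq=pd.offsets.BMonthEnd())
df = pd.DataFrame(np.zeros((len(index), 1)), index=index)

closed = select_between(df, '2018-05-31', '2027-05-31')
half_open = select_between(df, '2018-05-31', '2027-05-31', include_end=False)
print(len(closed), len(half_open))
```

Because 2027-05-31 is itself in the index, the inclusive result has one more row than the half-open one. Timings for this variant were not measured in the answer, so it is worth running `%timeit` on your own data before relying on it.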

Regarding "python-2.7 - How to quickly select rows between two dates in a pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48462118/
