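In a DataFrame with timestamps as a compound index, how can we slice out the rows that satisfy both of these conditions:
start is earlier than 2014-09-26 12:00:00
stop is later than 2014-09-26 13:00:00
The index levels start and stop of df were originally ordinary columns. They were set as the index to allow slicing the way it is done with a single index: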
                                        jobId
start               stop
2014-09-26 09:45:01 2014-09-26 09:48:02 c35877
2014-09-26 11:23:46 2014-09-26 11:53:13 8f0f64
2014-09-26 11:46:50 2014-09-26 11:58:07 52ac37
2014-09-26 12:10:15 2014-09-26 12:23:23 47dfc2
2014-09-26 12:13:52 2014-09-26 12:18:31 c35877
2014-09-26 12:30:47 2014-09-26 12:39:49 8f0f64
2014-09-26 12:37:53 2014-09-26 12:45:48 96b20b
2014-09-26 12:45:35 2014-09-26 12:50:22 8f0f64
2014-09-26 12:49:26 2014-09-26 13:03:59 285618
2014-09-26 13:04:42 2014-09-26 13:15:23 2c74a9
2014-09-26 13:20:01 2014-09-26 13:27:46 8f0f64
Best answer
Use the query method (available in recent versions of pandas):
import pandas
from io import StringIO
rawdata = StringIO("""start,stop,jobID
2014-09-26 09:45:01,2014-09-26 09:48:02,c35877
2014-09-26 11:23:46,2014-09-26 11:53:13,8f0f64
2014-09-26 11:46:50,2014-09-26 11:58:07,52ac37
2014-09-26 12:10:15,2014-09-26 12:23:23,47dfc2
2014-09-26 12:13:52,2014-09-26 12:18:31,c35877
2014-09-26 12:30:47,2014-09-26 12:39:49,8f0f64
2014-09-26 12:37:53,2014-09-26 12:45:48,96b20b
2014-09-26 12:45:35,2014-09-26 12:50:22,8f0f64
2014-09-26 12:49:26,2014-09-26 13:03:59,285618
2014-09-26 13:04:42,2014-09-26 13:15:23,2c74a9
2014-09-26 13:20:01,2014-09-26 13:27:46,8f0f64
""")
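# Parse the index columns as dates and use start/stop as a two-level index.
df = pandas.read_csv(rawdata, parse_dates=True, index_col=['start', 'stop'])
# Named index levels can be referenced inside query() just like columns.
print(df.query("start > '2014-09-26 12:00:00' and stop < '2014-09-26 13:00:00'"))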
This prints:
                                        jobID
start               stop
2014-09-26 12:10:15 2014-09-26 12:23:23 47dfc2
2014-09-26 12:13:52 2014-09-26 12:18:31 c35877
2014-09-26 12:30:47 2014-09-26 12:39:49 8f0f64
2014-09-26 12:37:53 2014-09-26 12:45:48 96b20b
2014-09-26 12:45:35 2014-09-26 12:50:22 8f0f64
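The same selection can also be sketched without query, by building a boolean mask from the index levels with get_level_values. This is an alternative illustration rather than part of the original answer; the names window_start, window_end, inside and spanning are made up for this example. Note that the filter above keeps jobs that lie entirely inside the 12:00 to 13:00 window, whereas the condition as literally worded in the question (start before 12:00, stop after 13:00) flips both operators, so both variants are shown.

```python
from io import StringIO

import pandas as pd

rawdata = StringIO("""start,stop,jobID
2014-09-26 09:45:01,2014-09-26 09:48:02,c35877
2014-09-26 11:23:46,2014-09-26 11:53:13,8f0f64
2014-09-26 11:46:50,2014-09-26 11:58:07,52ac37
2014-09-26 12:10:15,2014-09-26 12:23:23,47dfc2
2014-09-26 12:13:52,2014-09-26 12:18:31,c35877
2014-09-26 12:30:47,2014-09-26 12:39:49,8f0f64
2014-09-26 12:37:53,2014-09-26 12:45:48,96b20b
2014-09-26 12:45:35,2014-09-26 12:50:22,8f0f64
2014-09-26 12:49:26,2014-09-26 13:03:59,285618
2014-09-26 13:04:42,2014-09-26 13:15:23,2c74a9
2014-09-26 13:20:01,2014-09-26 13:27:46,8f0f64
""")

# Parse both columns as datetimes, then promote them to a two-level index.
df = pd.read_csv(rawdata, parse_dates=["start", "stop"]).set_index(["start", "stop"])

# Pull each index level out as a DatetimeIndex so it can be compared directly.
start = df.index.get_level_values("start")
stop = df.index.get_level_values("stop")

# Hypothetical names for the window boundaries used in the question.
window_start = pd.Timestamp("2014-09-26 12:00:00")
window_end = pd.Timestamp("2014-09-26 13:00:00")

# Jobs that start after 12:00 and stop before 13:00 (same filter as the query above).
inside = df[(start > window_start) & (stop < window_end)]

# Jobs that start before 12:00 and stop after 13:00 (the question's literal wording).
spanning = df[(start < window_start) & (stop > window_end)]

print(inside)
print(spanning)
```

With the sample data, inside contains the same five rows as the query output, while spanning is empty because no job in the sample covers the whole hour.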
Regarding python - slicing a Pandas DataFrame with a 2-level compound index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26475406/