python - How to use the pandas Timestamp fold argument?

Tags: python pandas datetime timestamp fold

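When dealing with time zone conversions and the effects of DST, I'm having trouble understanding how pandas implements the fold argument of the Timestamp constructor. The documentation says: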

Due to daylight saving time, one wall clock time can occur twice when shifting from summer to winter time; fold describes whether the datetime-like corresponds to the first (0) or the second time (1) the wall clock hits the ambiguous time.

Nothing surprising so far, but when I run the following code:

import pandas as pd
from datetime import datetime

pre_fold = pd.Timestamp(datetime(2022,10,30,1,30,0), tz="CET")
in_fold_fold0 = pd.Timestamp(datetime(2022,10,30,2,30,0), tz="CET")
in_fold_fold1 = pd.Timestamp(datetime(2022,10,30,2,30,0), tz="CET", fold=1)
post_fold = pd.Timestamp(datetime(2022,10,30,3,30,0), tz="CET")

print(f"fold0: {in_fold_fold0.fold}")
print(f"fold1: {in_fold_fold1.fold}")

print(f"Pre CET fold:       {pre_fold}  ->  UTC {pre_fold.tz_convert(tz='UTC')}")
print(f"In CET fold, fold0: {in_fold_fold0}  ->  UTC {in_fold_fold0.tz_convert(tz='UTC')}")
print(f"In CET fold, fold1: {in_fold_fold1}  ->  UTC {in_fold_fold1.tz_convert(tz='UTC')}")
print(f"Post CET fold:      {post_fold}  ->  UTC {post_fold.tz_convert(tz='UTC')}")

the output is not what I expected:

fold0: 0
fold1: 1
Pre CET fold:       2022-10-30 01:30:00+02:00  ->  UTC 2022-10-29 23:30:00+00:00
In CET fold, fold0: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
In CET fold, fold1: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
Post CET fold:      2022-10-30 03:30:00+01:00  ->  UTC 2022-10-30 02:30:00+00:00

The fourth line should be:

In CET fold, fold0: 2022-10-30 02:30:00+02:00  ->  UTC 2022-10-30 00:30:00+00:00

What am I missing here?

PS: Using Python's datetime objects with dateutil time zones produces the expected output:

from datetime import datetime
from dateutil import tz

dt_pre_fold = datetime(2022,10,30,1,30,0, tzinfo=tz.gettz("CET"))
dt_in_fold_fold0 = datetime(2022,10,30,2,30,0, tzinfo=tz.gettz("CET"))
dt_in_fold_fold1 = datetime(2022,10,30,2,30,0, tzinfo=tz.gettz("CET"), fold=1)
dt_post_fold = datetime(2022,10,30,3,30,0, tzinfo=tz.gettz("CET"))

print(f"Pre CET fold:       {dt_pre_fold}  ->  UTC {dt_pre_fold.astimezone(tz.gettz('UTC'))}")
print(f"In CET fold, fold0: {dt_in_fold_fold0}  ->  UTC {dt_in_fold_fold0.astimezone(tz.gettz('UTC'))}")
print(f"In CET fold, fold1: {dt_in_fold_fold1}  ->  UTC {dt_in_fold_fold1.astimezone(tz.gettz('UTC'))}")
print(f"Post CET fold:      {dt_post_fold}  ->  UTC {dt_post_fold.astimezone(tz.gettz('UTC'))}")

Output:

Pre CET fold:       2022-10-30 01:30:00+02:00  ->  UTC 2022-10-29 23:30:00+00:00
In CET fold, fold0: 2022-10-30 02:30:00+02:00  ->  UTC 2022-10-30 00:30:00+00:00
In CET fold, fold1: 2022-10-30 02:30:00+01:00  ->  UTC 2022-10-30 01:30:00+00:00
Post CET fold:      2022-10-30 03:30:00+01:00  ->  UTC 2022-10-30 02:30:00+00:00
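
For comparison, the same check can be sketched with only the standard library. This is a minimal illustration, assuming Python 3.9+ with the zoneinfo module and that the system time zone database provides a "CET" entry. The variable names are made up for this example and are not part of the question's code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the system tz database has a "CET" zone with DST rules.
cet = ZoneInfo("CET")

# 02:30 on 2022-10-30 occurs twice on the CET wall clock.
first = datetime(2022, 10, 30, 2, 30, tzinfo=cet)           # fold=0 (default)
second = datetime(2022, 10, 30, 2, 30, tzinfo=cet, fold=1)  # second occurrence

first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)

print(f"fold=0: offset {first.utcoffset()}  ->  UTC {first_utc}")
print(f"fold=1: offset {second.utcoffset()}  ->  UTC {second_utc}")
```

Here fold=0 selects the summer-time offset (+02:00) and fold=1 the winter-time offset (+01:00), which matches the dateutil output above.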

Best answer

It seems the time zone information is not being specified correctly when it is passed as a string:

# using your code
x = pd.Timestamp(datetime(2022,10,30,2,30,0), fold = 0, tz="CET")
x.tz_convert('UTC')
# Timestamp('2022-10-30 01:30:00+0000', tz='UTC')

But if you pass a dateutil time zone instead:

from dateutil import tz

x = pd.Timestamp(datetime(2022,10,30,2,30,0), fold = 0, tz=tz.gettz("CET"))
x.tz_convert('UTC')
# Timestamp('2022-10-30 00:30:00+0000', tz='UTC')

it returns the correct value.
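
A likely explanation, which the answer does not state and which I offer as an assumption, is that the string "CET" was resolved to a pytz time zone in the pandas version used here. The pandas documentation describes fold as supported only with dateutil time zones. If you want to keep passing the zone as a string, one alternative is to localize a naive timestamp with tz_localize and its ambiguous argument. The sketch below assumes a pandas version where Timestamp.tz_localize accepts a boolean ambiguous flag (True meaning DST is in effect).

```python
import pandas as pd

# Naive wall-clock time that occurs twice on 2022-10-30 in CET.
naive = pd.Timestamp("2022-10-30 02:30:00")

# ambiguous=True  -> treat the time as DST (first occurrence, +02:00)
# ambiguous=False -> treat the time as standard time (second occurrence, +01:00)
first = naive.tz_localize("CET", ambiguous=True)
second = naive.tz_localize("CET", ambiguous=False)

first_utc = first.tz_convert("UTC")
second_utc = second.tz_convert("UTC")

print(f"first occurrence:  {first}  ->  UTC {first_utc}")
print(f"second occurrence: {second}  ->  UTC {second_utc}")
```

This does not use fold at all, so it does not depend on which time zone library pandas picks for the string.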

Source: the original question on Stack Overflow, "How to use the Pandas Timestamp fold argument?": https://stackoverflow.com/questions/71652764/
