python - How do I convert a pyarrow timestamp dtype to the time64 type?

Tags: python pyarrow apache-arrow

I am trying to cast a pyarrow timestamp type to the time64 type, but the cast raises an error.

import pyarrow as pa
from datetime import datetime

dt = datetime.now()
table = pa.Table.from_pydict({'ts': pa.array([dt, dt])})
new_schema = table.schema.set(0, pa.field('ts', pa.time64('us')))
table.schema
# ts: timestamp[us]
new_schema
# ts: time64[us]

table.cast(new_schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1329, in pyarrow.lib.Table.cast
  File "pyarrow/table.pxi", line 277, in pyarrow.lib.ChunkedArray.cast
  File "/home/inspiron/.virtualenvs/par/lib/python3.7/site-packages/pyarrow/compute.py", line 243, in cast
    return call_function("cast", [arr], options)
  File "pyarrow/_compute.pyx", line 446, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 275, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from timestamp[us] to time64 using function cast_time64

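Is there any way to make this cast possible?

Accepted answer

time64[us] is a time of day. It represents the number of microseconds since midnight. It is not tied to any particular date and cannot be converted to a timestamp; the cast in the other direction, as the traceback shows, is not implemented.

The Arrow documentation is a bit sparse, but the Parquet docs explain it better: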

TIME

TIME is used for a logical time type without a date with millisecond or microsecond precision. The type has two type parameters: UTC adjustment (true or false) and unit (MILLIS or MICROS, NANOS).

TIME with unit MILLIS is used for millisecond precision. It must annotate an int32 that stores the number of milliseconds after midnight.

TIME with unit MICROS is used for microsecond precision. It must annotate an int64 that stores the number of microseconds after midnight.

TIME with unit NANOS is used for nanosecond precision. It must annotate an int64 that stores the number of nanoseconds after midnight.

The sort order used for TIME is signed.
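
To make the "after midnight" representation concrete, the stored integer can be computed with the standard library alone. This is an illustrative sketch; the function names are made up for this example and are not part of pyarrow or Parquet.

```python
from datetime import datetime, time

MICROS_PER_SECOND = 1_000_000


def micros_since_midnight(t: time) -> int:
    """Value a TIME(MICROS) column would store for t (fits in an int64)."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * MICROS_PER_SECOND + t.microsecond


def millis_since_midnight(t: time) -> int:
    """Value a TIME(MILLIS) column would store for t (fits in an int32)."""
    # Sub-millisecond precision is truncated.
    return micros_since_midnight(t) // 1000


dt = datetime(2021, 8, 13, 9, 30, 15, 123456)
print(micros_since_midnight(dt.time()))  # 34215123456
print(millis_since_midnight(dt.time()))  # 34215123
```

The date (2021-08-13) plays no role in either value, which is why a time64 value alone cannot be turned back into a timestamp.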

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/68766837/
