python - pandas: writing an empty DataFrame to an HDF file

Tags: python pandas dataframe pytables hdf

Is there a way to force pandas to write an empty DataFrame to an HDF file?

import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx')
df2 = pd.read_hdf('temp.h5', 'xxx') 

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 740, in select
    return it.get_result()
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 733, in func
    columns=columns)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2986, in read
    idx=i), start=_start, stop=_stop)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2575, in read_index
    _, index = self.read_index_node(getattr(self.group, key), **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2676, in read_index_node
    data = node[start:stop]
  File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 675, in __getitem__
    return self.read(start, stop, step)
  File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 811, in read
    listarr = self._read_array(start, stop, step)
  File "tables/hdf5extension.pyx", line 2106, in tables.hdf5extension.VLArray._read_array (tables/hdf5extension.c:24649)
ValueError: cannot set WRITEABLE flag to True of this array

Writing with format='table':

import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx', format='table')
df2 = pd.read_hdf('temp.h5', 'xxx')

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 722, in select
    raise KeyError('No object named {key} in the file'.format(key=key))
KeyError: 'No object named xxx in the file'

Pandas version: 0.24.2

Thanks for your help!

Best answer

Putting an empty DataFrame into an HDFStore in the fixed format should work (you may need to check the versions of other packages, e.g. tables):

import pandas as pd
import tables

# Versions
pd.__version__
tables.__version__

# DF
df = pd.DataFrame(columns=['x','y'])
df

# Dump in fixed format
with pd.HDFStore('temp.h5') as store:
    store.put('df', df, format='f')
    print('Read:')
    store.select('df')

>>> '0.24.2'
>>> '3.5.1'
>>>   x     y
>>>
>>> Read:
>>>   x     y
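If the goal is simply to get an empty frame into the file regardless of format, one option is a small wrapper that falls back to the fixed format only when the frame is empty. This is a minimal sketch, not a pandas API: the helper name `to_hdf_keep_empty` and its `fmt` parameter are hypothetical, and it assumes a pandas version with PyTables installed in which, as in the answer above, empty frames round-trip in the fixed format.

```python
import os
import tempfile

import pandas as pd


def to_hdf_keep_empty(df, path, key, fmt="table"):
    """Hypothetical helper: write df with df.to_hdf, but switch to the
    fixed format when df is empty, because pandas does not create a
    node for zero-length objects in the table format."""
    if df.empty:
        fmt = "fixed"
    df.to_hdf(path, key=key, format=fmt)
    return fmt  # report which format was actually used


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "temp.h5")

    # Empty frame: stored in the fixed format, so it can be read back.
    empty = pd.DataFrame(columns=["x", "y"])
    used_empty = to_hdf_keep_empty(empty, path, "xxx")
    back_empty = pd.read_hdf(path, key="xxx")

    # Non-empty frame: stays in the requested table format.
    full = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
    used_full = to_hdf_keep_empty(full, path, "yyy")
    back_full = pd.read_hdf(path, key="yyy")

print(used_empty, back_empty.empty, list(back_empty.columns))
print(used_full, len(back_full))
```

The trade-off is that an empty frame stored this way is a fixed-format node, so it cannot be queried with `where=` and, as far as I know, a later `append` to the same key will be rejected until the key is removed or overwritten.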

PyTables does indeed forbid this (at least for the table format), but for the fixed format pandas has its own workaround.
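The difference between the two formats can be checked directly. The sketch below assumes a recent pandas with PyTables installed; the file and key names are illustrative. It stores the same empty frame once in each format and then checks which keys actually exist. In the pandas versions I am familiar with, a table-format `put` of an empty frame returns without writing anything, which would explain the `KeyError` in the question.

```python
import os
import tempfile

import pandas as pd

empty = pd.DataFrame(columns=["x", "y"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with pd.HDFStore(path, mode="w") as store:
        # Fixed format: pandas handles zero-length arrays itself,
        # so a node is created for this key.
        store.put("fixed_df", empty, format="fixed")
        # Table format: pandas skips zero-length objects,
        # so no node should be created for this key.
        store.put("table_df", empty, format="table")

        has_fixed = "fixed_df" in store
        has_table = "table_df" in store

        # Selecting the missing key fails the same way read_hdf did.
        try:
            store.select("table_df")
            table_error = None
        except KeyError as exc:
            table_error = str(exc)

print(has_fixed, has_table)
print(table_error)
```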

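But as discussed in the same GitHub issue, there has also been some effort to fix this behaviour for the table format. However, it looks like the solution was still "hanging" as of the end of March.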

A similar question about python - pandas: writing an empty DataFrame to an HDF file was found on Stack Overflow: https://stackoverflow.com/questions/55287431/
