python - How to partially copy an HDF5 file into a new file while keeping the same structure?

I have a large HDF5 file that looks like this:

A/B/dataset1, dataset2
A/C/dataset1, dataset2
A/D/dataset1, dataset2
A/E/dataset1, dataset2

...

I want to create a new file that contains only:

A/B/dataset1, dataset2
A/C/dataset1, dataset2

What is the easiest way to do this in Python?

I tried:

import h5py

fs = h5py.File('source.h5', 'r')
fd = h5py.File('dest.h5', 'w')
# Copy group B (stored at A/B in the source) into the root of the destination
fs.copy('A/B', fd)

The problem is that dest.h5 then contains:

B/dataset1, dataset2

so part of the tree structure (the parent group A) is missing.

Best answer

fs.copy('A/B', fd) doesn't copy the path /A/B/ into fd; it only copies the group B (as you've found out!). So you first need to create the rest of the path:

fd.create_group('A')
fs.copy('A/B', fd['/A'])

or, if you will be using that group a lot:

fd_A = fd.create_group('A')
fs.copy('A/B', fd_A)

This copies the group B from fs['/A/B'] into fd['/A']:

In [1]: fd['A/B'].keys()
Out[1]: [u'dataset1', u'dataset2']
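
To check the create-then-copy approach end to end, here is a minimal sketch that builds a small source file with the layout from the question and copies both A/B and A/C. The temporary directory, the dataset contents and the loop over group names are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location; any writable directory works.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.h5")
dst_path = os.path.join(workdir, "dest.h5")

# Build a small source file with the layout from the question.
# Intermediate groups (A, A/B, ...) are created automatically.
with h5py.File(src_path, "w") as fs:
    for group in ("B", "C", "D", "E"):
        for dataset in ("dataset1", "dataset2"):
            fs.create_dataset(f"A/{group}/{dataset}", data=np.arange(4))

# Create the parent group first, then copy the wanted subgroups into it.
with h5py.File(src_path, "r") as fs, h5py.File(dst_path, "w") as fd:
    fd_A = fd.create_group("A")
    for group in ("B", "C"):
        fs.copy(f"A/{group}", fd_A)
    copied = sorted(fd["A"].keys())
    datasets_in_B = sorted(fd["A/B"].keys())

print(copied)
print(datasets_in_B)
```

Opening the files in `with` blocks ensures both are closed, and the destination flushed to disk, once the copy is finished.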

Here is a way to do this automatically:

# Get the name of the parent for the group we want to copy
group_path = fs['/A/B'].parent.name

# Check that this group exists in the destination file; if it doesn't, create it
# This will create the parents too, if they don't exist
group_id = fd.require_group(group_path)

# Copy fs:/A/B/ to fd:/A/G
fs.copy('/A/B', group_id, name="G")

print(fd['/A/G'].keys())
# [u'dataset1', u'dataset2']
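
The same idea can be wrapped in a small helper that copies several groups while keeping their original paths, which is what the question asks for. This is only a sketch: the function name `copy_groups`, the file paths and the sample data are hypothetical, and the root group is handled separately as a precaution because it always exists in the destination.

```python
import os
import tempfile

import h5py
import numpy as np


def copy_groups(src_path, dst_path, group_paths):
    """Copy each group in group_paths into dst_path under the same path."""
    with h5py.File(src_path, "r") as fs, h5py.File(dst_path, "w") as fd:
        for path in group_paths:
            parent_path = fs[path].parent.name
            # The root group always exists; otherwise create missing parents.
            parent = fd if parent_path == "/" else fd.require_group(parent_path)
            # With no name given, the copy keeps the source group's own name.
            fs.copy(path, parent)


# Hypothetical sample data matching the layout in the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.h5")
dst = os.path.join(workdir, "dest.h5")
with h5py.File(src, "w") as f:
    for group in ("B", "C", "D", "E"):
        for dataset in ("dataset1", "dataset2"):
            f.create_dataset(f"A/{group}/{dataset}", data=np.arange(4))

copy_groups(src, dst, ["/A/B", "/A/C"])

# Collect every object name in the destination to inspect the tree.
names = []
with h5py.File(dst, "r") as fd:
    fd.visit(names.append)
names.sort()
print(names)
```

Because `require_group` returns the group if it already exists, copying A/C after A/B reuses the A group created on the first iteration instead of raising an error.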

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/24510240/
