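I have been building a machine learning algorithm to recognize images, starting with creating my own h5 database. I've been following this tutorial, which has been very useful, but I keep running into one major error: when OpenCV is used in the image processing part of the code, the program cannot save the processed image because it keeps flipping the height and width of my images. When I try to run it, I get the following error: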
Traceback (most recent call last):
  File "array+and+label+data.py", line 79, in <module>
    hdf5_file["train_img"][i, ...] = img[None]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 631, in __setitem__
    for fspace in selection.broadcast(mshape):
  File "/Users/USER/miniconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 299, in broadcast
    raise TypeError("Can't broadcast %s -> %s" % (target_shape, count))
TypeError: Can't broadcast (1, 240, 320, 3) -> (1, 320, 240, 3)
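My images are all supposed to be sized 320 by 240, but you can see that this is somehow being flipped. Researching around has shown me that this is because OpenCV and NumPy use different conventions for height and width, but I'm not sure how to reconcile this without patching my OpenCV installation. Any ideas on how I can fix this? I'm a relative newbie to Python and all its libraries (though I know Java well)!
Thanks in advance!
Edit: adding more code for context. It is very similar to what is in the tutorial under the "Load images and save them" code example.
My array sizes: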
train_shape = (len(train_addrs), 320, 240, 3)
val_shape = (len(val_addrs), 320, 240, 3)
test_shape = (len(test_addrs), 320, 240, 3)
The code that loops over the image addresses and resizes them:
# Loop over training image addresses
for i in range(len(train_addrs)):
    # print how many images are saved every 1000 images
    if i % 1000 == 0 and i > 1:
        print ('Train data: {}/{}'.format(i, len(train_addrs)))
    # read an image and resize to (320, 240)
    # cv2 load images as BGR, convert it to RGB
    addr = train_addrs[i]
    img = cv2.imread(addr)
    img = cv2.resize(img, (320, 240), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # save the image and calculate the mean so far
    hdf5_file["train_img"][i, ...] = img[None]
    mean += img / float(len(train_labels))
Best answer
Researching around has shown me that this is because OpenCV and NumPy use different conventions for height and width
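Not quite. The only tricky thing about images is that 2D arrays/matrices are indexed with (row, col), which is the opposite of the normal Cartesian coordinates (x, y) we might use for images. Because of this, when you specify points in OpenCV functions, it sometimes wants them in (x, y) coordinates, and similarly it wants sizes as (w, h) rather than (h, w) as an array's shape would give them. This is the case for OpenCV's resize() function. You're passing it (h, w), but it actually wants (w, h). From the docs for resize():
dsize – output image size; if it equals zero, it is computed as:
dsize = Size(round(fx*src.cols), round(fy*src.rows))
Either dsize or both fx and fy must be non-zero.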
So you can see here that the number of columns (the width) comes first and the number of rows (the height) comes second.
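As a small illustration of the two conventions, the following sketch uses only NumPy, which is the array type OpenCV's Python bindings return. The image size and pixel coordinates are arbitrary values chosen for the example, not taken from the question's data.

import numpy as np

# A blank image that is 240 rows tall and 320 columns wide, with 3 channels.
# (These sizes are illustrative assumptions.)
height, width = 240, 320
img = np.zeros((height, width, 3), dtype=np.uint8)

# NumPy reports the shape as (rows, cols, channels) = (height, width, channels).
rows, cols, channels = img.shape

# The dsize argument of cv2.resize is (cols, rows) = (width, height).
dsize = (cols, rows)

# Indexing follows the array rule: the pixel at x=10, y=20 is img[20, 10].
x, y = 10, 20
img[y, x] = (255, 0, 0)

Here img.shape is (240, 320, 3) while the matching dsize is (320, 240), which is the same swap that shows up in the traceback.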
The simple fix is to swap the two values you pass to resize(), so that the size is given as (w, h) instead of (h, w):
img = cv2.resize(img, (240, 320), interpolation=cv2.INTER_CUBIC)
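One way to keep the two from drifting apart again is to derive the resize target from the dataset shape instead of typing it twice. This is only a sketch: dsize_for is a hypothetical helper name, and 100 stands in for len(train_addrs).

def dsize_for(dataset_shape):
    """Return the (width, height) tuple that cv2.resize expects,
    given an (N, height, width, channels) dataset shape."""
    _, height, width, _ = dataset_shape
    return (width, height)

# 100 is a placeholder for len(train_addrs) in the question's code.
train_shape = (100, 320, 240, 3)
dsize = dsize_for(train_shape)

# Inside the loop this would be used as:
# img = cv2.resize(img, dsize, interpolation=cv2.INTER_CUBIC)

With the question's train_shape this gives (240, 320), the same value as in the fix above. If the images are really meant to be 320 wide and 240 tall, changing the shapes to (N, 240, 320, 3) would make the same helper return (320, 240).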
This question and answer, about h5py flipping image dimensions in Python, comes from Stack Overflow: https://stackoverflow.com/questions/47615309/