I'm currently working on an image-processing problem in Python using numpy and scipy. In short, I have an image to which I want to apply many local contractions. My prototype code works, and the final image looks great. However, processing time has become a serious bottleneck for our application. Can you help me speed up my image-processing code?
I've tried to boil our code down to the "cartoon" version below. Profiling shows that most of my time is spent on interpolation. Is there an obvious way to speed up execution?
import cProfile
import pstats
import numpy
# scipy.ndimage.interpolation is a deprecated namespace; use scipy.ndimage
from scipy import ndimage as interpolation

def get_centered_subimage(
        center_point, window_size, image):
    x, y = numpy.round(center_point).astype(int)
    xSl = slice(max(x-window_size-1, 0), x+window_size+2)
    ySl = slice(max(y-window_size-1, 0), y+window_size+2)
    # Shift into a new array rather than output=subimage, which would
    # write back into a view of the input image and corrupt it.
    subimage = interpolation.shift(
        image[xSl, ySl], shift=(x, y)-center_point)
    return subimage[1:-1, 1:-1]
"""In real life, this is experimental data"""
im = numpy.zeros((1000, 1000), dtype=float)
"""In real life, this mask is a non-zero pattern"""
window_radius = 10
mask = numpy.zeros((2*window_radius+1, 2*window_radius+1), dtype=float)
"""The x, y coordinates in the output image"""
new_grid_x = numpy.linspace(0, im.shape[0]-1, 2*im.shape[0])
new_grid_y = numpy.linspace(0, im.shape[1]-1, 2*im.shape[1])
"""The grid we'll end up interpolating onto"""
grid_step_x = new_grid_x[1] - new_grid_x[0]
grid_step_y = new_grid_y[1] - new_grid_y[0]
subgrid_radius = numpy.floor(
    (-1 + window_radius * 0.5 / grid_step_x,
     -1 + window_radius * 0.5 / grid_step_y)).astype(int)
subgrid = (
    window_radius + 2 * grid_step_x * numpy.arange(
        -subgrid_radius[0], subgrid_radius[0] + 1),
    window_radius + 2 * grid_step_y * numpy.arange(
        -subgrid_radius[1], subgrid_radius[1] + 1))
subgrid_points = int((2*subgrid_radius[0] + 1) *
                     (2*subgrid_radius[1] + 1))
"""The coordinates of the set of spots we we want to contract. In real
life, this set is non-random:"""
numpy.random.seed(0)
num_points = 10000
center_points = numpy.random.random(2*num_points).reshape(num_points, 2)
center_points[:, 0] *= im.shape[0]
center_points[:, 1] *= im.shape[1]
"""The output image"""
final_image = numpy.zeros(
(new_grid_x.shape[0], new_grid_y.shape[0]), dtype=numpy.float)
def profile_me():
    for cp in center_points:
        # Take an image centered on each illumination point
        spot_image = get_centered_subimage(
            center_point=cp, window_size=window_radius, image=im)
        if spot_image.shape != (2*window_radius+1, 2*window_radius+1):
            continue  # Skip to the next spot
        # Mask the image
        masked_image = mask * spot_image
        # Resample the image
        nearest_grid_index = numpy.round(
            (cp - (new_grid_x[0], new_grid_y[0])) /
            (grid_step_x, grid_step_y)).astype(int)
        nearest_grid_point = (
            (new_grid_x[0], new_grid_y[0]) +
            (grid_step_x, grid_step_y) * nearest_grid_index)
        new_coordinates = numpy.meshgrid(
            subgrid[0] + 2 * (nearest_grid_point[0] - cp[0]),
            subgrid[1] + 2 * (nearest_grid_point[1] - cp[1]))
        resampled_image = interpolation.map_coordinates(
            masked_image,
            (new_coordinates[0].reshape(int(subgrid_points)),
             new_coordinates[1].reshape(int(subgrid_points)))
            ).reshape(int(2*subgrid_radius[1]+1),
                      int(2*subgrid_radius[0]+1)).T
        # Add the recentered image back to the scan grid
        final_image[
            int(nearest_grid_index[0]-subgrid_radius[0]):
            int(nearest_grid_index[0]+subgrid_radius[0]+1),
            int(nearest_grid_index[1]-subgrid_radius[1]):
            int(nearest_grid_index[1]+subgrid_radius[1]+1),
            ] += resampled_image
cProfile.run('profile_me()', 'profile_results')
p = pstats.Stats('profile_results')
p.strip_dirs().sort_stats('cumulative').print_stats(10)
A loose explanation of what the code does:
We start with a pixelated 2-D image and a set of arbitrary (x, y) points in the image that generally don't fall on the integer grid. For each (x, y) point, I want to multiply the image by a small mask centered exactly on that point. Next, we contract/expand the masked region by a finite amount, and finally add this processed subimage into the final image, which may not have the same pixel size as the original image. (Not my best explanation. Oh well.)
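The subpixel-centering step described above can be sketched in isolation. This is a toy illustration, not part of the code above; the image, point, and window size are made up:

```python
import numpy
from scipy import ndimage

# A unit impulse on a 9x9 grid stands in for the experimental image.
image = numpy.zeros((9, 9))
image[4, 4] = 1.0

center_point = numpy.array([4.3, 3.7])  # an off-grid (x, y) point
x, y = numpy.round(center_point).astype(int)

# Cut out a window around the nearest pixel, then shift it by the
# fractional remainder so that center_point lands exactly on a pixel.
window = image[x-2:x+3, y-2:y+3].copy()
recentered = ndimage.shift(window, shift=(x, y) - center_point)

# A mask centered on the window is now effectively centered on the point.
mask = numpy.ones((5, 5))
masked = mask * recentered
```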
Best answer
I'm fairly sure that, as you say, most of the computation time happens in interpolation.map_coordinates(…), which is called once on every iteration over center_points, i.e. 10,000 times here. In general, when using the numpy/scipy stack, you want repetitive tasks on large arrays to run inside native NumPy/SciPy functions (that is, in a C loop over homogeneous data) rather than explicitly in Python.
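As a minimal illustration of that principle (the array sizes here are arbitrary), compare an explicit Python loop with a single broadcast operation over the same stack of windows:

```python
import numpy

# A stack of 1,000 small windows, standing in for the masked subimages.
windows = numpy.random.rand(1000, 21, 21)

# Explicit Python loop: one native call per window.
looped = numpy.empty_like(windows)
for i in range(windows.shape[0]):
    looped[i] = windows[i] * 2.0

# Vectorized: a single native call over the whole stack.
vectorized = windows * 2.0
```

Both produce identical results; the vectorized form simply moves the loop from the Python interpreter into compiled code.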
One strategy that could speed up the interpolation, at the cost of increased memory use, is:
- First, gather all the subimages (named masked_image here) into a single 3-dimensional array (window_radius x window_radius x center_points.size).
- Create a ufunc with numpy.frompyfunc (worth reading about, it's useful) wrapping the work that must be done on each subimage; it should return another 3-dimensional array (subgrid_radius[0] x subgrid_radius[1] x center_points.size). In short, this creates a vectorized version of the Python function that can be broadcast element-wise over arrays.
- Build the final image by summing over the third dimension.
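A minimal sketch of that strategy, with a placeholder flip standing in for the real per-subimage resampling (all names and sizes here are hypothetical, not from the question's code):

```python
import numpy

num_spots, w = 1000, 21

# Step 1: gather all subimages into one 3-D stack (spot axis first here).
subimages = numpy.random.rand(num_spots, w, w)
mask = numpy.ones((w, w))
masked_stack = subimages * mask   # one broadcast multiply, no Python loop

# Step 2: the per-subimage work, applied to the whole stack at once.
# (A flip stands in for the actual map_coordinates resampling.)
processed = masked_stack[:, ::-1, :]

# Step 3: build the final image by summing over the spot axis.
final = processed.sum(axis=0)
```

Note that in the real problem each spot lands at a different position in the output image, so step 3 would be a per-spot scatter-add into the output grid rather than a plain sum.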
Hope this gets you closer to your goal!
Regarding "python - speeding up interpolation in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8158953/