c - How can I make the following bilinear interpolation code more efficient?

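The code below enlarges an image using bilinear interpolation.

What in the slow_rescale function can be changed to make it more efficient?

I would like to modify it from the perspective of computer organization principles.

Looking forward to your answers!

Thanks!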

unsigned char *slow_rescale(unsigned char *src, int src_x, int src_y, int dest_x, int dest_y)
{
 double step_x,step_y;          // Step increase as per instructions above
 unsigned char R1,R2,R3,R4;     // Colours at the four neighbours
 unsigned char G1,G2,G3,G4;
 unsigned char B1,B2,B3,B4;
 double RT1, GT1, BT1;          // Interpolated colours at T1 and T2
 double RT2, GT2, BT2;
 unsigned char R,G,B;           // Final colour at a destination pixel
 unsigned char *dst;            // Destination image - must be allocated here! 
 int x,y;                       // Coordinates on destination image
 double fx,fy;                  // Corresponding coordinates on source image
 double dx,dy;                  // Fractional component of source image coordinates

 dst=(unsigned char *)calloc(dest_x*dest_y*3,sizeof(unsigned char));   // Allocate and clear destination image
 if (!dst) return(NULL);                           // Unable to allocate image

 step_x=(double)(src_x-1)/(double)(dest_x-1);
 step_y=(double)(src_y-1)/(double)(dest_y-1);

 for (x=0;x<dest_x;x++)         // Loop over destination image
  for (y=0;y<dest_y;y++)
  {
   fx=x*step_x;
   fy=y*step_y;
   dx=fx-(int)fx;
   dy=fy-(int)fy;
   getPixel(src,floor(fx),floor(fy),src_x,&R1,&G1,&B1);    // get N1 colours
   getPixel(src,ceil(fx),floor(fy),src_x,&R2,&G2,&B2);     // get N2 colours
   getPixel(src,floor(fx),ceil(fy),src_x,&R3,&G3,&B3);     // get N3 colours
   getPixel(src,ceil(fx),ceil(fy),src_x,&R4,&G4,&B4);      // get N4 colours
   // Interpolate to get T1 and T2 colours
   RT1=(dx*R2)+(1-dx)*R1;
   GT1=(dx*G2)+(1-dx)*G1;
   BT1=(dx*B2)+(1-dx)*B1;
   RT2=(dx*R4)+(1-dx)*R3;
   GT2=(dx*G4)+(1-dx)*G3;
   BT2=(dx*B4)+(1-dx)*B3;
   // Obtain final colour by interpolating between T1 and T2
   R=(unsigned char)((dy*RT2)+((1-dy)*RT1));
   G=(unsigned char)((dy*GT2)+((1-dy)*GT1));
   B=(unsigned char)((dy*BT2)+((1-dy)*BT1));
   // Store the final colour
   setPixel(dst,x,y,dest_x,R,G,B);
  }
  return(dst);
}
void getPixel(unsigned char *image, int x, int y, int sx, unsigned char *R, unsigned char *G, unsigned char *B)
{
 // Get the colour at pixel x,y in the image and return it using the provided RGB pointers
 // Requires the image size along the x direction!
 *(R)=*(image+((x+(y*sx))*3)+0);
 *(G)=*(image+((x+(y*sx))*3)+1);
 *(B)=*(image+((x+(y*sx))*3)+2);
}

void setPixel(unsigned char *image, int x, int y, int sx, unsigned char R, unsigned char G, unsigned char B)
{
 // Set the colour of the pixel at x,y in the image to the specified R,G,B
 // Requires the image size along the x direction!
 *(image+((x+(y*sx))*3)+0)=R;
 *(image+((x+(y*sx))*3)+1)=G;
 *(image+((x+(y*sx))*3)+2)=B;
}

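Best answer

I am always concerned about image processing performance. Here are some obvious considerations to keep in mind:

Numerical precision:

The first thing that jumps out at me from your code is the use of double values for the step sizes, colour values, and coordinates. Do you really need that much precision for these quantities? If not, you could do some profiling to check how the code performs when it uses fixed-point or float instead.

Keep in mind that this is a hardware-dependent question: performance may or may not be an issue, depending on whether your hardware implements double, only float, or neither (in which case both are emulated in software). A discussion of this aspect also covers memory alignment, coalesced memory accesses, and so on. These topics certainly fall under "computer organization principles", and there is more discussion on this topic here.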
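As a rough illustration of the fixed-point option, here is a minimal sketch that blends colour values with integer arithmetic only. The 16.16 format and the names FP_SHIFT, FP_ONE, split_fixed, lerp_fixed and bilerp_fixed are assumptions made for this example and do not come from the question. Note that it truncates the intermediate T1 and T2 values to bytes, so results can differ by one from the double version.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 16                  /* assumed 16.16 fixed-point format */
#define FP_ONE   (1u << FP_SHIFT)    /* the value 1.0 in that format */

/* Split a 16.16 source coordinate into its pixel index and fractional weight. */
static inline void split_fixed(uint32_t f, int *index, uint32_t *frac)
{
    *index = (int)(f >> FP_SHIFT);
    *frac  = f & (FP_ONE - 1u);
}

/* Blend a and b: w = 0 returns a, w = FP_ONE returns b. Integer math only. */
static inline uint8_t lerp_fixed(uint8_t a, uint8_t b, uint32_t w)
{
    assert(w <= FP_ONE);
    /* a*(1-w) + b*w, at most 255*65536, so it fits in 32 bits */
    return (uint8_t)((a * (FP_ONE - w) + b * w) >> FP_SHIFT);
}

/* Bilinear blend of one channel from neighbours N1..N4 (same order as the question). */
static inline uint8_t bilerp_fixed(uint8_t n1, uint8_t n2, uint8_t n3, uint8_t n4,
                                   uint32_t wx, uint32_t wy)
{
    uint8_t t1 = lerp_fixed(n1, n2, wx);   /* top edge, truncated to a byte */
    uint8_t t2 = lerp_fixed(n3, n4, wx);   /* bottom edge, truncated to a byte */
    return lerp_fixed(t1, t2, wy);
}
```

Whether this beats double or float depends on the hardware, as noted above, so it is worth profiling rather than assuming.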

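Loop unrolling:

Have you also considered manual loop unrolling? This may or may not help, since your compiler may already be trying to apply such optimizations, but it is at least worth considering because you have a double loop over potentially large array sizes.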
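To show what manual unrolling looks like, here is a small sketch on a hypothetical helper, blend_rows_unrolled, that blends two rows of bytes with an 8-bit weight. The helper and its weight convention (w in 0..256) are assumptions for illustration and are not part of the question's code. The main loop handles four elements per iteration and a second loop picks up the remainder.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = (a[i]*(256-w) + b[i]*w) / 256, with w in [0, 256]. */
void blend_rows_unrolled(const unsigned char *a, const unsigned char *b,
                         unsigned char *out, size_t n, unsigned w)
{
    assert(a && b && out && w <= 256);
    unsigned iw = 256 - w;
    size_t i = 0;

    /* Unrolled by four: fewer loop-counter updates and branches per element. */
    for (; i + 4 <= n; i += 4) {
        out[i]     = (unsigned char)((a[i]     * iw + b[i]     * w) >> 8);
        out[i + 1] = (unsigned char)((a[i + 1] * iw + b[i + 1] * w) >> 8);
        out[i + 2] = (unsigned char)((a[i + 2] * iw + b[i + 2] * w) >> 8);
        out[i + 3] = (unsigned char)((a[i + 3] * iw + b[i + 3] * w) >> 8);
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        out[i] = (unsigned char)((a[i] * iw + b[i] * w) >> 8);
}
```

An optimizing compiler may produce the same machine code from the plain loop, so compare timings before keeping the longer version.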

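Numerical redundancy:

In your getPixel() function, you also compute image+((x+(y*sx))*3) for each RGB component, and that value does not change between the three reads. Why not compute this quantity just once at the start of the function?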
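A minimal sketch of that change follows. The name getPixelFast is hypothetical, chosen so it does not clash with the original getPixel(); the parameters are the same apart from a const qualifier on the image pointer.

```c
#include <assert.h>

/* Same job as getPixel() in the question, with the pixel address computed once. */
void getPixelFast(const unsigned char *image, int x, int y, int sx,
                  unsigned char *R, unsigned char *G, unsigned char *B)
{
    assert(image && R && G && B);
    const unsigned char *p = image + (x + y * sx) * 3;   /* address of the pixel's R byte */
    *R = p[0];
    *G = p[1];
    *B = p[2];
}
```

With optimization enabled the compiler will probably remove the repeated arithmetic on its own, so the gain here may be mostly in readability.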

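Vector processing:

It is hard to think about optimizing code like this without first considering whether you can take advantage of vector processing. Do you have access to a vectorized instruction set such as SSE?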
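As one possible starting point, the sketch below interpolates four float values at once with SSE intrinsics from xmmintrin.h, falling back to a scalar loop when SSE is not available. The helper name lerp4 and the choice to work on floats are assumptions for this example; the question's code works on unsigned char and double, so a conversion step would be needed around it.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = a[i] + w * (b[i] - a[i]) for i = 0..3 */
void lerp4(const float *a, const float *b, float w, float *out)
{
    assert(a && b && out);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);          /* unaligned load of four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vw = _mm_set1_ps(w);           /* broadcast the weight to all lanes */
    __m128 r  = _mm_add_ps(va, _mm_mul_ps(vw, _mm_sub_ps(vb, va)));
    _mm_storeu_ps(out, r);
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + w * (b[i] - a[i]);
#endif
}
```

The unaligned load and store are used so the sketch works with any pointer; aligned data and the aligned variants may be faster on some processors.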

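Parallel processing:

Most systems have OpenMP installed. If so, you might consider restructuring your code to take advantage of your processor's multiple cores. It is quite simple to implement using pragmas, and it is certainly worth a try.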
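A minimal sketch of the pragma approach is shown here. The function name rescale_omp is hypothetical, and two choices in it are assumptions of this example rather than anything stated in the answer: the outer loop runs over rows (y) so each thread writes contiguous memory, and the right and bottom neighbours are clamped to the image edge instead of using ceil(). With gcc or clang it needs -fopenmp to run in parallel; without that flag the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stdlib.h>

/* Bilinear rescale of a packed RGB image; destination rows are split across threads. */
unsigned char *rescale_omp(const unsigned char *src, int src_x, int src_y,
                           int dest_x, int dest_y)
{
    assert(src && src_x > 0 && src_y > 0 && dest_x > 1 && dest_y > 1);
    unsigned char *dst = malloc((size_t)dest_x * dest_y * 3);
    if (!dst) return NULL;

    double step_x = (double)(src_x - 1) / (dest_x - 1);
    double step_y = (double)(src_y - 1) / (dest_y - 1);

    /* Each iteration writes only its own row, so the rows are independent. */
    #pragma omp parallel for
    for (int y = 0; y < dest_y; y++) {
        double fy = y * step_y;
        int y0 = (int)fy;
        int y1 = (y0 + 1 < src_y) ? y0 + 1 : y0;   /* clamp at the bottom edge */
        double dy = fy - y0;
        for (int x = 0; x < dest_x; x++) {
            double fx = x * step_x;
            int x0 = (int)fx;
            int x1 = (x0 + 1 < src_x) ? x0 + 1 : x0;   /* clamp at the right edge */
            double dx = fx - x0;
            const unsigned char *n1 = src + (x0 + y0 * src_x) * 3;
            const unsigned char *n2 = src + (x1 + y0 * src_x) * 3;
            const unsigned char *n3 = src + (x0 + y1 * src_x) * 3;
            const unsigned char *n4 = src + (x1 + y1 * src_x) * 3;
            unsigned char *out = dst + (x + y * dest_x) * 3;
            for (int c = 0; c < 3; c++) {   /* R, G, B */
                double t1 = dx * n2[c] + (1 - dx) * n1[c];
                double t2 = dx * n4[c] + (1 - dx) * n3[c];
                out[c] = (unsigned char)(dy * t2 + (1 - dy) * t1);
            }
        }
    }
    return dst;   /* caller frees */
}
```

For small images the cost of starting threads can outweigh the gain, so this is worth measuring on realistic sizes.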

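Compiler flags:

Also, although you did not mention it directly, compilation flags affect the performance of C code. For example, if you are using gcc, you can compare the performance difference using: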

gcc -std=c99 -o main main.c

versus

gcc -std=c99 -O3 -o main main.c

Regarding "c - How can I make the following bilinear interpolation code more efficient?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20870519/