c - gcc won't vectorize a simple loop

Tags: c gcc auto-vectorization

I'm trying to vectorize a simplified version of example 4 from the gcc auto-vectorize documentation. For the life of me, I can't figure out how to do it:

typedef int aint __attribute__ ((__aligned__(16)));
void foo1 (int n, aint * restrict px, aint *restrict qx) {

  /* feature: support for (aligned) pointer accesses.  */
  int *__restrict p = __builtin_assume_aligned (px, 16);
  int *__restrict q = __builtin_assume_aligned (qx, 16);

  while (n--){
    //*p++ += *q++; <- this is vectorized
    p[n] += q[n]; // This isn't!
  }
}

I'm running gcc 4.7.2 with: gcc -o apps/craft_dbsplit.o -c -Wall -g -ggdb -O3 -msse2 -funsafe-math-optimizations -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5 -funsafe-loop-optimizations -std=c99

It responds:

Analyzing loop at apps/craft_dbsplit.c:388

388: dependence distance  = 0.
388: dependence distance == 0 between *D.9363_14 and *D.9363_14
388: dependence distance  = 0.
388: accesses have the same alignment.
388: dependence distance modulo vf == 0 between *D.9363_14 and *D.9363_14
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: vect_model_store_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: Alignment of access forced using peeling.
388: Vectorizing an unaligned access.
388: vect_model_load_cost: aligned.
388: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
388: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
388: not vectorized: relevant stmt not supported: *D.9363_14 = D.9367_20;

apps/craft_dbsplit.c:382: note: vectorized 0 loops in function.

Best answer

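The loop runs from high addresses to low addresses. Your gcc treats vector operations as running from low addresses to high addresses, so it doesn't realize the loop can be vectorized. Your "optimization" of writing the loop as while (n--) actually prevents the much more valuable optimization. Try an index that counts upward instead: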

#include <stddef.h>

void foo1 (size_t n, int *restrict px, int const *restrict qx)
{
  int *restrict p = __builtin_assume_aligned(px, 16);
  int const *restrict q = __builtin_assume_aligned(qx, 16);
  size_t i = 0;
  while (i < n)
    {
      p[i] += q[i];
      i++;
    }
}
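
To check that switching the loop direction doesn't change the result, the two forms can be compared side by side. This is only a minimal sketch: the names add_reverse, add_forward and count_mismatches are made up for illustration, and the __builtin_assume_aligned hints are left out because the test arrays here are not guaranteed to be 16-byte aligned.

```c
#include <stddef.h>

/* Loop as written in the question: the index counts down, so memory
   is accessed from high addresses to low addresses. */
void add_reverse(int n, int *restrict p, const int *restrict q)
{
    while (n--)
        p[n] += q[n];
}

/* Loop as suggested in the answer: the index counts up, so memory
   is accessed from low addresses to high addresses. */
void add_forward(size_t n, int *restrict p, const int *restrict q)
{
    for (size_t i = 0; i < n; i++)
        p[i] += q[i];
}

/* Hypothetical helper: runs both loops on identical inputs and returns
   the number of elements where the results differ. */
size_t count_mismatches(void)
{
    enum { N = 1003 };  /* deliberately not a multiple of 4 */
    static int a[N], b[N], q[N];

    for (size_t i = 0; i < N; i++) {
        a[i] = b[i] = (int)i;
        q[i] = (int)(3 * i + 1);
    }

    add_reverse(N, a, q);
    add_forward(N, b, q);

    size_t bad = 0;
    for (size_t i = 0; i < N; i++)
        if (a[i] != b[i])
            bad++;
    return bad;
}
```

Each element is updated independently, so count_mismatches() should return 0 whichever way the loop runs. Compiling this file with the flags from the question (for example -O3 -ftree-vectorize -ftree-vectorizer-verbose=5) should show which of the two loops gcc 4.7.2 vectorizes; other gcc versions may report differently.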

Source: a similar question on Stack Overflow, "c - gcc won't vectorize a simple loop": https://stackoverflow.com/questions/35555150/
