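When compiling my source code (a basic matrix multiplication with auto-vectorization and auto-parallelization enabled), I get the following warnings in the console: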
C5002: loop not vectorized due to reason '1200'
C5012: loop not parallelized due to reason '1000'
I have read through the MSDN documentation for these messages, which states:
Reason code 1200: Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.
Reason code 1000: The compiler detected a data dependency in the loop body.
I'm not sure what in my loop is causing the problem. Here is the relevant part of my source code:
// int** A, int** B, int** result, const int dimension
for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        for (int k = 0; k < dimension; ++k) {
            result[i][j] = result[i][j] + A[i][k] * B[k][j];
        }
    }
}
Any insight would be appreciated.
Best answer
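The loop-carried dependency is on result[i][j]: every iteration of the k loop reads and writes it, and because A, B and result are plain int**, the compiler cannot prove that storing to result[i][j] does not change an A[i][k] or B[k][j] read by a later iteration.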
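To see why the compiler has to assume the worst, the sketch below passes the same matrix as both A and result. The function names (multiply_in_memory, multiply_in_register, aliased_demo) and the 2x2 data are hypothetical, chosen only for illustration. Once the running sum is kept in a local variable, the output changes, which is exactly the rewrite the compiler may not make on its own unless it can prove there is no aliasing.

```cpp
// The question's form: the running sum lives in memory, so result[i][j]
// is loaded and stored again on every k iteration.
void multiply_in_memory(int** A, int** B, int** result, const int dimension) {
    for (int i = 0; i < dimension; ++i)
        for (int j = 0; j < dimension; ++j)
            for (int k = 0; k < dimension; ++k)
                result[i][j] = result[i][j] + A[i][k] * B[k][j];
}

// Same arithmetic, but the running sum is kept in a local variable and
// written back once. Only equivalent to the version above if result
// does not alias A or B.
void multiply_in_register(int** A, int** B, int** result, const int dimension) {
    for (int i = 0; i < dimension; ++i)
        for (int j = 0; j < dimension; ++j) {
            int tmp = result[i][j];
            for (int k = 0; k < dimension; ++k)
                tmp += A[i][k] * B[k][j];
            result[i][j] = tmp;
        }
}

// Hypothetical demo: one 2x2 matrix M = {{1,2},{3,4}} is passed as BOTH
// A and result (B is all ones); returns M[0][1] afterwards.
int aliased_demo(bool keep_sum_in_register) {
    int m[2][2] = {{1, 2}, {3, 4}};
    int ones[2][2] = {{1, 1}, {1, 1}};
    int* M[2] = {m[0], m[1]};
    int* B[2] = {ones[0], ones[1]};
    if (keep_sum_in_register)
        multiply_in_register(M, B, M, 2);
    else
        multiply_in_memory(M, B, M, 2);
    return m[0][1];
}
```

With this data, aliased_demo(false) returns 12 and aliased_demo(true) returns 8, because in the in-memory version the k = 1 step reads the M[0][1] value that was just updated at k = 0.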
The solution is to accumulate the sum in a temporary variable and write it to result only once, outside the innermost loop, like this:
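for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        // Starts from 0, so result[i][j] is overwritten rather than added to;
        // this matches the question's loop only if result was zero-initialized.
        auto tmp = 0;
        for (int k = 0; k < dimension; ++k) {
            tmp += A[i][k] * B[k][j];
        }
        result[i][j] = tmp;
    }
}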
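This removes the dependency (result[i][j] is no longer read and written on every iteration of the inner loop) and should help the vectorizer do a better job.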
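A further step that the answer does not propose is loop interchange: reordering the loops as i-k-j so that the innermost loop walks contiguous rows of B and result with unit stride. The sketch below (multiply_ikj is a hypothetical name) assumes result does not alias A or B. Like the question's loop, it adds A*B into result, so result should start zeroed to get the plain product. It is not guaranteed to vectorize: the compiler still has to rule out overlap between the result row and the B row, and the outcome depends on the MSVC version. /Qvec-report:2 reports, for each loop, whether it was vectorized.

```cpp
// Loop-interchanged (i-k-j) sketch: the innermost j loop reads b_row[j]
// and updates out[j] contiguously. Assumes result does not alias A or B.
void multiply_ikj(int** A, int** B, int** result, const int dimension) {
    for (int i = 0; i < dimension; ++i) {
        int* out = result[i];                // row i of result
        for (int k = 0; k < dimension; ++k) {
            const int a_ik = A[i][k];        // invariant for the j loop
            const int* b_row = B[k];         // row k of B
            for (int j = 0; j < dimension; ++j)
                out[j] += a_ik * b_row[j];   // result[i][j] += A[i][k] * B[k][j]
        }
    }
}
```

For A = {{1,2},{3,4}} and B = {{5,6},{7,8}} with result zeroed, result ends up as {{19,22},{43,50}}.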
Source: the Stack Overflow question "C++ auto-vectorization matrix multiplication loop": https://stackoverflow.com/questions/33965319/