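linker - CUDA: LNK2005 error on a __device__ function used in a header file

I have a __device__ function defined in a header file. It is in the header because it is used by a __global__ kernel, and that kernel has to be in the header because it is a template kernel.

When this header file is included in 2 or more .cu files, I get an LNK2005 error during linking: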

FooDevice.cu.obj : error LNK2005: "int __cdecl getCurThreadIdx(void)" (?getCurThreadIdx@@YAHXZ) already defined in Main.cu.obj

Why does this cause this error? How do I fix it?

Here is sample code that produces the above error:

FooDevice.h:

#ifndef FOO_DEVICE_H
#define FOO_DEVICE_H

__device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H

FooDevice.cu:

#include "FooDevice.h"

// One other kernel that uses getCurThreadIdx()
__global__ void fooKernel2( const int* inArr, int num, int* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

Main.cu:

#include "FooDevice.h"

int main()
{
    int num             = 10;
    int* dInArr         = NULL;
    int* dOutArr        = NULL;
    const int arrSize   = num * sizeof( *dInArr );

    cudaMalloc( &dInArr, arrSize );
    cudaMalloc( &dOutArr, arrSize );

    // Using template kernel
    fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr );

    return 0;
}

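Best answer

Why does this cause the error?

Because you have included your header in both FooDevice.cu and Main.cu, each translation unit compiles its own externally visible definition of getCurThreadIdx(). You now have two copies of the same function, and the linker detects this. The include guard does not help here: it only prevents repeated inclusion within a single translation unit.

How do I fix it?

If you have the following defined in foo.h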

template<typename T> __device__ T foo(T x)
{
    return x;
}

and two .cu files that both include foo.h and contain a call to it, e.g.

int x = foo<int>(1);

then you can declare foo() inline:

template<typename T>
inline __device__ T foo(T x)
{
    return x;
}

and call it:

int x = foo<int>(1);

This prevents the multiple-definition error at link time.

Function templates are exempt from the One Definition Rule, and there may be more than one definition of them in different translation units. A full function template specialization is not a template but an ordinary function, so you need to use the inline keyword to avoid violating the ODR if you want to put it in a header file included in several translation units.

Taken from http://www.velocityreviews.com/forums/t447911-why-does-explicit-specialization-of-function-templates-cause-generation-of-code.html

See also: http://en.wikipedia.org/wiki/One_Definition_Rule
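The point in the quote about full specializations can be sketched as follows. This is only an illustration, not code from the question: the specialization body (`x + 1`) is made up so the two versions are distinguishable, and the `#ifndef __CUDACC__` shim is an assumption added so the snippet also builds with a plain C++ compiler.

```cpp
// Assumption: shim so this sketch also compiles without nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Primary template: may be defined in a header included by many .cu files.
template <typename T>
__host__ __device__ T foo(T x)
{
    return x;
}

// Full specialization: an ordinary function, not a template.
// Without `inline`, every .cu file including this header would emit its own
// external definition of foo<int>, and the linker would report a duplicate.
// The body is hypothetical and differs from the primary template on purpose.
template <>
inline __host__ __device__ int foo<int>(int x)
{
    return x + 1;
}
```

With this in a shared header, `foo<int>(1)` picks the specialization and any other type falls back to the primary template.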

I changed your code like this:

inline __device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

Now it compiles and links. Your non-inline definition of getCurThreadIdx() in the header violated the One Definition Rule.
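A possible variation, which is not part of the original answer, is to give the helper internal linkage with `static` instead of `inline`. The sketch below assumes a hypothetical helper named `clampIndex` (it does not appear in the question) and the same shim so that it also builds without nvcc.

```cpp
// Assumption: shim so this sketch also compiles without nvcc.
#ifndef __CUDACC__
#define __device__
#endif

// Hypothetical helper, not from the original post.
// `static` gives the function internal linkage: each .cu file that includes
// the header gets its own private copy, so no external symbol is duplicated.
static __device__ int clampIndex(int idx, int num)
{
    return (idx < num) ? idx : (num - 1);
}
```

The trade-off is that `static` may leave one copy of the function per translation unit, whereas `inline` lets the definitions be treated as one function.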

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/5295503/
