cuda - Half precision: Difference between __float2half vs __float2half_rn

There seems to be no documentation for these two functions.
What is the difference between __float2half and __float2half_rn?

Best answer

It seems the CUDA documentation is indeed a bit lacking here.

The function unsigned short __float2half_rn(float), together with float __half2float(unsigned short x), already existed in CUDA before the new half data type was introduced in CUDA 7.5.
It is defined in device_functions.h. The comment there reads:

Convert the single-precision float value x to a half-precision floating point value represented in unsigned short format, in round-to-nearest-even mode.

The function half __float2half(float) is defined in cuda_fp16.h and is apparently the same, except that it returns half:

Converts float number a to half precision in round-to-nearest mode.
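To make the "round-to-nearest-even" wording in the two header comments concrete, here is a minimal host-side sketch of that rounding rule applied to the bits of a float. The helper name float_to_half_bits_rn is hypothetical and not part of CUDA, and the sketch assumes the input lies in the normal half-precision range (no zeros, subnormals, overflow, infinities, or NaNs), so it illustrates the rule rather than reproducing the intrinsic's implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUDA API): returns the IEEE 754 binary16 bit
// pattern of f, rounded to nearest with ties going to the even mantissa.
// Assumption: f is finite and within the normal half-precision range.
static uint16_t float_to_half_bits_rn(float f)
{
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);                        // reinterpret the float's bits

    uint32_t sign = (x >> 16) & 0x8000u;                  // sign moves to bit 15
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15;  // rebias 8-bit -> 5-bit exponent
    uint32_t mant = x & 0x7FFFFFu;                        // 23-bit float mantissa

    uint32_t half = ((uint32_t)exp << 10) | (mant >> 13); // keep the top 10 mantissa bits
    uint32_t rem  = mant & 0x1FFFu;                       // the 13 bits being dropped

    // Round up when above the halfway point, or exactly halfway with an odd result.
    // A carry out of the mantissa correctly increments the exponent field.
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        ++half;

    return (uint16_t)(sign | half);
}
```

Under these assumptions, float_to_half_bits_rn(1.4232f) returns 0x3DB1 (15793 in decimal), and the exact tie 1.00048828125f (halfway between 0x3C00 and 0x3C01) rounds to the even pattern 0x3C00.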

However, since half is a typedef for unsigned short, I checked whether the two really do the same thing, using the following code:

#include <stdio.h>
#include "cuda_fp16.h"
#include "device_functions.h"
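__global__ void test()
{
    // Swap in the commented-out line to compare the two conversions.
//  auto test = __float2half( 1.4232f );
    auto test = __float2half_rn( 1.4232f );  // f suffix: pass a float, not a double
    printf( "%hu\n", test );
}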

int main()
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
}

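I found that (for sm_20) the old __float2half_rn() has an additional int16-to-int32 conversion and performs a 32-bit store. __float2half(), on the other hand, has no such conversion and performs a 16-bit store.
The relevant SASS code for __float2half_rn():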

/*0040*/         I2I.U32.U16 R0, R0;
/*0050*/         STL [R2], R0;

For __float2half():

/*0048*/         STL.U16 [R2], R0;

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/35198856/
