<分区>
我正在读这个 Dr Dobb's Article on CUDA
In my system, the global memory bandwidth is slightly over 60 GB/s. This is excellent until you consider that this bandwidth must service 128 hardware threads -- each of which can deliver a large number of floating-point operations. Since a 32-bit floating-point value occupies four (4) bytes, global memory bandwidth limited applications on this hardware will only be able to deliver around 15 GF/s -- or only a small percentage of the available performance capability.
问题:GF/s 是指 Giga flops per second??