c++ - 非均匀采样的CUDA重采样

我想从非均匀采样中重新采样(插值)序列。我认为tex不起作用是因为假设您的样本是均匀的，那么插值基本上是有效的？搜索会太慢吗？

我应该推一下吗？任何指针表示赞赏。任何示例都将大有帮助。

更新:

说带有圆圈标记的线是我的示例。我知道每个圆点的值。显然，样品在水平轴上均匀分布。

现在，我想知道采样线下方的每个x标记处的值。 x标记沿线均匀分布。

--- o -------- o ---- o ------ o ------ o ------ o ------(采样)

--X ----- X ----- X ----- X ----- X ----- X ----- X ---(已知为插值)

所以我想知道如何使用CUDA获得每个x标记位置的值？显然，使用C / C++的最基本算法是针对每个x标记位置，搜索两个最近的圆位置，然后进行线性插值。但是在这种情况下，您需要首先对两个序列进行排序，然后循环遍历x标记，然后对每个x标记进行搜索。这听起来很宽泛。

我想知道我们应该如何在CUDA中做到这一点？谢谢。

最佳答案

可能有很多方法。例如，您可以以线程并行方式使用基本cuda binary search。我将演示thrust实现。

出于讨论的目的，我假设两个数据集(已知样本点和所需样本点)是任意放置的(即，我不假设任何一个样本集均等分布)。但是，我将规定或要求将所需的采样点完全包含在已知的采样点内。我相信这是明智的，因为通常的线性插值需要在所需采样点的任一侧都具有一个已知采样点。

因此，我们将使用如下数据集:

   o:  1,3,7
f(o):  3,7,15
   x:  1.5, 2.5, 4.5, 5.0, 6.0, 6.5
f(x):    ?,   ?,   ?,   ?,   ?,   ?

我们看到f是已知的函数值，在这种情况下，它对应于f(o) = 2o+1，这是一条直线(尽管此方法不需要已知的采样点来适合任何特定函数)。 x表示基于已知值(f(o))我们希望在其上插值函数值的索引。然后，我们的愿望是通过从最近的f(x)点进行插值来计算f(o)。请注意，我们的数据集使得x的所有值都介于最小(1)和最大(7)o值之间。这是我之前所说的规定/要求。

我们的推力方法是使用矢量化二进制搜索(使用thrust::upper_bound)来定位“插入点”，其中每个所需的x值都适合o序列。这给了我们的右邻居和左邻居(right-1)进行插值。一旦我们知道了插入点，选择该算法将是对该算法的简单扩展。如果我们想使用线性插值以外的其他方法，则需要两个左和两个右邻(或更多个)。

然后，插入点为我们提供了我们的左右邻居，并且我们使用此信息将thrust::transform vector (所需的插值点)以及x(通过thrust::tuple)传递给适当制作的thrust::zip_iterator操作，该操作提供:

正确的邻居索引

右邻居功能值

左邻居索引

左邻居功能值

有了这些数量，再加上所需的索引(x)，插值就很简单。

编辑:受另一个答案的启发，我决定包括一种避免并行二进制搜索的方法，而是使用前缀和方法来标识x数据中o数据的插入索引。此方法假定x和o序列都已排序。

我们将从merge_by_key操作开始。我们将x与o合并，以建立顺序(这似乎比二进制搜索更有效)。 x和o数量将是“键”，并且o的值将全部为1，x的值将全部为0。然后使用我们的样本数据，merge_by_key将产生以下结果:

o keys:  1,3,7
o vals:  1,1,1

x keys:  1.5,2.5,4.5,5.0,6.0,6.5
x vals:    0,  0,  0,  0,  0,  0

merged keys:  1, 1.5, 2.5,   3, 4.5, 5.0, 6.0, 6.5,   7
merged vals:  1,   0,   0,   1,   0,   0,   0,   0,   1

当我们对合并的val进行前缀总和(包括扫描)时，我们得到:

ins. ind.:    1,   1,   1,   2,   2,   2,   2,   2,   3

然后，我们可以执行copy_if操作以仅提取与x val(其合并的val为零)关联的插入索引，以生成与步骤1中产生的插入索引相同的序列:

 d_i:  1, 1, 2, 2, 2, 2

然后，方法2的其余部分可以使用与方法1中使用的完全相同的剩余插值代码(thrust::transform)。

这是一个完整的示例，显示了两种方法:

$ cat t1224.cu
#include <thrust/device_vector.h>
#include <thrust/binary_search.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <iostream>
#include <thrust/merge.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/scan.h>

struct interp_func
{
  template <typename T>
  __host__ __device__
  float operator()(float t1, T t2){  // m = (y1-y0)/(x1-x0)  y = m(x-x0) + y0
    return ((thrust::get<1>(t2) - thrust::get<3>(t2))/(thrust::get<0>(t2) - thrust::get<2>(t2)))*(t1 - thrust::get<2>(t2)) + thrust::get<3>(t2);
    }
};

using namespace thrust::placeholders;

int main(){

  // sample data
  float o[] = {1.0f, 3.0f, 7.0f}; // unevenly spaced sample points for function f
  float f[] = {3.0f, 7.0f, 15.0f}; // f(o) = 2o+1
  float x[] = {1.5f, 2.5f, 4.5f, 5.0f, 6.0f, 6.5f}; // additional desired sample points for f
  int so = sizeof(o)/sizeof(o[0]);
  int sx = sizeof(x)/sizeof(x[0]);

  // setup data on device
  thrust::device_vector<float> d_o(o, o+so);
  thrust::device_vector<float> d_f(f, f+so);
  thrust::device_vector<float> d_x(x, x+sx);
  thrust::device_vector<int>   d_i(sx); // insertion indices
  thrust::device_vector<float> d_r(sx); // results
  // method 1: binary search
  // perform search for insertion indices
  thrust::upper_bound(d_o.begin(), d_o.end(), d_x.begin(), d_x.end(), d_i.begin());
  // then perform linear interpolation based on left and right neighbors
  std::cout << "Method 1 insertion indices:" << std::endl;
  thrust::copy(d_i.begin(), d_i.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  thrust::transform(d_x.begin(), d_x.end(), thrust::make_zip_iterator(thrust::make_tuple(thrust::make_permutation_iterator(d_o.begin(), d_i.begin()), thrust::make_permutation_iterator(d_f.begin(), d_i.begin()), thrust::make_permutation_iterator(d_o.begin(), thrust::make_transform_iterator(d_i.begin(), _1-1)), thrust::make_permutation_iterator(d_f.begin(), thrust::make_transform_iterator(d_i.begin(), _1-1)))), d_r.begin(), interp_func());

  // output results
  std::cout << "Interpolation points:" << std::endl;
  thrust::copy(d_x.begin(), d_x.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "Interpolated values:" << std::endl;
  thrust::copy(d_r.begin(), d_r.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "Expected values:" << std::endl;
  for (int i = 0; i < sx; i++) std::cout << 2*x[i]+1 <<  ",";
  std::cout << std::endl;

  //method 2: merge + prefix sum
  thrust::device_vector<float> d_kr(sx+so);
  thrust::device_vector<int> d_vr(sx+so);
  thrust::device_vector<int> d_s(sx+so);
  thrust::merge_by_key(d_o.begin(), d_o.end(), d_x.begin(), d_x.end(), thrust::constant_iterator<int>(1), thrust::constant_iterator<int>(0), d_kr.begin(), d_vr.begin());
  thrust::inclusive_scan(d_vr.begin(), d_vr.end(), d_s.begin());
  thrust::copy_if(d_s.begin(), d_s.end(), d_vr.begin(), d_i.begin(), _1 == 0);
  std::cout << "Method 2 insertion indices:" << std::endl;
  thrust::copy(d_i.begin(), d_i.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
  // remainder of solution method would be identical to end of method 1 starting with the thrust::transform
  return 0;
}
$ nvcc -o t1224 t1224.cu
$ ./t1224
Method 1 insertion indices:
1,1,2,2,2,2,
Interpolation points:
1.5,2.5,4.5,5,6,6.5,
Interpolated values:
4,6,10,11,13,14,
Expected values:
4,6,10,11,13,14,
Method 2 insertion indices:
1,1,2,2,2,2,
$

同样，一旦我们知道了插入点，选择2个右和2个左邻居进行更复杂的插值将是一个简单的扩展。我们只需要修改传递给变换(插值)仿函数的zip迭代器，并修改仿函数本身以实现所需的算法即可。

还要注意，该方法假定输入的o序列已被排序。如果不是，则有必要添加o(键)和f(值)的按键排序。无需为方法1排序x序列，但必须为方法2排序(合并要求对两个序列都进行排序)。

关于c++ - 非均匀采样的CUDA重采样，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/38745091/

c++ - 非均匀采样的CUDA重采样

上一篇：c++ - 我无法编译这个简单的 C++ 脚本。有任何想法吗？

下一篇：c++ - LLVM如何无一异常(exception)地使用C++的标准容器？