c++ - Faster way to calculate the likelihood of a sequence?

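This is a follow-up to my previous question,
Faster way to do multi dimensional matrix addition? After following @Peter Cordes's advice I vectorized my code, and it is now 50 times faster. I then ran gprof again and found that this function takes most of the time.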

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
 69.97      1.53     1.53                             cal_score(int, std::string, int const*, int, double)

double cal_score(int l, string seq, const int *__restrict__ pw,int cluster,double alpha)
{
  const int cols =4;
  const int *__restrict__ pwcluster = pw + ((long)cluster) * l * cols;
  double score = 0;
  char s;
  string alphabet="ACGT";   
  int count=0;
  for(int k=0;k<cols;k++)           
    count=count+pwcluster[k];

  for (int i = 0; i < l; i++){
    long row_offset = cols*i;
    s=seq[i];
    //#pragma omp simd 
    for(int k=0;k<cols;k++) {
            if (s==alphabet[k])
                score=score+log(    ( pwcluster[row_offset+k]+alpha )/(count+4*alpha)       );
    }
  }
  return score;
}

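This is my first time optimizing code, so I don't know how to proceed. Is there a way to write this function better so that it runs faster? The input seq is a sequence of length l over the characters 'ACGT'. pw is a one-dimensional array of size 2*l*4, i.e. a flattened [p][q][r] array, and cluster is the index p.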
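
For readers unfamiliar with the flattened layout, the indexing that the question's code performs (pw + cluster*l*cols, then row_offset = cols*i, then + k) can be sketched as a single function. The helper name pw_index is hypothetical and does not appear in the original code; it assumes a row-major [clusters][l][4] layout, which is what the pointer arithmetic in the question implies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the original code): flat index of
// pw[cluster][pos][base] in a row-major array of shape [clusters][l][4].
inline std::size_t pw_index(std::size_t cluster, std::size_t pos,
                            std::size_t base, std::size_t l) {
    const std::size_t cols = 4;   // one column per letter: A, C, G, T
    assert(pos < l && base < cols);
    // Skip `cluster` whole matrices of l rows, then `pos` rows, then `base` columns.
    return (cluster * l + pos) * cols + base;
}
```

With l = 5, for example, cluster 1 starts at flat index 20, which matches the offset cluster * l * cols used in cal_score.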

Best answer

Here is another way to rewrite it. It translates the string with a lookup table instead of a search, and it calls log about 10 times less often.

It also changes seq to a pointer (const unsigned char*) instead of a std::string passed by value, which would copy the whole string on every call.

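#include <algorithm>   // std::max
#include <math.h>      // log
#include <string.h>    // memset, NULL

unsigned char transTable[256];       // one entry per unsigned char value, so any input byte is a valid index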

void InitTransTable(){
  memset(transTable, 0, sizeof(transTable));
  transTable['A'] = 0;
  transTable['C'] = 1;
  transTable['G'] = 2;
  transTable['T'] = 3;
}

static int tslen = 0;                // static instead of global lets the compiler keep tseq in a register inside the loop
static unsigned char* tseq = NULL;   // reusable buffer for translations.  Not thread-safe

double cal_score(
    int l
  , const unsigned char* seq         // if you want to pass a std::string, do it by const &, not by value
  , const int *__restrict__ pw
  , int cluster
  , double alpha
  )
{
  int i, j, k;
  // make sure tseq is big enough
  if (tseq == NULL){
    tslen = std::max(4096, l+1024);
    tseq = new unsigned char[tslen];
    memset(tseq, 0, tslen);
  } else if (l > tslen-1){
    delete[] tseq;                   // allocated with new[], so it must be released with delete[]
    tslen = l + 4096;
    tseq = new unsigned char[tslen];
    memset(tseq, 0, tslen);
  }
  // translate seq into tseq
  // (decrementing i so the beginning of tseq will be hot in cache when we're done)
  for (i = l; --i >= 0;) tseq[i] = transTable[seq[i]];

  const int cols = 4;
  const int *__restrict__ pwcluster = pw + ((long)cluster) * l * cols;
  double score = 0;
  // count up pwcluster
  int count=0;
  for(k = 0; k < cols; k++) count += pwcluster[k];

  double count4alpha = (count + 4*alpha);
  long row_offset = 0;
  for (i = 0; i < l;){
    double product = 1;
    for (j = 0; j < 10 && i < l; j++, i++, row_offset += cols){
      k = tseq[i];
      product *= (pwcluster[row_offset + k] + alpha) / count4alpha;
    }
    score += log(product);
  }
  return score;
}
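
The batching loop in cal_score above relies on the identity log(a*b) = log(a) + log(b): multiplying a handful of factors and taking one log gives the same sum as taking the log of each factor, up to floating-point rounding. The following is a minimal standalone sketch of that idea, not the answer's code; the function names and the batch parameter are made up for illustration (the answer hard-codes a batch of 10).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference version: one log() call per factor.
double sum_of_logs(const std::vector<double>& p) {
    double s = 0.0;
    for (double x : p) s += std::log(x);
    return s;
}

// Batched version: multiply up to `batch` factors, then take a single log().
// `batch` is an illustrative parameter, assumed small enough that the
// running product cannot underflow.
double batched_log(const std::vector<double>& p, std::size_t batch = 10) {
    assert(batch > 0);
    double s = 0.0;
    for (std::size_t i = 0; i < p.size();) {
        double product = 1.0;
        for (std::size_t j = 0; j < batch && i < p.size(); ++j, ++i)
            product *= p[i];
        s += std::log(product);
    }
    return s;
}
```

The batch size is a trade-off: a larger batch means fewer log calls, but every factor here is a probability no greater than 1, so the product shrinks with each multiplication and would eventually underflow to zero if the batch were too long.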

This compiles to fairly good code, but without -ffast-math the division can't be replaced by a multiplication.
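
If -ffast-math is not an option, the division can be removed from the loop by hand, because the denominator is the same at every position: the sum of log(x_i / D) equals the sum of log(x_i) minus n*log(D). The sketch below illustrates that rearrangement on plain vectors; it is not part of the original answer and the function names are hypothetical. The two versions can differ in the last few bits, which is exactly why the compiler will not make this change on its own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Direct form: one division per element, as in the answer's inner loop.
double score_divide(const std::vector<double>& num, double denom) {
    double s = 0.0;
    for (double x : num) s += std::log(x / denom);
    return s;
}

// Rearranged form: no division inside the loop.
// sum(log(x / D)) == sum(log(x)) - n * log(D), up to rounding.
double score_hoisted(const std::vector<double>& num, double denom) {
    assert(denom > 0.0);
    double s = 0.0;
    for (double x : num) s += std::log(x);
    return s - static_cast<double>(num.size()) * std::log(denom);
}
```

In cal_score the same idea would mean multiplying only the numerators (pwcluster[...] + alpha) in the batch and subtracting l * log(count4alpha) once at the end. Since the numerators can be larger than 1, the batch length would then need to be checked against overflow rather than underflow.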

It does not auto-vectorize, because we only load one out of every four elements of pwcluster.

A similar question about "c++ - Faster way to calculate the likelihood of a sequence?" can be found on Stack Overflow: https://stackoverflow.com/questions/37254001/
