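I am using a dual-channel DAQ card in data-streaming mode. I wrote some analysis/calculation code and put it into the main acquisition code. However, the FIFO overflow warning flag always appears once the total data reaches about 6000 MSamples (the on-board DAQ memory is 8 GB). I am well aware that complex calculations can delay the system and cause the overflow, but everything I wrote is necessary for my experiment, so it cannot be dropped (unless there is more efficient code that gives me the same result). I have heard that OpenMP might be a way to improve the speed, but I am only a beginner in C. How can I apply it to my calculation code?
My computer has 64 GB of RAM and an Intel Core i7 processor. I always close other unnecessary software when running the data-streaming code. I have already optimized the code as much as I can, for example by simplifying hilbert() and using memcpy to select a specific range of data points.
This is how I process the data:
1. Install the FFTW source code for the Hilbert transform.
2. A for loop de-interleaves the pi16Buffer data into ch2Buffer.
3. memcpy takes the specific range of data I am interested in and puts it into another array named ch2newBuffer.
4. Run hilbert() on ch2newBuffer and compute its absolute value.
5. Find the maximum values of ch1 and abs(hilbert(ch2newBuffer)).
6. Compute max(abs(hilbert(ch2))) / max(ch1).
Here is the part of my DAQ code that is responsible for the calculation: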
void hilbert(const int16* in, fftw_complex* out, fftw_plan plan_forward, fftw_plan plan_backward)
{
// copy the data to the complex array
for (int i = 0; i < N; ++i) {
out[i][REAL] = in[i];
out[i][IMAG] = 0;
}
// create a DFT plan and execute it
//fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(plan_forward);
// destroy a plan to prevent memory leak
//fftw_destroy_plan(plan_forward);
int hN = N>>1; // half of the length (N/2)
int numRem = hN; // the number of remaining elements
// multiply the appropriate value by 2
//(those should multiplied by 1 are left intact because they wouldn't change)
for (int i = 1; i < hN; ++i) {
out[i][REAL] *= 2;
out[i][IMAG] *= 2;
}
// if the length is even, the number of the remaining elements decrease by 1
if (N % 2 == 0)
numRem--;
else if (N > 1) {
out[hN][REAL] *= 2;
out[hN][IMAG] *= 2;
}
// set the remaining value to 0
// (multiplying by 0 gives 0, so we don't care about the multiplicands)
memset(&out[hN + 1][REAL], 0, numRem * sizeof(fftw_complex));
// create an IDFT plan and execute it
//plan = fftw_plan_dft_1d(N, out, out, FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(plan_backward);
// do some cleaning
//fftw_destroy_plan(plan_backward);
//fftw_cleanup();
// scale the IDFT output
//for (int i = 0; i < N; ++i) {
//out[i][REAL] /= N;
//out[i][IMAG] /= N;
//}
}
float SumBufferData(void* pBuffer, uInt32 u32Size, uInt32 u32SampleBits)
{
// In this routine we sum up all the samples in the buffer. This function
// should be replaced with the user's analysis function
if ( 8 == u32SampleBits )
{
pu8Buffer = (uInt8 *)pBuffer;
for (i = 0; i < u32Size; i++)
{
i64Sum += pu8Buffer[i];
}
}
else
{
pi16Buffer = (int16 *)pBuffer;
fftw_complex(hilbertedch2[N]);
fftw_plan plan_forward = fftw_plan_dft_1d(N, hilbertedch2, hilbertedch2, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_plan plan_backward = fftw_plan_dft_1d(N, hilbertedch2, hilbertedch2, FFTW_BACKWARD, FFTW_ESTIMATE);
ch2Buffer = (int16*)calloc(u32Size / 2, sizeof * ch2Buffer);
ch2newBuffer= (int16*)calloc(u32Size/2, sizeof* ch2newBuffer);
// De-interleave the data from pi16Buffer
for (i = 0; i < u32Size/2 ; i++)
{
ch2Buffer[i] = pi16Buffer[i*2+1];
}
// Pick out the data points range that we are interested
memcpy(ch2newBuffer, &ch2Buffer[6944], 1024 * sizeof(ch2Buffer[0]));
// Do the hilbert transform to these data points
hilbert(ch2newBuffer, hilbertedch2, plan_forward, plan_backward);
fftw_destroy_plan(plan_forward);
fftw_destroy_plan(plan_backward);
//Find max value in each segs of ch1 and ch2
for (i = 128; i < 200 ; i++)
{
if (pi16Buffer[i*2] > max1)
max1 = pi16Buffer[i*2];
}
for (i = 0; i < 1024; i++)
{
if (fabs(hilbertedch2[i][IMAG]) > max2)
max2 = fabs(hilbertedch2[i][IMAG]);
}
Corrected = max2 / max1 / N; // Calculate the signal correction
}
free(ch2Buffer);
free(ch2newBuffer);
return Corrected;
}
Best answer
Loops are usually a good starting point for parallelism, for example:
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
out[i][REAL] = in[i];
out[i][IMAG] = 0;
}
Or:
#pragma omp parallel for reduction(max:max2)
for (i = 0; i < 1024; i++)
{
float tmp = fabs(hilbertedch2[i][IMAG]);
max2 = (max2 > tmp) ? max2 : tmp;
}
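For reference, a self-contained version of the reduction loop might look like the sketch below. The function name max_abs_imag and the cplx typedef (a stand-in for fftw_complex so that it builds without FFTW) are assumptions made for illustration and are not part of the posted code. The max reduction needs OpenMP 3.1 or later; when built without -fopenmp the pragma is ignored and the loop runs serially with the same result.

```c
#include <math.h>

/* Stand-in for fftw_complex (an array of two doubles), assumed here
   so the sketch compiles without the FFTW headers. */
typedef double cplx[2];
#define IMAG 1

/* Hypothetical helper: largest |imaginary part| over n samples.
   Each thread keeps a private max2; OpenMP combines them at the end. */
double max_abs_imag(cplx *x, int n)
{
    double max2 = 0.0;
    #pragma omp parallel for reduction(max:max2)
    for (int i = 0; i < n; i++) {
        double tmp = fabs(x[i][IMAG]);
        if (tmp > max2)
            max2 = tmp;
    }
    return max2;
}
```

With only 1024 elements, the cost of starting the threads may well outweigh the gain, so it is worth measuring before keeping the pragma.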
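That said, you need to profile your code to find out where most of the execution time is spent, and then try to parallelize that part where possible. However, from what you have posted, I do not see many opportunities for parallelism there.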
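One lightweight way to do that profiling is to time each stage around its call site. The sketch below is only an illustration under stated assumptions: now_seconds, deinterleave_ch2 and time_deinterleave are hypothetical names, and int16_t stands in for the DAQ SDK's int16 type. It uses omp_get_wtime() when built with OpenMP and falls back to clock() otherwise.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds with OpenMP,
   CPU seconds from clock() otherwise. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical stand-in for the de-interleave step: copy channel 2
   (the odd indices) out of an interleaved two-channel buffer. */
static void deinterleave_ch2(const int16_t *interleaved, int16_t *ch2,
                             size_t n_per_channel)
{
    for (size_t i = 0; i < n_per_channel; i++)
        ch2[i] = interleaved[2 * i + 1];
}

/* Time one stage, report it on stderr, and return the elapsed seconds. */
double time_deinterleave(const int16_t *interleaved, int16_t *ch2,
                         size_t n_per_channel)
{
    double t0 = now_seconds();
    deinterleave_ch2(interleaved, ch2, n_per_channel);
    double t1 = now_seconds();
    fprintf(stderr, "de-interleave: %.6f s\n", t1 - t0);
    return t1 - t0;
}
```

The same pattern can be wrapped around the calloc calls, the two fftw_plan_dft_1d calls, and hilbert() in the posted SumBufferData to see which stage dominates for each buffer.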
Regarding "c - How to integrate my calculation C code with OpenMP", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65614369/