python - Why does this C code run half as fast through Python's ctypes as it does when run directly?

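I'm experimenting with interfacing Python and C. As a simple test, I compared the speed of a C implementation of SuperFastHash against a Python implementation, and then looked at what happens when the C version is simply called from Python. The result surprised me. Here is the C code: http://pastebin.com/Hc7iqzH1 (my benchmark main() is at the bottom).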

When I compile with gcc -O3 -lrt hash_test.c and run the executable, I get: secs: 20, hashes: 650449494, hashes/sec: 32522474.700000, Khashes/sec: 32522.474700

When I compile a .so file with gcc -lrt -O3 -fPIC -shared hash_test.c -o super.so and run a Python (2.7) script containing

from ctypes import *
lib = cdll.LoadLibrary('./super.so')
lib.main()

I get: secs: 20, hashes: 306842579, hashes/sec: 15342128.950000, Khashes/sec: 15342.128950

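In the same amount of time, this computes only about 47% as many hashes as running the program directly (306,842,579 vs. 650,449,494). Why?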

Best answer

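The performance drop happens because the compiler cannot apply the same optimization in the shared-library case: it does not inline the call to SuperFastHash. Contrary to jxh's suggestion, however, this is not caused by the PIC format.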

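If you inline the call to SuperFastHash by hand, pasting the function's code into the loop, you will find that the Python version performs the same as the original C program. Here is my version: https://gist.github.com/cod3monk/9821796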
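A minimal sketch of a related approach, assuming the missed inlining comes from how GCC treats exported (default-visibility) functions when building a shared library: such a symbol may be replaced by another definition at load time (symbol interposition), so GCC does not inline calls to it. Giving the hash function internal linkage (`static inline`) in the same file as the benchmark loop removes that possibility, so the compiler is free to inline it, much like pasting the body in by hand. The hash below is a stand-in (32-bit FNV-1a), not SuperFastHash, and `run_benchmark` is a hypothetical exported entry point; substitute the real function body from the question.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime under strict -std=c99/c11 */
#include <stdint.h>
#include <time.h>

/* Stand-in hash (32-bit FNV-1a), NOT SuperFastHash: swap in the real body.
   `static` gives it internal linkage, so no other object can interpose
   the symbol and GCC may inline it into run_benchmark(), even in a
   -shared build. */
static inline uint32_t stand_in_hash(const char *data, int len) {
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (int i = 0; i < len; ++i) {
        h ^= (uint8_t)data[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Hypothetical exported entry point (the only symbol Python needs):
   hashes a 20-byte buffer for roughly `seconds` seconds, mutating it the
   same way as the benchmark loop in this thread, and returns how many
   hashes were computed. */
long run_benchmark(int seconds) {
    struct timespec start, end;
    long hashes = 0;
    char data[20];
    for (int i = 0; i < 20; ++i)
        data[i] = (char)(i + 1);

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        uint32_t hash = stand_in_hash(data, 20);
        data[hash % 20] += 1;
        ++hashes;
        clock_gettime(CLOCK_MONOTONIC, &end);
    } while (end.tv_sec - start.tv_sec < seconds);
    return hashes;
}
```

Built as a shared library (for example, gcc -O3 -fPIC -shared bench.c -o bench.so, adding -lrt on older glibc; the file names are hypothetical), this could be loaded with ctypes and timed with lib.run_benchmark(20). Set lib.run_benchmark.restype = c_long first, because ctypes assumes a C int return type by default. Whether this matches the standalone executable's rate has not been measured here.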

Conversely, you can reproduce Python's poor performance with the following C code:

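#include <stdint.h>   /* uint32_t */
#include <time.h>     /* clock_gettime, CLOCK_MONOTONIC */
#include <stdio.h>
#include <unistd.h>

/* Only declared here; the definition lives in super.so, linked at build time. */
uint32_t SuperFastHash (const char * data, int len);

int main(void) {
  struct timespec start, end;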
  long secs;
  long hashes = 0;
  char data[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
  clock_gettime(CLOCK_MONOTONIC, &start);
  clock_gettime(CLOCK_MONOTONIC, &end);
  while ((secs = end.tv_sec - start.tv_sec) < 20) {
    uint32_t hash = SuperFastHash(data, 20);
    data[hash % 20] += 1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    ++hashes;
  }

  printf("secs: %ld, hashes: %ld, hashes/sec: %f, Khashes/sec: %f\n", secs, hashes, hashes/20.0,
      hashes/20.0/1000.0);
  return 0;
}

Now compile it with gcc -O3 the_above_code.c super.so -lrt and run it (LD_LIBRARY_PATH=. ./a.out).

Original Stack Overflow question: https://stackoverflow.com/questions/22700082/
