c++ - Timer with resolution better than 100 ns on Windows

Tags: c++ c benchmarking timing rdtsc

I work on a programming-language profiler, and I am looking for a timer solution for Windows with a resolution better than 100 ns.

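  • QueryPerformanceCounter should be the answer, but the frequency returned by QueryPerformanceFrequency is 10 MHz on Windows 10 and even lower on Windows 7
  • GetSystemTimePreciseAsFileTime has a 100 ns tick/step
  • RDTSC has a resolution better than 1 ns, but it varies with frequency

  • My target resolution is at least 10 ns.
    What is the best solution at the moment? How is QueryPerformanceCounter implemented? Can it easily be disassembled and its resolution improved?
    Is it possible to use RDTSC directly and track (or be interrupted on) every frequency change?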

    Best answer

    How is QueryPerformanceCounter implemented?


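    The QPC timer has different implementations in the HAL depending on the hardware; it uses the TSC, HPET, RTC, APIC, ACPI, or 8254 timer, depending on availability.
    The QPC timer resolution is hard-coded to 100 ns. That does not matter, though, because the call to QPC itself takes more than 100 ns. In the Windows world, 100 ns is simply a very, very short time.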
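To make the 100 ns figure concrete: a counter's tick period is the reciprocal of its frequency, so the 10 MHz reported by QueryPerformanceFrequency corresponds to 100 ns per tick. The following is only a minimal sketch. `ns_per_tick` and `qpc_frequency_hz` are hypothetical helper names, and the non-Windows branch simply assumes the 10 MHz value quoted in the question so that the snippet builds elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>  // QueryPerformanceFrequency, LARGE_INTEGER
#endif

// Nanoseconds per tick for a counter running at `hz` ticks per second.
double ns_per_tick(std::int64_t hz) {
    assert(hz > 0);
    return 1e9 / static_cast<double>(hz);
}

// Frequency of the performance counter in Hz (hypothetical helper).
std::int64_t qpc_frequency_hz() {
#ifdef _WIN32
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
#else
    // Assumption: not on Windows, so fall back to the 10 MHz value
    // that the question reports for Windows 10.
    return 10000000;
#endif
}
```

Calling `ns_per_tick(qpc_frequency_hz())` on a system that reports 10 MHz gives a tick period of 100 ns, which is why QPC alone cannot reach the 10 ns target.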

    RDTSC has a resolution better than 1 ns, but it varies with frequency


    Not really: since Nehalem, the TSC frequency has actually been very stable. See the Intel 64 Architecture SDM, Vol. 3A, "17.16 Invariant TSC":

    Processor families increment the time-stamp counter differently:

    • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle. The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel SpeedStep technology transitions may also impact the processor clock.

    • For Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family [06H], DisplayModel [17H]); for Intel Atom processors (family [06H], DisplayModel [1CH]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the processor base frequency, see Section 18.18.2 for more detail. On certain processors, the TSC frequency may not be the same as the frequency in the brand string.

    The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.
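The CPUID bit named in the quote can be tested at startup before relying on the TSC. Below is a minimal sketch, assuming an x86 or x86-64 target built with MSVC (`__cpuid` from `<intrin.h>`) or GCC/Clang (`__get_cpuid` from `<cpuid.h>`). `has_invariant_tsc` and `bit` are hypothetical helper names, and on any other architecture the function simply reports `false`.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>    // __get_cpuid
#endif

// Returns bit `index` of `value` (hypothetical helper).
inline bool bit(unsigned value, unsigned index) {
    assert(index < 32);
    return ((value >> index) & 1u) != 0;
}

// True if CPUID.80000007H:EDX[8] (invariant TSC) is set.
bool has_invariant_tsc() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, static_cast<int>(0x80000000u));  // highest extended leaf
    if (static_cast<unsigned>(regs[0]) < 0x80000007u) return false;
    __cpuid(regs, static_cast<int>(0x80000007u));
    return bit(static_cast<unsigned>(regs[3]), 8);  // regs[3] is EDX
#elif defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) return false;
    return bit(edx, 8);
#else
    return false;  // assumption: no TSC to speak of on non-x86 targets
#endif
}
```

A profiler could fall back to QueryPerformanceCounter when this returns `false`. Note that a hypervisor may hide or alter this bit, so the result describes what the guest is told rather than the physical CPU.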


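    So, for quick measurements, you should be able to use __rdtsc or __rdtscp. You can calibrate the TSC frequency at startup and verify that it does not depend on the CPU state. The thread can still be preempted, though, so it is a good idea to repeat the measurement several times or to use QueryThreadCycleTime (which, of course, has its own overhead). In practice, I have found RDTSC to be less bad than it is presented in "Calculate system time using rdtsc", although the latter is still a good read.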
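The calibrate-then-repeat approach described here could look roughly like the sketch below. It assumes an x86-64 CPU with an invariant TSC and RDTSCP support, and it uses `std::chrono::steady_clock` as the reference clock purely for illustration. The names `calibrate_tsc_ticks_per_ns`, `ticks_to_ns`, and `min_ticks` are hypothetical and do not come from the answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#ifdef _MSC_VER
#include <intrin.h>     // __rdtsc, __rdtscp
#else
#include <x86intrin.h>  // __rdtsc, __rdtscp
#endif

// Estimate TSC ticks per nanosecond by spinning on steady_clock for
// `window` and comparing the elapsed TSC ticks with the elapsed time.
double calibrate_tsc_ticks_per_ns(
    std::chrono::milliseconds window = std::chrono::milliseconds(50)) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = __rdtsc();
    auto t1 = t0;
    while (t1 - t0 < window) t1 = clock::now();  // busy-wait, no sleeping
    const std::uint64_t c1 = __rdtsc();
    const double ns =
        std::chrono::duration<double, std::nano>(t1 - t0).count();
    assert(ns > 0.0);
    return static_cast<double>(c1 - c0) / ns;
}

// Convert a tick count to nanoseconds using a calibrated ratio.
double ticks_to_ns(std::uint64_t ticks, double ticks_per_ns) {
    assert(ticks_per_ns > 0.0);
    return static_cast<double>(ticks) / ticks_per_ns;
}

// Run f() `reps` times and return the smallest elapsed tick count.
// Taking the minimum discards runs inflated by preemption or interrupts.
template <class F>
std::uint64_t min_ticks(F&& f, int reps = 1000) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    unsigned aux = 0;  // receives IA32_TSC_AUX; unused here
    for (int i = 0; i < reps; ++i) {
        const std::uint64_t start = __rdtscp(&aux);
        f();
        const std::uint64_t stop = __rdtscp(&aux);
        if (stop - start < best) best = stop - start;
    }
    return best;
}
```

A caller would calibrate once at startup and then report `ticks_to_ns(min_ticks(work), ratio)`. The minimum still includes the overhead of the two `__rdtscp` reads, which can be estimated by timing an empty callable and subtracting the result.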

    Source: a similar question on Stack Overflow, "Timer with resolution better than 100 ns on Windows": https://stackoverflow.com/questions/63205226/
