Intel's "Optimization Reference Manual" mentions a new CPU feature, "Fast Short REP CMPSB and SCASB", that can speed up string operations:
REP CMPSB and SCASB performance is enhanced. The enhancement applies to string lengths between 1 and 128 bytes long. When the Fast Short REP CMPSB and SCASB feature is enabled, REP CMPSB and REP SCASB performance is flat 15 cycles per operation, for all strings 1-128 byte long whose two source operands reside in the processor first level cache.
Support for fast short REP CMPSB and SCASB is enumerated by the CPUID feature flag: CPUID.07H.01H:EAX.FAST_SHORT_REP_CMPSB_SCASB[bit 12] = 1.
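As a rough illustration of the enumeration quoted above, here is a minimal sketch that queries CPUID leaf 07H, sub-leaf 01H and tests EAX bit 12. It assumes GCC or Clang on x86 (it relies on `__get_cpuid_count` from `<cpuid.h>`); the function names `fsrcs_bit_from_eax` and `cpu_has_fast_short_rep_cmpsb_scasb` are made up for this example and do not come from the manual.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header; assumed to be available */
#endif

/* Hypothetical helper: extract bit 12 of the EAX value returned by
   CPUID leaf 07H, sub-leaf 01H. */
static int fsrcs_bit_from_eax(unsigned int eax)
{
    return (int)((eax >> 12) & 1u);
}

/* Hypothetical helper: returns 1 if the CPU reports
   Fast Short REP CMPSB/SCASB, 0 otherwise (including on non-x86 builds). */
static int cpu_has_fast_short_rep_cmpsb_scasb(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Sub-leaf 0 of leaf 07H returns the highest valid sub-leaf in EAX. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
        return 0;

    /* Sub-leaf 1 carries the feature bit in EAX[12]. */
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return 0;

    return fsrcs_bit_from_eax(eax);
#else
    return 0;
#endif
}
```

Calling `cpu_has_fast_short_rep_cmpsb_scasb()` from `main` and printing the result is enough to check a given machine. Note that a hypervisor may mask CPUID bits, so a 0 inside a virtual machine does not necessarily describe the underlying hardware.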
For Fast Short REP MOVSB, the supporting generation is stated explicitly:
Beginning with processors based on Ice Lake Client microarchitecture, REP MOVSB performance of short operations is enhanced
But I can't find any information on which CPU generation first supports "Fast Short REP CMPSB".
Best answer
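The CPUID dump for the Core i5-12500 (performance cores only, no efficiency cores) shows support for this feature.
Dumps for the 1350P and 1365U also show support.
Interestingly, I don't see it in any other 13x00 cores.
InstLatX64 on Twitter also pointed me to the "Intel® 64 and IA-32 Architectures Software Developer's Manual", which says: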
Fast Short REP CMPSB, fast short REP SCASB 4th generation Intel® Xeon® Scalable Processor Family based on Sapphire Rapids microarchitecture
Source: the Stack Overflow question "Which processors support 'Fast Short REP CMPSB and SCASB'" (tagged optimization): https://stackoverflow.com/questions/75309389/