python - Method for measuring disk seek time

Tags: python performance python-2.7 disk hard-drive

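I wrote a script to measure seek time on an HDD, and a small change in how the measurement is done produces dramatically different times.

The first loop jumps around within a region at the beginning of the disk. The second loop picks random regions of the disk (of the same size) and performs the seeks inside them. The two approaches clearly differ, but I don't understand why that changes the results. Note that for large areas the two approaches converge.

The Bytes* functions simply format numbers nicely (1024 <-> "1KB"). The script must be run as root. The disk defaults to sdb.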


import sys, os, time, random


#--------------------------------------------------------------------------------------------------

def BytesString(n):
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while n % 1024 == 0 and suffix+1 < len(suffixes):
        suffix += 1
        n /= 1024
    return '{0}{1}'.format(n, suffixes[suffix])

def BytesInt(s):
    if all(c in '0123456789' for c in s):
        return int(s)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    for power,suffix in reversed(list(enumerate(suffixes))):
        if s.endswith(suffix):
            return int(s.rstrip(suffix))*1024**power
    raise ValueError('BytesInt requires proper suffix ('+' '.join(suffixes)+').')

def BytesStringFloat(n):
    x = float(n)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while x > 1024.0 and suffix+1 < len(suffixes):
        suffix += 1
        x /= 1024.0
    return '{0:0.2f}{1}'.format(x, suffixes[suffix])


#--------------------------------------------------------------------------------------------------

disk = open('/dev/sdb', 'r')
disk.seek(0,2)
disksize = disk.tell()
os.system('echo noop | sudo tee /sys/block/sdb/queue/scheduler > /dev/null')

print 'Syntax: program [-s -sr -t -tr] [-v]:  to run specific modes; for verbose mode.'
print 'Disk name: {0}  Disk size: {1}  Scheduler disabled.'.format(
    disk.name, BytesStringFloat(disksize))

displaytimes = '-v' in sys.argv


#--------------------------------------------------------------------------------------------------

bufsize = 512
bufcount = 100
displaysamplecount = 24

for randomareas in [False,True]:
    print
    print 'Measuring: Random seek time {0}'.format(
        'using random areas of disk.' if randomareas else 'using beginning of disk.')
    print 'Samples: {0}{1}   Sample size: {2}'.format(
        bufcount, ' (displayed {0})'.format(displaysamplecount) if displaytimes else '', bufsize)

    for area in [BytesInt('1MB')*2**i for i in range(0,64)]+[disksize]:
        if area > disksize:
            continue

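        # Ask the kernel to drop its page cache (plus dentries and inodes).
        os.system('echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null')

        times = []
        disk.seek(0)
        disk.read(bufsize)
        for _ in range(bufcount):
            # left: start of the tested area; right: random offset inside it.
            left = random.randint(0, disksize-area) if randomareas else 0
            right = left + random.randint(0, area)
            # Untimed: move to the start of the area first.
            disk.seek(left)
            disk.read(bufsize)
            # Timed: seek from there to `right` and read one buffer.
            start = time.time()
            disk.seek(right)
            disk.read(bufsize)
            finish = time.time()
            times.append(finish-start)

        # Keep the fastest 95% of samples; the slowest 5% are dropped as outliers.
        times = sorted(times)[:bufcount*95/100]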
        print 'Area tested: {0:6}   Average: {1:5.2f} ms   Max: {2:5.2f} ms   Total: {3:0.2f} sec'.format(
            BytesString(area) if area < disksize else BytesStringFloat(area), 
            sum(times)/len(times)*1000, max(times)*1000, sum(times))
        if displaytimes:
            print 'Read times: {0} ... {1} ms'.format(
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[:displaysamplecount/2]]), 
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[-displaysamplecount/2:]]))

Measuring: Random seek time using beginning of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  0.14 ms   Max:  0.35 ms   Total: 0.01 sec
Area tested: 2MB      Average:  0.16 ms   Max:  0.31 ms   Total: 0.02 sec
Area tested: 4MB      Average:  0.20 ms   Max:  0.75 ms   Total: 0.02 sec
Area tested: 8MB      Average:  0.19 ms   Max:  0.97 ms   Total: 0.02 sec
Area tested: 16MB     Average:  0.64 ms   Max:  7.97 ms   Total: 0.06 sec
Area tested: 32MB     Average:  2.29 ms   Max: 10.56 ms   Total: 0.22 sec
Area tested: 64MB     Average:  3.89 ms   Max: 12.25 ms   Total: 0.37 sec
Area tested: 128MB    Average:  6.32 ms   Max: 13.18 ms   Total: 0.60 sec
Area tested: 256MB    Average:  6.73 ms   Max: 13.04 ms   Total: 0.64 sec
Area tested: 512MB    Average:  7.43 ms   Max: 13.72 ms   Total: 0.71 sec
Area tested: 1GB      Average:  8.38 ms   Max: 13.59 ms   Total: 0.80 sec
Area tested: 2GB      Average:  8.51 ms   Max: 13.81 ms   Total: 0.81 sec
Area tested: 4GB      Average:  8.87 ms   Max: 13.86 ms   Total: 0.84 sec
Area tested: 8GB      Average:  9.82 ms   Max: 14.66 ms   Total: 0.93 sec
Area tested: 16GB     Average:  9.73 ms   Max: 15.95 ms   Total: 0.92 sec
Area tested: 32GB     Average:  9.89 ms   Max: 15.18 ms   Total: 0.94 sec
Area tested: 64GB     Average: 10.60 ms   Max: 15.85 ms   Total: 1.01 sec
Area tested: 128GB    Average: 11.18 ms   Max: 18.68 ms   Total: 1.06 sec
Area tested: 256GB    Average: 13.31 ms   Max: 30.94 ms   Total: 1.26 sec
Area tested: 512GB    Average: 14.14 ms   Max: 31.70 ms   Total: 1.34 sec
Area tested: 1TB      Average: 15.20 ms   Max: 33.35 ms   Total: 1.44 sec
Area tested: 1.36TB   Average: 15.47 ms   Max: 25.30 ms   Total: 1.47 sec

Measuring: Random seek time using random areas of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  7.21 ms   Max: 35.94 ms   Total: 0.69 sec
Area tested: 2MB      Average:  5.40 ms   Max: 12.92 ms   Total: 0.51 sec
Area tested: 4MB      Average:  6.97 ms   Max: 36.60 ms   Total: 0.66 sec
Area tested: 8MB      Average:  7.24 ms   Max: 15.05 ms   Total: 0.69 sec
Area tested: 16MB     Average:  7.36 ms   Max: 13.03 ms   Total: 0.70 sec
Area tested: 32MB     Average:  7.34 ms   Max: 12.30 ms   Total: 0.70 sec
Area tested: 64MB     Average:  7.35 ms   Max: 13.47 ms   Total: 0.70 sec
Area tested: 128MB    Average:  7.66 ms   Max: 13.37 ms   Total: 0.73 sec
Area tested: 256MB    Average:  7.93 ms   Max: 13.34 ms   Total: 0.75 sec
Area tested: 512MB    Average: 10.16 ms   Max: 39.67 ms   Total: 0.97 sec
Area tested: 1GB      Average:  8.76 ms   Max: 14.38 ms   Total: 0.83 sec
Area tested: 2GB      Average:  9.42 ms   Max: 17.74 ms   Total: 0.89 sec
Area tested: 4GB      Average: 11.00 ms   Max: 23.22 ms   Total: 1.05 sec
Area tested: 8GB      Average: 10.59 ms   Max: 19.60 ms   Total: 1.01 sec
Area tested: 16GB     Average: 10.91 ms   Max: 19.15 ms   Total: 1.04 sec
Area tested: 32GB     Average: 11.19 ms   Max: 26.02 ms   Total: 1.06 sec
Area tested: 64GB     Average: 12.59 ms   Max: 26.49 ms   Total: 1.20 sec
Area tested: 128GB    Average: 11.97 ms   Max: 19.30 ms   Total: 1.14 sec
Area tested: 256GB    Average: 12.61 ms   Max: 22.84 ms   Total: 1.20 sec
Area tested: 512GB    Average: 13.62 ms   Max: 20.48 ms   Total: 1.29 sec
Area tested: 1TB      Average: 16.72 ms   Max: 29.20 ms   Total: 1.59 sec
Area tested: 1.36TB   Average: 15.96 ms   Max: 26.21 ms   Total: 1.52 sec

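Best answer

Modern HDDs have a built-in cache: when you read a location, "some logic" in the drive caches the area around it internally. If your next read is close to that location, the drive serves the data from its cache (if present); otherwise it reads from the disk.

Reading from the beginning of the disk: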

Measuring: Random seek time using beginning of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  0.14 ms   Max:  0.35 ms   Total: 0.01 sec

will cache the contents from there, so consecutive reads are served from the (faster) cache.

Reading random locations:

Measuring: Random seek time using random areas of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  7.21 ms   Max: 35.94 ms   Total: 0.69 sec

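cannot be served from the cache, unless you read the "same random location" several times in a row.

Your code does not use the same random area 100 times: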

for _ in range(bufcount):
    left = random.randint(0, disksize-area) if randomareas else 0
    right = left + random.randint(0, area)
    disk.seek(left)
    disk.read(bufsize)
    start = time.time()
    disk.seek(right)
    disk.read(bufsize)
    finish = time.time()
    times.append(finish-start)

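It creates a new left and right for each of the 100 bufcount iterations. Because every seek goes to a fresh random place, you do not benefit from the HDD cache (for the most part, unless similar offsets are hit purely by chance).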
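One way to check this explanation would be to pick the random area once per area size and reuse it for all samples, so the random-area mode becomes comparable to the beginning-of-disk mode. The sketch below only generates the offsets. The helper name seek_pairs and its fixed_base parameter are hypothetical and not part of the original script. It has not been run against a real disk.

```python
import random

def seek_pairs(disksize, area, count, fixed_base=True, rng=random):
    """Return `count` (left, right) byte-offset pairs for timed seeks.

    Hypothetical helper, not part of the original script.
    fixed_base=True  -> one random `left` is drawn and reused for every sample.
    fixed_base=False -> `left` is redrawn each time, as in the question's loop.
    """
    base = rng.randint(0, disksize - area)
    pairs = []
    for _ in range(count):
        left = base if fixed_base else rng.randint(0, disksize - area)
        right = left + rng.randint(0, area)
        pairs.append((left, right))
    return pairs
```

In the question's loop, the two lines that compute left and right could be replaced by iterating over seek_pairs(disksize, area, bufcount). If the caching explanation holds, the small-area timings with a fixed base should move toward the beginning-of-disk numbers. That outcome is an expectation, not a measured result.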

Regarding python - method for measuring disk seek time, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38292071/
