python - How to get accurate timing using a microphone in python

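I'm trying to do beat detection with my PC microphone, and then use the beat timestamps to calculate the distance between several consecutive beats. I chose Python because there is a lot of material available and development is quick. By searching the internet I came up with this simple code (no advanced peak detection or anything yet; that will come later if needed):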

import pyaudio
import struct
import math
import time


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # two bytes per 16-bit sample
    fmt = "%dh" % count
    shorts = struct.unpack(fmt, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 32
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

elapsed_time = 0
prev_detect_time = 0


def close_stream():
    stream.stop_stream()
    stream.close()
    p.terminate()


try:
    while True:
        data = stream.read(CHUNK)
        amplitude = get_rms(data)
        if amplitude > 0.05:  # value set by observing graphed data captured from mic
            # Timestamp = when Python received the chunk, not when the sound arrived.
            now = time.perf_counter()
            elapsed_time = now - prev_detect_time
            if elapsed_time > 0.1:  # guard against multiple spikes at beat point
                print(elapsed_time)
                prev_detect_time = now
except KeyboardInterrupt:
    pass  # stop with Ctrl+C
finally:
    close_stream()

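This code works quite well when it's quiet, and I was very happy the first couple of times I ran it, but then I tested its accuracy and was a bit less satisfied. To test it I used two methods: a phone with a metronome set to 60 bpm (ticking into the microphone), and an Arduino connected to a buzzer that is triggered at 1 Hz by an accurate Chronodot RTC. The buzzer beeps into the microphone, which triggers a detection. The results of both methods look similar (the numbers are the distance between two beat detections, in seconds):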

0.9956681643835616
1.0056331689497717
0.9956100091324198
1.0058207853881278
0.9953449497716891
1.0052103013698623
1.0049350136986295
0.9859074337899543
1.004996383561644
0.9954095342465745
1.0061518904109583
0.9953025753424658
1.0051235068493156
1.0057199634703196
0.984839305936072
1.00610396347032
0.9951862648401821
1.0053146301369864
0.9960100821917806
1.0053391780821919
0.9947373881278523
1.0058608219178105
1.0056580091324214
0.9852110319634697
1.0054473059360731
0.9950465753424638
1.0058237077625556
0.995704694063928
1.0054566575342463
0.9851026118721435
1.0059882374429243
1.0052523835616398
0.9956161461187207
1.0050863926940607
0.9955758173515932
1.0058052968036577
0.9953960913242028
1.0048014611872205
1.006336876712325
0.9847434520547935
1.0059712876712297

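Now, I'm quite confident that at least the Arduino is accurate to 1 ms (the target accuracy). The results tend to be off by ±5 ms, but sometimes even by 15 ms, which is not acceptable. Is there a way to achieve higher accuracy, or is this a limitation of Python / the sound card / something else? Thanks!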
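To put numbers on "off by ±5 ms, sometimes 15 ms", you can compute each interval's deviation from the nominal 1 s period. This sketch uses only the first nine intervals from the run above and Python's statistics module; NOMINAL_S is the 1 Hz period of the test signal, and the variable names are illustrative.

```python
import statistics

# The first nine intervals (in seconds) from the run above.
intervals_s = [
    0.9956681643835616, 1.0056331689497717, 0.9956100091324198,
    1.0058207853881278, 0.9953449497716891, 1.0052103013698623,
    1.0049350136986295, 0.9859074337899543, 1.004996383561644,
]

NOMINAL_S = 1.0  # period of the metronome / RTC-driven buzzer (1 Hz)

# Deviation of each measured interval from the nominal period, in ms.
errors_ms = [(t - NOMINAL_S) * 1000.0 for t in intervals_s]
mean_ms = statistics.mean(errors_ms)
stdev_ms = statistics.stdev(errors_ms)
worst_ms = max(errors_ms, key=abs)

print(f"mean error  : {mean_ms:+.2f} ms")
print(f"std dev     : {stdev_ms:.2f} ms")
print(f"worst error : {worst_ms:+.2f} ms")
```

For these nine values the mean error is only about -0.1 ms, while individual intervals deviate by roughly 4 to 6 ms and, in one case, by about 14 ms.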

Edit: After incorporating the suggestions from tom10 and barny into the code, it looks like this:

import pyaudio
import struct
import math
import psutil
import os


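def set_high_priority():
    p = psutil.Process(os.getpid())
    # HIGH_PRIORITY_CLASS is Windows-only; on Linux/macOS nice() takes an
    # integer niceness instead (negative values usually require root).
    p.nice(psutil.HIGH_PRIORITY_CLASS)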


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # two bytes per 16-bit sample
    fmt = "%dh" % count
    shorts = struct.unpack(fmt, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 4096
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RUNTIME_SECONDS = 10

set_high_priority()

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

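elapsed_time = 0  # ms of audio before the current chunk, derived from the sample count
prev_detect_time = 0
amplitudes = []  # RMS of every group, kept for later inspection/plotting
TIME_PER_CHUNK = 1000 / RATE * CHUNK  # ms of audio per chunk
SAMPLE_GROUP_SIZE = 32  # 1 sample = 2 bytes; 32 samples ~= 0.73 ms at 44.1 kHz
TIME_PER_GROUP = 1000 / RATE * SAMPLE_GROUP_SIZE  # ms of audio per group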

for i in range(0, int(RATE / CHUNK * RUNTIME_SECONDS)):
    data = stream.read(CHUNK)
    time_in_chunk = 0
    group_index = 0
    for j in range(0, len(data), (SAMPLE_GROUP_SIZE * 2)):
        group = data[j:(j + (SAMPLE_GROUP_SIZE * 2))]
        amplitude = get_rms(group)
        amplitudes.append(amplitude)
        if amplitude > 0.02:
            current_time = (elapsed_time + time_in_chunk)
            time_since_last_beat = current_time - prev_detect_time
            if time_since_last_beat > 500:
                print(time_since_last_beat)
                prev_detect_time = current_time
        time_in_chunk = (group_index+1) * TIME_PER_GROUP
        group_index += 1
    elapsed_time = (i+1) * TIME_PER_CHUNK

stream.stop_stream()
stream.close()
p.terminate()
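The key property of the edited version is that beat times come from the sample count (chunk index plus group index) rather than from time.perf_counter(). The same idea can be packaged as a pure function that is easy to test offline. This is a sketch, not the poster's code: the names group_rms and detect_beats are hypothetical, the defaults (threshold 0.02, 32-sample groups, 500 ms refractory period) are copied from the edited code, and the input is assumed to be raw native-endian int16 bytes as returned by stream.read().

```python
import math
import struct

SHORT_NORMALIZE = 1.0 / 32768.0


def group_rms(samples):
    """RMS of a sequence of int16 samples, normalized so full scale is 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum((s * SHORT_NORMALIZE) ** 2 for s in samples) / len(samples))


def detect_beats(chunks, rate=44100, group_size=32, threshold=0.02, refractory_ms=500.0):
    """Return beat onset times in ms, computed from the sample count alone.

    `chunks` is any iterable of raw native-endian int16 byte strings, for
    example the values returned by successive stream.read(CHUNK) calls.
    """
    beats = []
    samples_seen = 0      # samples in all earlier chunks: the sound card's clock
    last_beat_ms = None
    for chunk in chunks:
        n = len(chunk) // 2
        samples = struct.unpack("%dh" % n, chunk[:2 * n])
        for start in range(0, n, group_size):
            if group_rms(samples[start:start + group_size]) > threshold:
                # Time of the first sample of this group, relative to the
                # first sample ever read; wall-clock time is never consulted.
                t_ms = (samples_seen + start) * 1000.0 / rate
                if last_beat_ms is None or t_ms - last_beat_ms > refractory_ms:
                    beats.append(t_ms)
                    last_beat_ms = t_ms
        samples_seen += n
    return beats
```

Intervals follow from consecutive onsets, e.g. [b - a for a, b in zip(beats, beats[1:])]. One deliberate difference from the edited code: last_beat_ms starts as None rather than 0, so a beat within the first 500 ms is not silently discarded.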

With this code I got the following results (this time the unit is milliseconds, not seconds):

999.909297052154
999.9092970521542
999.9092970521542
999.9092970521542
999.9092970521542
1000.6349206349205
999.9092970521551
999.9092970521524
999.9092970521542
999.909297052156
999.9092970521542
999.9092970521542
999.9092970521524
999.9092970521542

If I'm not mistaken, this looks much better than before and already reaches sub-millisecond accuracy. Thanks to tom10 and barny for their help.
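The regularity of these numbers has a simple arithmetic explanation: in the edited code every timestamp is a whole number of 32-sample groups, so every reported interval is a multiple of TIME_PER_GROUP = 1000 / 44100 × 32 ≈ 0.726 ms. This snippet checks that for the two distinct values in the output; it reuses the constant names from the edited code and nothing else.

```python
RATE = 44100
SAMPLE_GROUP_SIZE = 32
TIME_PER_GROUP = 1000 / RATE * SAMPLE_GROUP_SIZE   # ms per group, about 0.7256

# The two distinct intervals printed by the edited code.
for reported_ms in (999.909297052154, 1000.6349206349205):
    groups = reported_ms / TIME_PER_GROUP           # should be a whole number
    samples = round(groups) * SAMPLE_GROUP_SIZE
    print(f"{reported_ms} ms = {groups:.6f} groups = {samples} samples")
```

So 999.909 ms is exactly 1378 groups (44096 samples) and 1000.635 ms is 1379 groups (44128 samples): the resolution of this measurement is about 0.73 ms, set by SAMPLE_GROUP_SIZE rather than by the operating system.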

Best answer

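The reason you're not getting correct beat times is that you're losing chunks of audio data. That is, the sound card is reading chunks of data, but you aren't collecting each chunk before it is overwritten by the next one.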

First, though, for this problem you need to distinguish between the concepts of timing accuracy and real-time response.

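The timing accuracy of a sound card should be very good, much better than 1 ms, and you should be able to capture all of that accuracy in the data you read from the sound card. The real-time response of your computer's OS should be expected to be poor, much worse than 1 ms. That is, you should easily be able to identify audio events (such as beats) to within 1 ms, but not at the moment they happen (rather 30 to 200 ms later, depending on your system). This arrangement usually works fine for computers, because human perception of the timing of events is generally much coarser than 1 ms (except for rare specialized perceptual systems, such as comparing auditory events between the two ears).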
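To illustrate the distinction between timing accuracy and real-time response, here is a small simulation with no audio hardware involved. It assumes a click every 44100 samples, CHUNK = 1024, and a made-up OS scheduling delay drawn uniformly from 0 to 30 ms; none of these numbers come from the answer. Timestamps computed from the chunk index and the offset inside the chunk come out exactly one second apart, while timestamps taken when a chunk is delivered to Python jitter by several milliseconds.

```python
import random

RATE = 44100                      # samples per second
CHUNK = 1024                      # frames per stream.read() call
CHUNK_PERIOD_S = CHUNK / RATE     # seconds of audio in one chunk

random.seed(0)

# Hypothetical input: a click every second, starting at sample 1000.
click_samples = [1000 + n * RATE for n in range(5)]

sample_clock_times = []   # derived from chunk index + offset in chunk
arrival_times = []        # when the chunk would reach Python

for s in click_samples:
    chunk_index, offset = divmod(s, CHUNK)
    # What you can recover from the data itself: exact to one sample.
    sample_clock_times.append((chunk_index * CHUNK + offset) / RATE)
    # What time.perf_counter() would see: the chunk is only complete at its
    # end, and the OS hands it over after an ASSUMED random 0-30 ms delay.
    delivered = (chunk_index + 1) * CHUNK_PERIOD_S + random.uniform(0.0, 0.030)
    arrival_times.append(delivered)


def intervals(times):
    """Differences between consecutive timestamps, in seconds."""
    return [b - a for a, b in zip(times, times[1:])]


print("sample clock:", [round(d, 6) for d in intervals(sample_clock_times)])
print("arrival time:", [round(d, 6) for d in intervals(arrival_times)])
```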

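The specific problem with your code is that your CHUNKs are too small for the OS to collect each one from the sound card in time. You have CHUNK at 32, so at 44100 Hz the OS needs to get to the sound card every 0.7 ms, which is too short a time for a computer that is busy with many other tasks. If your OS doesn't fetch a chunk before the next one comes in, the original chunk is overwritten and lost.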
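The figure of 0.7 ms is just CHUNK / RATE. For the buffer sizes that appear in this thread (32 in the question, 1024 suggested below, 4096 in the edited code) the service interval works out as follows:

```python
RATE = 44100  # samples per second

# How often the OS must service the sound card for a given buffer size.
periods_ms = {chunk: chunk / RATE * 1000.0 for chunk in (32, 1024, 4096)}

for chunk, period in periods_ms.items():
    print(f"CHUNK={chunk:5d}: a new buffer every {period:6.2f} ms")
```

That is about 0.73 ms for 32 frames, 23.2 ms for 1024 frames and 92.9 ms for 4096 frames, so a larger CHUNK gives the OS far more slack.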

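To make this work within the constraints above, make CHUNK much larger than 32, more like 1024 (as in the PyAudio examples). Depending on your computer and what it's doing, even that may not be long enough.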
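One way to check whether chunks are actually being lost is to let PyAudio report input overflows. This sketch assumes PyAudio 0.2.11 or later, where Stream.read accepts exception_on_overflow and raises IOError (an alias of OSError in Python 3) when PortAudio reports an input overflow. The helper name read_chunks is hypothetical, and it takes the stream as a parameter so it can be exercised without hardware. Catching OSError broadly is a simplification; real code may want to inspect the error code first.

```python
def read_chunks(stream, chunk, n_chunks):
    """Read n_chunks buffers from an open PyAudio input stream.

    Returns (chunks, overflows). A read that raised an overflow error is
    stored as None, so the caller knows the sample count is no longer exact
    from that point on.
    """
    chunks = []
    overflows = 0
    for _ in range(n_chunks):
        try:
            # Assumed behavior: raises IOError/OSError if input samples were
            # discarded because the buffer was not read in time.
            chunks.append(stream.read(chunk, exception_on_overflow=True))
        except OSError:
            overflows += 1
            chunks.append(None)
    return chunks, overflows
```

With a real stream you would call something like read_chunks(stream, CHUNK, 100); a non-zero overflow count suggests CHUNK is too small for your system, and any sample-count timestamps after the first None can no longer be trusted.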

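If this approach doesn't work for you, you may need a dedicated real-time system like an Arduino. (In general, though, this isn't necessary, so think twice before deciding to use an Arduino. Usually, when I see people who need true real time, it's when they are trying to do very quantitative interaction with a human: flash a light, have the person press a button, flash another light, have the person press another button, and so on, to measure response times.)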

Source question on Stack Overflow: https://stackoverflow.com/questions/53093247/
