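node.js - How do I receive SIP audio in Node and send a WAV stream to the Google Speech Recognition API?

Tags: node.js audio speech-recognition asterisk sip

So far I have been experimenting with sipster, but it has some annoying limitations (e.g. lack of configurability). Any ideas on how to do this? Maybe there is a Node wrapper for Asterisk, such as asterisk-manager?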

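In more detail, the basic idea is:

  • Run a virtual SIP client that can accept SIP connections
  • Convert the audio from that connection to a regular WAV format
  • Stream the audio to the Google Speech API
  • Have other ways to manipulate the SIP stream from Node, such as playing a sound

Best answer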

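    This question is quite old, and things appear to have improved considerably on Google's side since then, both in the speech recognizer itself (which keeps getting more accurate) and on the Node.js side, where the Node.js client for the Google Cloud Speech API is updated regularly.

    As @arheops suggested, you may want to look at Asterisk's EAGI together with Node.js to have audio samples transcribed by Google.

    The following EAGI bash script may help with this (a detailed explanation is available here):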

    #!/bin/bash
    
    # Read all variables sent by Asterisk and export them; the script does not use them afterwards
    declare -a array
    while read -e ARG && [ "$ARG" ] ; do
            array=(` echo $ARG | sed -e 's/://'`)
            export ${array[0]}=${array[1]}
    done
    
    # First argument is language
    case "$1" in
    "fr-FR" | "en-GB" | "es-ES" | "it-IT" )
      LANG=$1
      ;;
    *)
      LANG=en-US
      ;;
    esac
    
    NODECMD=$(which node)
    
    # Second argument is a timeout, in seconds: the duration to wait for voice input from the caller.
    DURATION=$2
    SAMPLE_RATE=8000
    SAMPLE_SIZE_BYTES=2
    let "SAMPLE_SIZE_BITS = SAMPLE_SIZE_BYTES * 8"
    
    # EAGI_AUDIO_FORMAT is an asterisk variable that specifies the sample rate and
    # sample size (usually 16 bits per sample) of the caller's voice stream.
    # Depending on the codec used here, you can get sample rate values ranging from
    # 8000Hz (e.g. G.711 uLaw) to 48000Hz (e.g. opus).
    echo "GET VARIABLE EAGI_AUDIO_FORMAT"
    read line
    EAGI_AUDIO_FORMAT=$(echo $line | sed -r 's/.*\((.*)\).*/\1/')
    
    # DURATION seconds of audio input take SAMPLE_RATE * sample_size * DURATION bytes
    # - SAMPLE_RATE is set as per EAGI_AUDIO_FORMAT
    # - sample_size is set to 2 (16 bits per sample)
    #
    # Only slin48 is recognized here; every other format falls back to 8000 Hz,
    # so this mapping should be extended for other sample rates
    case "${EAGI_AUDIO_FORMAT}" in
    "slin48")
      SAMPLE_RATE=48000
      ;;
    *)
      SAMPLE_RATE=8000
      ;;
    esac
    
    # Temporary file to store raw audio samples
    AUDIO_FILE=/tmp/audio-${SAMPLE_SIZE_BITS}_bits-${SAMPLE_RATE}_hz-${DURATION}_sec.raw
    
    # We use `dd` here to copy the raw audio samples we're getting from file
    # descriptor 3 (this is the Enhanced version in EAGI) to the temporary file.
    # The number of blocks to copy is a function of the DURATION to record audio and
    # the sample rate. SAMPLE_SIZE_BYTES cannot be changed as it is assumed that each
    # sample is 16 bits in size.
    let "COUNT = SAMPLE_RATE * SAMPLE_SIZE_BYTES * DURATION"
    # By default, dd stores blocks of 512 bytes
    let "BLOCKS = COUNT / 512"
    echo "exec noop \"Number of bytes to store : ${COUNT}\""
    read line
    
    echo "exec noop \"Number of dd blocks to store : ${BLOCKS}\""
    read line
    
    echo "exec playback \"beep\""
    read line
    
    dd if=/dev/fd/3 count=${BLOCKS} of=${AUDIO_FILE}
    echo "exec noop \"File saved !\""
    # Read Asterisk's reply to the command above, as after every other AGI command
    read line
    
    echo "exec noop \"AUDIO_FILE : ${AUDIO_FILE}\""
    read line
    echo "exec noop \"SAMPLE_RATE : ${SAMPLE_RATE}\""
    read line
    echo "exec noop \"LANG : ${LANG}\""
    read line
    
    # Submit audio to Google Cloud Speech API and get the result
    export GOOGLE_APPLICATION_CREDENTIALS=/usr/local/node_programs/service_account_file.json
    RES=$(${NODECMD} /usr/local/node_programs/nodejs-speech/samples/recognize.js sync ${AUDIO_FILE} -e LINEAR16 -r ${SAMPLE_RATE} -l ${LANG})
    
    # Clean up the result returned by recognize.js:
    # - remove new lines
    # - remove the 'Transcription: ' header
    RES=$(echo $RES | tr -d '\n' | sed -e 's/Transcription: \(.*$\)/\1/')
    
    # Set GOOGLE_TRANSCRIPTION_RESULT variable, remove temporary file
    # and continue dialplan execution
    echo "set variable GOOGLE_TRANSCRIPTION_RESULT \"${RES}\""
    read line
    
    /bin/rm -f ${AUDIO_FILE}
    
    exit 0
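
The script above extracts `EAGI_AUDIO_FORMAT` from Asterisk's reply with a `sed` expression. Below is a minimal sketch of that step in isolation, assuming the reply has the usual AGI shape `200 result=1 (value)`. `agi_value` is a hypothetical helper name, not part of the original script. Unlike the original expression, it prints nothing when the reply carries no parenthesized value.

```shell
#!/bin/bash
# Hypothetical helper; not part of the original EAGI script.
# Extracts the value from an AGI reply such as "200 result=1 (slin48)".
# Prints nothing when the reply has no parenthesized value.
agi_value() {
  # -n plus the p flag: only print lines where the substitution matched
  printf '%s\n' "$1" | sed -n 's/.*(\(.*\)).*/\1/p'
}

agi_value "200 result=1 (slin48)"   # prints: slin48
agi_value "200 result=0"            # prints nothing
```

An empty result can then be tested explicitly before choosing a sample rate, instead of relying on the default branch of the `case` statement.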
    
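The size arithmetic the script performs before calling `dd` can be checked on its own. This is a minimal sketch, assuming mono 16-bit samples as the script does; `raw_byte_count` and `dd_block_count` are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original EAGI script.

# Bytes needed for a mono recording.
# $1 = sample rate (Hz), $2 = bytes per sample, $3 = duration (s)
raw_byte_count() {
  echo $(( $1 * $2 * $3 ))
}

# Number of whole 512-byte dd blocks in $1 bytes (integer division truncates).
dd_block_count() {
  echo $(( $1 / 512 ))
}

bytes=$(raw_byte_count 8000 2 5)
blocks=$(dd_block_count "$bytes")
echo "bytes=${bytes} blocks=${blocks}"   # prints: bytes=80000 blocks=156
```

For 5 seconds at 8000 Hz, 156 blocks hold 79872 bytes, so the last 128 bytes (8 ms of audio) are not recorded because the division truncates. If file descriptor 3 is a pipe, `dd` may also count short reads as full blocks; GNU `dd` offers `iflag=fullblock` for that case.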

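The question asks for WAV, whereas the script above submits headerless raw samples. If a WAV file is needed, prepending a 44-byte PCM header to the raw data should be enough. This is a sketch assuming mono, 16-bit, little-endian samples; `le_bytes` and `raw_to_wav` are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original EAGI script.

# Print integer $1 as $2 little-endian bytes.
le_bytes() {
  n=$1
  i=0
  while [ "$i" -lt "$2" ]; do
    # Emit the lowest byte via an octal escape, then shift right by 8 bits.
    printf "\\$(printf '%03o' $(( n & 255 )))"
    n=$(( n >> 8 ))
    i=$(( i + 1 ))
  done
}

# raw_to_wav IN.raw OUT.wav SAMPLE_RATE
# Assumes mono, 16-bit, little-endian PCM samples.
raw_to_wav() {
  data_size=$(( $(wc -c < "$1") ))
  {
    printf 'RIFF'; le_bytes $(( data_size + 36 )) 4; printf 'WAVE'
    printf 'fmt '; le_bytes 16 4        # size of the fmt chunk
    le_bytes 1 2                        # audio format 1 = PCM
    le_bytes 1 2                        # channels
    le_bytes "$3" 4                     # sample rate
    le_bytes $(( $3 * 2 )) 4            # byte rate = rate * channels * 2
    le_bytes 2 2                        # block align = channels * 2
    le_bytes 16 2                       # bits per sample
    printf 'data'; le_bytes "$data_size" 4
    cat "$1"                            # raw samples follow the header
  } > "$2"
}

# Example: raw_to_wav /tmp/audio.raw /tmp/audio.wav 8000
```

A dedicated audio tool such as sox can do the same conversion; the pure-shell version only avoids an extra dependency inside the EAGI script.
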
    Hope this helps!

    Regarding "node.js - How do I receive SIP audio in Node and send a WAV stream to the Google Speech Recognition API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40555589/
