python - Librosa cannot load from BytesIO

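I am currently trying to create a large dataset for deep learning that consists of many compressed mp3 files stored together, so that I don't have to load 100k files individually.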

x = b''
with open("file1.mp3", "rb") as f:
    x += f.read()
print(len(x)) # 362861
with open("file2.mp3", "rb") as f:
    x += f.read()
print(len(x)) # 725722
with open("testdataset", 'wb+') as f:
    f.write(x)
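Rather than hard-coding each file's byte length (362861) when reading back, one option is to record the offset and length of every file while concatenating. A minimal sketch, assuming a hypothetical JSON sidecar index named `<dataset>.index.json`; the helpers `pack_files` and `read_member` are illustrative names, not part of any library:

```python
import json
import os


def pack_files(paths, dataset_path):
    """Concatenate the raw bytes of `paths` into `dataset_path` and write a
    JSON index mapping each file's base name to [offset, length]."""
    index = {}
    offset = 0
    with open(dataset_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                data = f.read()
            out.write(data)
            index[os.path.basename(p)] = [offset, len(data)]
            offset += len(data)
    with open(dataset_path + ".index.json", "w") as f:
        json.dump(index, f)
    return index


def read_member(dataset_path, name):
    """Return the original bytes of `name` by seeking to its recorded offset."""
    with open(dataset_path + ".index.json") as f:
        offset, length = json.load(f)[name]
    with open(dataset_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

For example, `bs = read_member("testdataset", "file2.mp3")` returns exactly the bytes of the second file. Keying by base name assumes file names are unique across the dataset.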

Now I want to load them one by one:

import io
import librosa
import numpy as np
with open("testdataset", 'rb') as f:
    bs = f.read(362861)  # the first 362861 bytes are file1.mp3
    y, sr = librosa.core.load(io.BytesIO(bs), mono=True, sr=44100, dtype=np.float32)  # crashes

It fails with the following error:

RuntimeError: Error opening <_io.BytesIO object at 0x7f509ed1cf90>: File contains data in an unknown format.
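One workaround, assuming the decoding backend that handles MP3 here can only open files by path (the audioread fallback mentioned in the warning below suggests this), is to write the in-memory bytes to a temporary file and pass librosa the path instead. A minimal sketch; `load_from_bytes` is a hypothetical helper, and `loader` is any path-based load function such as `librosa.core.load`:

```python
import os
import tempfile


def load_from_bytes(data, loader, suffix=".mp3", **kwargs):
    """Write `data` to a temporary file, call `loader(path, **kwargs)`,
    and delete the file afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the file before loading so the loader can reopen it on any OS.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return loader(path, **kwargs)
    finally:
        os.remove(path)
```

Usage would look like `y, sr = load_from_bytes(bs, librosa.core.load, mono=True, sr=44100, dtype=np.float32)`. This costs one small disk write per clip, which is usually cheap compared with MP3 decoding.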

As a test, I tried loading the original file, which works fine:

y, sr = librosa.core.load("file1.mp3", mono=True, sr=44100, dtype=np.float32) # works fine

Note that the "dummy" load of the original mp3 also raises a warning:

UserWarning: PySoundFile failed. Trying audioread instead. warnings.warn('PySoundFile failed. Trying audioread instead.')
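The warning points at librosa's fallback logic. As far as I understand librosa 0.7.x, `load` first tries PySoundFile and, if that raises `RuntimeError`, retries with audioread only when it was given a path string; for a file-like object such as `BytesIO` the original error is re-raised. The libsndfile versions used by soundfile 0.10.x could not decode MP3 (support arrived in libsndfile 1.1.0), so a path succeeds via audioread while `BytesIO` fails. The following is a simplified illustration of that pattern, not librosa's actual code; `primary` and `fallback` are hypothetical stand-ins for the two backends:

```python
import warnings


def load_with_fallback(source, primary, fallback):
    """Try `primary` first; fall back only for path strings.

    `fallback` is assumed to accept path strings only, so a file-like
    `source` re-raises the primary backend's error instead."""
    try:
        return primary(source)
    except RuntimeError:
        if isinstance(source, str):
            warnings.warn("primary backend failed. Trying fallback instead.")
            return fallback(source)
        raise  # file-like objects (e.g. BytesIO) get the original error
```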

Why is this happening? Is there a better way to store many individual files together and load them at once?
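A standard archive that carries its own index, such as a ZIP file, is another way to keep many files together. Since MP3 data is already compressed, `zipfile.ZIP_STORED` (no recompression) keeps reads cheap, and members can be read back one at a time by name. A minimal sketch using only the standard library; `pack_zip` and `iter_members` are hypothetical helper names, and the returned bytes still need a decoder that can handle them:

```python
import os
import zipfile


def pack_zip(paths, archive_path):
    """Store each file in `paths` uncompressed inside one ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))


def iter_members(archive_path):
    """Yield (name, raw bytes) for every member, one member at a time."""
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            yield name, zf.read(name)
```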

Here are the versions I am using:

python: 3.8.3 (default, May 14 2020, 20:11:43) 
[GCC 7.5.0]
librosa: 0.7.2
audioread: 2.1.8
numpy: 1.19.0
scipy: 1.5.0
sklearn: 0.23.1
joblib: 0.15.1
decorator: 4.4.2
six: 1.15.0
soundfile: 0.10.3
resampy: 0.2.2
numba: 0.48.0

Best answer

If you are using torchaudio, run the following:

!pip install torch==1.11.0 torchaudio==0.11.0 -f https://download.pytorch.org/whl/cu113/torch_stable.html

Regarding "python - Librosa cannot load from BytesIO", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62601346/
