python - Azure function: PDF saved to a temporary file from the input stream is corrupted

I have uploaded a PDF to blob storage, and when I download it with MS Azure Explorer it is perfectly fine.

I have an Azure function triggered by a queue, and it also has an input binding to the blob specified in the queue message.

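When I write the incoming blob to disk, its size doubles. The PDF is also corrupted and cannot be opened in a PDF reader. When I open it in Notepad, the characters are different from the ones in the original file. It looks like an encoding problem, but we are dealing with bytes, not text, so I'm not sure why this happens.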
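A size increase plus changed characters is the classic symptom of binary data taking a detour through text. The sketch below is a hypothetical illustration, not the Azure Functions worker's actual code path: it assumes bytes are decoded as Latin-1 and re-encoded as UTF-8, and shows that every byte at or above 0x80 then turns into two bytes, so the file grows and its content changes even though the ASCII header survives.

```python
# Hypothetical illustration only: this is NOT what the Azure Functions worker does,
# just one assumed way a bytes -> str -> bytes round trip can corrupt binary data.

def text_round_trip(data: bytes) -> bytes:
    """Decode bytes as Latin-1 text, then re-encode that text as UTF-8."""
    text = data.decode("latin-1")  # every byte maps to exactly one code point
    return text.encode("utf-8")    # code points >= 0x80 become two bytes each

# A PDF-style header followed by a comment line of non-ASCII "binary marker" bytes.
original = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"
mangled = text_round_trip(original)

print(len(original), len(mangled))  # the mangled copy is longer
print(mangled)                      # the ASCII prefix is intact, the high bytes are not
```

Because the ASCII part looks unchanged, a file damaged this way can still start with a plausible header while the binary content after it is garbage, which matches what Notepad shows.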

Here is my code (Python 3):

import azure.functions as func
import tempfile
import os.path

def main(msg: func.QueueMessage, inputblob: func.InputStream, outputTable: func.Out[str]) -> None:

    with tempfile.TemporaryDirectory() as td:
        f_name1 = os.path.join(td, "old.pdf")
        with open(f_name1, 'wb') as fh:
            fh.write(inputblob.read())
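To confirm whether the file written to the temporary directory is intact, a quick check of the leading bytes and the size can help. The helper below, looks_like_pdf, is hypothetical and not part of azure.functions; it assumes a well-formed PDF starts with the bytes %PDF- and that you know the expected size (for example from the copy downloaded with Azure Explorer).

```python
import os
import tempfile
from typing import Optional

PDF_MAGIC = b"%PDF-"  # a well-formed PDF normally begins with these bytes

def looks_like_pdf(path: str, expected_size: Optional[int] = None) -> bool:
    """Hypothetical sanity check: PDF header present and, optionally, size matches."""
    with open(path, "rb") as fh:
        head = fh.read(len(PDF_MAGIC))
    if head != PDF_MAGIC:
        return False
    # A doubled or otherwise changed size means the bytes were altered in transit.
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    return True

sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"  # stand-in bytes, not a real document

with tempfile.TemporaryDirectory() as td:
    f_name1 = os.path.join(td, "old.pdf")
    with open(f_name1, "wb") as fh:
        fh.write(sample)
    good_ok = looks_like_pdf(f_name1, expected_size=len(sample))
    # Simulate the reported symptom: the original is half the size of what was written.
    doubled_ok = looks_like_pdf(f_name1, expected_size=len(sample) // 2)

print(good_ok, doubled_ok)
```

Checking the size is the stronger signal here, since a text round trip can leave the ASCII header untouched.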

Best answer

Yes, that looks bad: the first few bytes get changed, and possibly more (marvin3.jpg is the source image in blob storage).

As a workaround, just add this to the blob input binding in your function.json:

"dataType": "binary"

For example:

{
  "name": "inputBlob",
  "type": "blob",
  "dataType": "binary",
  "direction": "in",
  "path": "images/input_image.jpg",
  "connection": "AzureWebJobsStorage"
}
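If a project has several functions with blob inputs, the workaround can be applied mechanically. The sketch below is a hypothetical helper, force_binary_blob_inputs, that takes a function.json already loaded as a dict and adds "dataType": "binary" to every blob binding with "direction": "in". The queue name "pdf-jobs" and the path "pdfs/{queueTrigger}" in the example are made up for illustration.

```python
import json
from typing import Any, Dict

def force_binary_blob_inputs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy a function.json dict, marking blob inputs as binary."""
    patched = json.loads(json.dumps(config))  # deep copy of plain JSON data
    for binding in patched.get("bindings", []):
        if binding.get("type") == "blob" and binding.get("direction") == "in":
            binding["dataType"] = "binary"
    return patched

# Example config loosely modelled on the question: queue trigger plus blob input.
example = {
    "scriptFile": "__init__.py",
    "bindings": [
        {"name": "msg", "type": "queueTrigger", "direction": "in",
         "queueName": "pdf-jobs", "connection": "AzureWebJobsStorage"},
        {"name": "inputblob", "type": "blob", "direction": "in",
         "path": "pdfs/{queueTrigger}", "connection": "AzureWebJobsStorage"},
    ],
}

print(json.dumps(force_binary_blob_inputs(example), indent=2))
```

Returning a copy instead of mutating the input keeps the original config available for comparison before you write the patched version back to disk.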

You shouldn't need to put that in (only the JavaScript worker requires it), but I suspect there is a bug somewhere in the SDK that prevents the correct type from being inferred.

Full working example:

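import azure.functions as func

def main(req: func.HttpRequest, inputBlob: func.InputStream) -> func.HttpResponse:
    # With "dataType": "binary" on the binding, read() returns the blob's raw bytes
    blob = inputBlob.read()

    # Binary mode ("wb") writes those bytes to disk unchanged
    with open("out.jpg", "wb") as outfile:
        outfile.write(blob)

    return func.HttpResponse(
        "Done. Binary data written to out.jpg",
        status_code=200
    )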

This end to end test they have in the Python worker repo also seems to suggest that "dataType": "binary" should be present when using a blob input binding (regardless of the file type, you should get bytes).

If you try to declare the input blob as inputBlob: bytes instead of inputBlob: func.InputStream without specifying the dataType, you get:

Exception: TypeError: a bytes-like object is required, not 'str'

The Python worker passes in a string instead of bytes.
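The quoted TypeError is exactly what Python raises when a str is written to a file opened in binary mode, which is consistent with the worker handing over text. The sketch below reproduces that error and adds a hypothetical guard, write_blob_payload, that rejects str payloads with a message pointing at the binding, on the assumption that re-encoding a mis-decoded string cannot be trusted to restore the original bytes.

```python
import os
import tempfile
from typing import Union

def write_blob_payload(path: str, payload: Union[bytes, str]) -> None:
    """Hypothetical helper: write a blob payload in binary mode, refusing text."""
    if isinstance(payload, str):
        # Re-encoding a decoded string may not reproduce the original bytes, so fail loudly.
        raise TypeError(
            'blob payload arrived as str; check that the binding sets "dataType": "binary"'
        )
    with open(path, "wb") as fh:
        fh.write(payload)

raw_error = ""
helper_error = ""

with tempfile.TemporaryDirectory() as td:
    target = os.path.join(td, "out.bin")

    # Writing a str straight into a binary-mode file reproduces the error from the answer.
    try:
        with open(target, "wb") as fh:
            fh.write("not bytes")
    except TypeError as exc:
        raw_error = str(exc)

    # The guard turns the same situation into a more actionable message.
    try:
        write_blob_payload(target, "not bytes")
    except TypeError as exc:
        helper_error = str(exc)

    # Genuine bytes are written unchanged.
    write_blob_payload(target, b"\x00\xff")
    with open(target, "rb") as fh:
        written = fh.read()

print(raw_error)
print(helper_error)
```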

I have opened an issue here so that the documentation gets updated.

Original question on Stack Overflow: https://stackoverflow.com/questions/60866994/
