python - Azure Form Recognizer error: "Failed to download image from input URL."

Tags: python azure azure-cognitive-services azure-form-recognizer

I am following these instructions to use Azure's Form Recognizer Layout service, which include the following code:

########### Python Form Recognizer Async Layout #############

import json
import time
from requests import get, post

# Endpoint URL
endpoint = r"<Endpoint>"
apim_key = "<Subscription Key>"
post_url = endpoint + "/formrecognizer/v2.0-preview/Layout/analyze"
source = r"<path to your form>"

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': apim_key,
}
with open(source, "rb") as f:
    data_bytes = f.read()

try:
    resp = post(url = post_url, data = data_bytes, headers = headers)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % resp.text)
        quit()
    print("POST analyze succeeded:\n%s" % resp.headers)
    get_url = resp.headers["operation-location"]
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))
    quit()

I tried the code, but I get the following error:

POST analyze failed:
{"error":{"code":"FailedToDownloadImage","message":"Failed to download image from input URL."}}
POST analyze succeeded:
{'Transfer-Encoding': 'chunked', 'Content-Type': 'application/json; charset=utf-8', 'x-envoy-upstream-service-time': '4', 'apim-request-id': '515e93ee-4db8-4174-92b1-63e5c415c056', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'Date': 'Sat, 06 Jun 2020 20:47:28 GMT'}
POST analyze failed:
'operation-location'

The code I am using is:

import json
import time
from requests import get, post

Before making the request, I read the PDF file and verify that it has been loaded into the variable:

source = r"data/Invoice_7.pdf" 
with open(source, "rb") as f:
    data_bytes = f.read()

print (data_bytes[0:10])

Then the request details:

endpoint = r"https://xxxx.cognitiveservices.azure.com/"

apim_key = "xxxx"
post_url = endpoint + "/formrecognizer/v2.0-preview/Layout/analyze"

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': apim_key,
}

Finally, I make the request:

try:
    resp = post(url = post_url, data = data_bytes, headers = headers)
    print (1)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % resp.text)
        #quit()
    print (2)
    print("POST analyze succeeded:\n%s" % resp.headers)
    print (3)
    get_url = resp.headers["operation-location"]
    print (4)
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))
    #quit()

I print a number at each step because I found it odd that I get both a failure and a success message for the same request. This is the result:

1
POST analyze failed:
{"error":{"code":"FailedToDownloadImage","message":"Failed to download image from input URL."}}
2
POST analyze succeeded:
{'Transfer-Encoding': 'chunked', 'Content-Type': 'application/json; charset=utf-8', 'x-envoy-upstream-service-time': '1', 'apim-request-id': '93a2a162-d14f-496f-ba8a-077bcfd5d3c7', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'Date': 'Sat, 06 Jun 2020 21:00:20 GMT'}
3
POST analyze failed:
'operation-location'

So the code fails at this line:

get_url = resp.headers["operation-location"]

The text in the response variable is:

'{"error":{"code":"FailedToDownloadImage","message":"Failed to download image from input URL."}}'

Best answer

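As defined in the REST API documentation, you need to specify the Content-Type. When you set Content-Type to application/json, the service expects a JSON body that points to a publicly accessible source URL, which is why it reports that it could not download the image from the input URL. In your case you are sending the file bytes directly, so you need to set Content-Type to application/pdf. If you want to make this dynamic, you can use the PyPI package filetype.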
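As a minimal sketch of the dynamic approach, the snippet below picks the Content-Type from the first bytes of the file using only the standard library, instead of the filetype package. The helper names `guess_content_type` and `build_headers` are hypothetical and not part of any Azure library; the list of types assumes the binary formats the Layout API accepts (PDF, JPEG, PNG, TIFF).

```python
# Magic-byte prefixes for the binary content types assumed to be accepted
# by the Layout API (PDF, JPEG, PNG, TIFF).
_SIGNATURES = [
    (b"%PDF", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def guess_content_type(data_bytes):
    """Return the MIME type matching the file's leading bytes (hypothetical helper)."""
    for prefix, content_type in _SIGNATURES:
        if data_bytes.startswith(prefix):
            return content_type
    raise ValueError("Unrecognized or unsupported file type")


def build_headers(apim_key, data_bytes):
    """Build request headers for sending raw file bytes (hypothetical helper)."""
    return {
        "Content-Type": guess_content_type(data_bytes),
        "Ocp-Apim-Subscription-Key": apim_key,
    }
```

With the variables from the question, the request would then become `resp = post(url=post_url, data=data_bytes, headers=build_headers(apim_key, data_bytes))`. For Invoice_7.pdf this should send application/pdf, provided the file starts with the usual %PDF marker.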

By the way, did you know there is a (beta) Python SDK for Form Recognizer? You could use it for your use case.
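The mixed "failed" and "succeeded" output in the question appears to come from control flow rather than from the service: with `quit()` commented out, execution continues past the failure branch, prints the success message, and then raises a `KeyError` because an error response carries no operation-location header (the string form of that exception is just `'operation-location'`). A small sketch of a stricter check follows; `operation_location` is a hypothetical helper that takes the status code, headers and body text of the response.

```python
def operation_location(status_code, headers, body_text):
    """Return the polling URL, or raise if the analyze request was not accepted.

    Hypothetical helper: raising here stops execution on failure, so the
    success path is never reached with an error response.
    """
    if status_code != 202:
        raise RuntimeError(
            "POST analyze failed (HTTP %s):\n%s" % (status_code, body_text)
        )
    return headers["operation-location"]
```

With a requests response this would be called as `get_url = operation_location(resp.status_code, resp.headers, resp.text)`.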

Regarding python - Azure Form Recognizer error: "Failed to download image from input URL.", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62238063/
