azure - Where does contentOffset come from?

I'm trying to understand skillsets in Azure Cognitive Search. I want to build an OCR-powered search and am trying to understand how it works.

For example, the documentation says the OCR skill produces this response:

{
  "text": "Hello World. -John",
  "layoutText":
  {
    "language" : "en",
    "text" : "Hello World. -John",
    "lines" : [
      {
        "boundingBox":
        [ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
        "text":"Hello World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ],
    "words": [
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"Hello"
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ]
  }
}
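To make the question concrete, here is a small Python sketch that walks the sample OCR output above and lists every property name it contains. It uses a trimmed copy of the sample (the "words" array is left out for brevity). The helper name collect_keys is hypothetical. The point is only to confirm that nothing in this response is called contentOffset.

```python
from typing import Any, Set

# Trimmed copy of the sample OCR skill output above ("words" omitted for brevity).
ocr_output = {
    "text": "Hello World. -John",
    "layoutText": {
        "language": "en",
        "text": "Hello World. -John",
        "lines": [
            {"boundingBox": [{"x": 10, "y": 10}, {"x": 50, "y": 10},
                             {"x": 50, "y": 30}, {"x": 10, "y": 30}],
             "text": "Hello World."},
            {"boundingBox": [{"x": 110, "y": 10}, {"x": 150, "y": 10},
                             {"x": 150, "y": 30}, {"x": 110, "y": 30}],
             "text": "-John"},
        ],
    },
}


def collect_keys(node: Any) -> Set[str]:
    """Recursively gather every property name in a JSON-like structure."""
    keys: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= collect_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= collect_keys(item)
    return keys


line_texts = [line["text"] for line in ocr_output["layoutText"]["lines"]]
print(line_texts)                                   # ['Hello World.', '-John']
print("contentOffset" in collect_keys(ocr_output))  # False
```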

But then in this paragraph we see that only the text field of the OCR skill is used, and a new field, contentOffset, appears.

Custom skillset definition:

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text",
          "source": "/document/content"
        },
        {
          "name": "itemsToInsert", 
          "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", 
          "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", 
          "targetName" : "merged_text"
        }
      ]
    }
  ]
}
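The Merge skill above has its context set to /document. It reads /document/normalized_images/*/text and /document/normalized_images/*/contentOffset. Because of that context, the * fans out over every normalized image, and each input arrives as an array in which items are paired by index.

Below is a minimal Python sketch of that path resolution over a hypothetical enriched document modeled as a plain dict. The function resolve_path and the sample values ("quick", "lazy", 3, 28) are illustrative assumptions. This is not the service's implementation.

```python
from typing import Any, List

# Hypothetical enriched document: each normalized image already carries a
# contentOffset (from image extraction) and a text value (from the OCR skill).
enriched = {
    "document": {
        "content": "The brown fox jumps over the dog",
        "normalized_images": [
            {"contentOffset": 3, "text": "quick"},
            {"contentOffset": 28, "text": "lazy"},
        ],
    }
}


def resolve_path(doc: dict, path: str) -> Any:
    """Resolve a path such as '/document/normalized_images/*/contentOffset'.

    A '*' segment fans out over every element of a list, so any path that
    contains '*' yields a list with one value per element.
    """
    parts: List[str] = [p for p in path.split("/") if p]

    def walk(node: Any, remaining: List[str]) -> Any:
        if not remaining:
            return node
        head, rest = remaining[0], remaining[1:]
        if head == "*":
            return [walk(item, rest) for item in node]
        return walk(node[head], rest)

    return walk(doc, parts)


print(resolve_path(enriched, "/document/normalized_images/*/text"))           # ['quick', 'lazy']
print(resolve_path(enriched, "/document/normalized_images/*/contentOffset"))  # [3, 28]
```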

The input should look like this:

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "text": "The brown fox jumps over the dog",
        "itemsToInsert": ["quick", "lazy"],
        "offsets": [3, 28]
      }
    }
  ]
}
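The sample input pairs each entry in itemsToInsert with the entry at the same index in offsets. Each offset is a character position in the original text: 3 is the space after "The" and 28 is the space before "dog".

A minimal Python sketch of that insertion logic follows, assuming the offsets refer to the unmodified text. The function merge_text and its parameters are hypothetical stand-ins for the skill's text, itemsToInsert, offsets, insertPreTag and insertPostTag. This sketch does not claim to reproduce the service's exact whitespace handling.

```python
from typing import List


def merge_text(text: str, items: List[str], offsets: List[int],
               pre_tag: str = " ", post_tag: str = " ") -> str:
    """Insert each item into `text` at its character offset, wrapped in tags.

    Offsets refer to positions in the ORIGINAL text, so they are applied in
    ascending order while copying the untouched slices between them.
    """
    if len(items) != len(offsets):
        raise ValueError("items and offsets must have the same length")
    pieces: List[str] = []
    cursor = 0
    for offset, item in sorted(zip(offsets, items)):
        pieces.append(text[cursor:offset])          # original text up to the offset
        pieces.append(f"{pre_tag}{item}{post_tag}")  # the inserted item
        cursor = offset
    pieces.append(text[cursor:])                    # remainder of the original text
    return "".join(pieces)


print(merge_text("The brown fox jumps over the dog", ["quick", "lazy"], [3, 28],
                 pre_tag=" ", post_tag=""))
# The quick brown fox jumps over the lazy dog
```

Sorting by offset keeps the sketch correct even if the arrays arrive out of order. Slicing the original text, instead of the growing result, means earlier insertions never shift later offsets.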

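So where does the offsets array (contentOffset in the skill definition) come from? The OcrSkill response doesn't return that value, and the Computer Vision Read method doesn't return it from the API either.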

Best answer

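contentOffset is produced by default when content is extracted from files with embedded images; the OCR skill does not produce it. Each entry in /document/normalized_images records the character offset in /document/content where that image was found. So whenever images embedded in the input document are extracted and passed to the OCR skill, contentOffset is already available alongside them.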
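To illustrate what that offset means, here is a hypothetical Python simulation of extraction. A document is modeled as an ordered list of text segments and image markers. Text is concatenated into content, and each image is recorded with the length of content at the point where it appeared.

The function crack_document, the ("image", name) marker, and the name field are assumptions made for illustration. Real normalized image entries carry other properties, such as the image data and dimensions.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical document model: a str is body text, ("image", name) is an embedded image.
Segment = Union[str, Tuple[str, str]]


def crack_document(segments: List[Segment]) -> Dict[str, object]:
    """Simulate extraction: build `content` from the text segments and, for each
    embedded image, record the character offset in `content` where it appeared."""
    content_parts: List[str] = []
    length = 0
    normalized_images: List[Dict[str, object]] = []
    for seg in segments:
        if isinstance(seg, str):
            content_parts.append(seg)
            length += len(seg)
        else:
            _, name = seg
            normalized_images.append({"name": name, "contentOffset": length})
    return {"content": "".join(content_parts), "normalized_images": normalized_images}


doc = crack_document(["The", ("image", "fox.png"), " brown fox jumps over the",
                      ("image", "dog.png"), " dog"])
print(doc["content"])                                          # The brown fox jumps over the dog
print([img["contentOffset"] for img in doc["normalized_images"]])  # [3, 28]
```

The resulting offsets match the Merge skill sample input, which is how OCR text can later be put back at the right place in content.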

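As for why contentOffset appears as an array: each input we upload for analysis can contain multiple images. The source path /document/normalized_images/*/contentOffset collects one offset per image. Consider the following documentation of the Read API through REST, which describes the JSON operations.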

For "azure - Where does contentOffset come from?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/72532077/
