azure - Where does contentOffset come from?

Tags: azure computer-vision ocr azure-cognitive-search azure-cognitive-services

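I am trying to understand skillsets in Azure Cognitive Search. I want to build an OCR-backed search and am trying to understand how it works.

For example, according to the documentation, the OCR skill produces this response: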

{
  "text": "Hello World. -John",
  "layoutText":
  {
    "language" : "en",
    "text" : "Hello World. -John",
    "lines" : [
      {
        "boundingBox":
        [ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
        "text":"Hello World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ],
    "words": [
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"Hello"
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ]
  }
}

But then, in this paragraph of the documentation, we see that only the text field from the OCR skill is used, and a new contentOffset field appears.

Custom skillset definition:

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text",
          "source": "/document/content"
        },
        {
          "name": "itemsToInsert", 
          "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", 
          "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", 
          "targetName" : "merged_text"
        }
      ]
    }
  ]
}

The input to the Merge skill should look like this:

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "text": "The brown fox jumps over the dog",
        "itemsToInsert": ["quick", "lazy"],
        "offsets": [3, 28]
      }
    }
  ]
}
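
To make the role of the offsets array concrete, here is a minimal Python sketch of offset-based merging. It is not the service's implementation: the function name merge_text and its parameters are hypothetical, and the exact whitespace handling of the real Merge skill is not shown in the question, so the sketch simply wraps each item in the pre and post tags and inserts it at its character offset.

```python
def merge_text(text, items_to_insert, offsets, pre_tag=" ", post_tag=" "):
    """Insert each item into text at its character offset (illustrative only)."""
    if len(items_to_insert) != len(offsets):
        raise ValueError("itemsToInsert and offsets must have the same length")

    pieces = []
    cursor = 0
    # Walk the insertions in ascending offset order; the sort is stable,
    # so items that share an offset keep their original order.
    for offset, item in sorted(zip(offsets, items_to_insert), key=lambda p: p[0]):
        pieces.append(text[cursor:offset])            # original text up to the offset
        pieces.append(f"{pre_tag}{item}{post_tag}")   # the inserted OCR text
        cursor = offset
    pieces.append(text[cursor:])                      # remainder of the original text
    return "".join(pieces)


merged = merge_text(
    "The brown fox jumps over the dog",
    ["quick", "lazy"],
    [3, 28],
)
# Collapse repeated spaces for display; the naive insertion doubles them.
normalized = " ".join(merged.split())
print(normalized)
```

Each offset is a position in the original content string, which is why one offset is needed per inserted item.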

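So where does the offsets array (contentOffset in the skill definition) come from, given that the OcrSkill response does not return that value and the Computer Vision Read API does not return it either?

Best answer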

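contentOffset is a default feature of content extraction for files that contain embedded images. So whenever the OCR skillset recognizes images embedded in the input document, contentOffset is provided along with them.

As for why contentOffset appears as an array: each input we upload for analysis can contain multiple images. Consider the following documentation on the Read API through REST, which follows the JSON operations.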
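
The following Python sketch illustrates how the two Merge skill inputs could be gathered from the enrichment tree. It assumes, as the answer describes, that every extracted image node already carries a contentOffset, and that the OCR skill adds its text output under the same node (its context in the skillset above is /document/normalized_images/*). The sample document and the collect helper are hypothetical and only mimic the path syntax in a very simplified way.

```python
# Hypothetical enriched document: contentOffset is assumed to be set when the
# image is extracted, and "text" is assumed to be added later by the OCR skill.
enriched_document = {
    "content": "The brown fox jumps over the dog",
    "normalized_images": [
        {"contentOffset": 3, "text": "quick"},
        {"contentOffset": 28, "text": "lazy"},
    ],
}


def collect(document, path):
    """Resolve a simplified '/document/<collection>/*/<field>' path to a list."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "document" or parts[2] != "*":
        raise ValueError(f"unsupported path in this sketch: {path}")
    _, collection, _, field = parts
    # One value per image, in document order, hence an array.
    return [item[field] for item in document[collection]]


merge_input = {
    "values": [
        {
            "recordId": "1",
            "data": {
                "text": enriched_document["content"],
                "itemsToInsert": collect(
                    enriched_document, "/document/normalized_images/*/text"
                ),
                "offsets": collect(
                    enriched_document, "/document/normalized_images/*/contentOffset"
                ),
            },
        }
    ]
}
print(merge_input["values"][0]["data"]["offsets"])
```

Under these assumptions the offsets never pass through the OCR skill at all: the skill only contributes the text values, and the wildcard path collects one contentOffset per image.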

Regarding azure - Where does contentOffset come from?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72532077/
