I'm trying to understand skillsets in Azure Cognitive Search. I want to build OCR-powered search, and I'm trying to understand how it works.

For example, according to the documentation, the OCR skill produces this response:

    {
      "text": "Hello World. -John",
      "layoutText": {
        "language": "en",
        "text": "Hello World. -John",
        "lines": [
          {
            "boundingBox": [ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30}, {"x":10, "y":30} ],
            "text": "Hello World."
          },
          {
            "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30}, {"x":110, "y":30} ],
            "text": "-John"
          }
        ],
        "words": [
          {
            "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30}, {"x":110, "y":30} ],
            "text": "Hello"
          },
          {
            "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30}, {"x":110, "y":30} ],
            "text": "World."
          },
          {
            "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30}, {"x":110, "y":30} ],
            "text": "-John"
          }
        ]
      }
    }

But then, in this paragraph, we see that only the text field from the OCR skill is used, and a new contentOffset field appears:

    {
      "description": "Extract text from images and merge with content text to produce merged_text",
      "skills": [
        {
          "description": "Extract text (plain and structured) from image.",
          "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
          "context": "/document/normalized_images/*",
          "defaultLanguageCode": "en",
          "detectOrientation": true,
          "inputs": [
            {
              "name": "image",
              "source": "/document/normalized_images/*"
            }
          ],
          "outputs": [
            {
              "name": "text"
            }
          ]
        },
        {
          "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
          "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
          "context": "/document",
          "insertPreTag": " ",
          "insertPostTag": " ",
          "inputs": [
            {
              "name": "text",
              "source": "/document/content"
            },
            {
              "name": "itemsToInsert",
              "source": "/document/normalized_images/*/text"
            },
            {
              "name": "offsets",
              "source": "/document/normalized_images/*/contentOffset"
            }
          ],
          "outputs": [
            {
              "name": "mergedText",
              "targetName": "merged_text"
            }
          ]
        }
      ]
    }


    {
      "values": [
        {
          "recordId": "1",
          "data": {
            "text": "The brown fox jumps over the dog",
            "itemsToInsert": ["quick", "lazy"],
            "offsets": [3, 28]
          }
        }
      ]
    }

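So where does the offsets array (contentOffset in the skill definition) come from, given that the OcrSkill response does not return that value and the Computer Vision Read method does not return it from the API either?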


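contentOffset is produced by default when content is extracted from files that contain embedded images. It is not an output of the OCR skill. It is a property of each normalized image (/document/normalized_images/*) that the indexer generates during image extraction, and it records the character offset within the content field at which that image was found. So whenever an image is identified in the input document, a contentOffset is available for it alongside the text the OCR skill extracts.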
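To make this concrete, here is a minimal Python sketch of how a source path containing a `*` wildcard could fan out over the images and collect one contentOffset per image. The `enriched` dictionary and the `resolve` helper are hypothetical stand-ins for illustration, not the service's real data model or API; only the path strings and the contentOffset name come from the skillset in the question.

```python
# Hypothetical, simplified picture of an enriched document after image
# extraction and OCR. The values are made up for illustration.
enriched = {
    "document": {
        "content": "The brown fox jumps over the dog",
        "normalized_images": [
            # contentOffset comes from image extraction, text from the OCR skill
            {"contentOffset": 3, "text": "quick"},
            {"contentOffset": 28, "text": "lazy"},
        ],
    }
}


def resolve(tree, path):
    """Resolve a simplified enrichment path against a nested dict.

    A '*' segment expands a list into its elements, so every later
    segment is applied to each element. Always returns a list.
    """
    current = [tree]
    for part in (p for p in path.split("/") if p):
        expanded = []
        for node in current:
            if part == "*":
                expanded.extend(node)        # fan out over list items
            else:
                expanded.append(node[part])  # descend into a named field
        current = expanded
    return current


offsets = resolve(enriched, "/document/normalized_images/*/contentOffset")
items = resolve(enriched, "/document/normalized_images/*/text")
print(offsets, items)
```

Under these assumptions, the two wildcard paths yield parallel lists, one entry per image, which is the shape of the offsets and itemsToInsert inputs of the Merge skill.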

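As for why contentOffset shows up as an array: each input document we upload for analysis can contain multiple images, and every image contributes one offset. For comparison, see the documentation for calling the Read API through REST and the JSON that operation returns.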
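As a rough illustration of why the Merge skill needs one offset per image, the sketch below splices each item into the text at its offset. This is a naive reimplementation written for this answer, not the service's actual algorithm; the function name `merge_text` and its parameters are assumptions that merely mirror the input names (text, itemsToInsert, offsets) and the insertPreTag/insertPostTag settings from the question.

```python
def merge_text(text, items_to_insert, offsets, pre_tag=" ", post_tag=" "):
    """Insert each item into text at its matching character offset.

    Offsets refer to positions in the original text, so items are
    inserted from the largest offset down; that way an insertion never
    shifts a position that is still waiting to be used.
    """
    if len(items_to_insert) != len(offsets):
        raise ValueError("itemsToInsert and offsets must have the same length")

    merged = text
    for offset, item in sorted(zip(offsets, items_to_insert), reverse=True):
        merged = merged[:offset] + pre_tag + item + post_tag + merged[offset:]
    return merged


sample = merge_text(
    "The brown fox jumps over the dog",
    ["quick", "lazy"],
    [3, 28],
    post_tag="",  # empty post tag avoids a doubled space in this naive splice
)
print(sample)  # The quick brown fox jumps over the lazy dog
```

With both tags left as a single space, this naive splice produces doubled spaces around each inserted word, so the real skill may well treat whitespace differently. The point here is only that each image's text needs its own offset.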

