python - Pushing an index to Azure Cognitive Search fails with ServiceRequestError: EOF occurred in violation of protocol (_ssl.c:2427)

Tags: python azure azure-cognitive-search

I am trying to push an index (with embeddings) to Azure Cognitive Search. This is the code that uploads it:

# Upload some documents to the index
with open('index.json', 'r') as file:
    documents = json.load(file)
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents, timeout = 50)
print(f"Uploaded {len(documents)} documents")

The code works as long as "index.json" is small (I have tried this, and the data was pushed to Azure Cognitive Search successfully). It fails once "index.json" is large. The file I am using now is 69 MB.

Running the code produces the following error:

ServiceRequestError                       Traceback (most recent call last)
Cell In[21], line 5
      3     documents = json.load(file)  
      4 search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
----> 5 result = search_client.upload_documents(documents, timeout = 50)  
      6 print(f"Uploaded {len(documents)} documents") 

File /usr/local/lib/python3.11/site-packages/azure/search/documents/_search_client.py:543, in SearchClient.upload_documents(self, documents, **kwargs)
    540 batch.add_upload_actions(documents)
    542 kwargs["headers"] = self._merge_client_headers(kwargs.get("headers"))
--> 543 results = self.index_documents(batch, **kwargs)
    544 return cast(List[IndexingResult], results)

File /usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py:78, in distributed_trace..decorator..wrapper_use_tracer(*args, **kwargs)
     76 span_impl_type = settings.tracing_implementation()
     77 if span_impl_type is None:
---> 78     return func(*args, **kwargs)
     80 # Merge span is parameter is set, but only if no explicit parent are passed
     81 if merge_span and not passed_in_parent:

File /usr/local/lib/python3.11/site-packages/azure/search/documents/_search_client.py:641, in SearchClient.index_documents(self, batch, **kwargs)
    631 @distributed_trace
    632 def index_documents(self, batch: IndexDocumentsBatch, **kwargs: Any) -> List[IndexingResult]:
    633     """Specify a document operations to perform as a batch.
...
--> 381     raise error
    382 if _is_rest(request):
    383     from azure.core.rest._requests_basic import RestRequestsTransportResponse

ServiceRequestError: EOF occurred in violation of protocol (_ssl.c:2427)

Does anyone know how to fix this error so that the code pushes the data to Azure Cognitive Search?

Best answer

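I reproduced the scenario from the information you gave. I tested JSON files of several sizes, and the largest request that is accepted appears to be just under 64 MB in size and 32,000 documents (index actions per request).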
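To see which side of those observed limits a file falls on before uploading, one can measure its size on disk and count its documents. The sketch below is not from the original answer: `payload_stats` is a hypothetical helper name, and it assumes the file holds a single JSON array of documents, as "index.json" does in the question.

```python
import json
import os


def payload_stats(path):
    """Return (size in MB, number of documents) for a JSON array file."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    with open(path, "r", encoding="utf-8") as f:
        documents = json.load(f)
    return size_mb, len(documents)
```

Calling `payload_stats('index.json')` on the 69 MB file from the question would show a size above the roughly 64 MB observed here, which is consistent with the failed upload.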

One possible solution is to split the data into smaller chunks before uploading.

Below is a modified version of the upload code that splits the data into chunks of 10,000 documents each:

import json

with open('data2.json', 'r') as f:
    documents = json.load(f)

# Split the data into chunks of 10,000 documents
chunk_size = 10000
chunks = [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]

# Upload each chunk with the search_client created in the question
for chunk in chunks:
    result = search_client.upload_documents(chunk)
    print(f"Uploaded {len(chunk)} documents")
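The loop above discards `result`. `upload_documents` returns a list of `IndexingResult` objects, one per document, so a rejected document inside an otherwise successful chunk can go unnoticed. A minimal sketch of a check, relying only on the `key` and `succeeded` attributes of each result; `failed_keys` is a hypothetical helper, not part of the SDK, and the sample objects are stand-ins rather than real service responses:

```python
from types import SimpleNamespace


def failed_keys(results):
    """Return the keys of documents whose indexing action did not succeed."""
    return [r.key for r in results if not r.succeeded]


# Stand-in objects that mimic the two attributes used above
sample = [
    SimpleNamespace(key="1", succeeded=True),
    SimpleNamespace(key="2", succeeded=False),
]
print(failed_keys(sample))
```

Inside the upload loop this could be called as `failed_keys(result)` after each chunk, logging or retrying any keys it returns.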


You can adjust the chunk size in the code above to suit your documents and file size.
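Because documents with embeddings vary in size, a fixed document count does not bound the request size on its own. One way to tune the chunking is to cap each batch by both document count and approximate serialized size. This is a sketch under assumptions, not the answerer's method: `iter_batches` is a hypothetical helper, the default `max_docs` and `max_bytes` values are illustrative placeholders rather than documented service limits, and the size is estimated from `json.dumps`, which may differ slightly from what the SDK sends.

```python
import json


def iter_batches(documents, max_docs=10000, max_bytes=32 * 1024 * 1024):
    """Yield lists of documents bounded by count and approximate JSON size."""
    batch = []
    batch_bytes = 0
    for doc in documents:
        # Approximate the payload contribution of this document
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Start a new batch if adding this document would exceed either cap
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch = []
            batch_bytes = 0
        # A single document larger than max_bytes still goes out alone
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

The upload loop then becomes `for batch in iter_batches(documents): search_client.upload_documents(batch)`, with the two caps lowered until requests go through reliably.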

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/77088537/
