I am trying to push an index (with embeddings) to Azure Cognitive Search. This is the code that pushes the index:
# Upload some documents to the index
with open('index.json', 'r') as file:
    documents = json.load(file)
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents, timeout=50)
print(f"Uploaded {len(documents)} documents")
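The code works as long as index.json is small (I have tried it, and it successfully pushed the data to Azure Cognitive Search). It fails once index.json is large. I am currently using a 69 MB index.json.
Running the code produces the following error: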
ServiceRequestError Traceback (most recent call last)
Cell In[21], line 5
3 documents = json.load(file)
4 search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
----> 5 result = search_client.upload_documents(documents, timeout = 50)
6 print(f"Uploaded {len(documents)} documents")
File /usr/local/lib/python3.11/site-packages/azure/search/documents/_search_client.py:543, in SearchClient.upload_documents(self, documents, **kwargs)
540 batch.add_upload_actions(documents)
542 kwargs["headers"] = self._merge_client_headers(kwargs.get("headers"))
--> 543 results = self.index_documents(batch, **kwargs)
544 return cast(List[IndexingResult], results)
File /usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py:78, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
76 span_impl_type = settings.tracing_implementation()
77 if span_impl_type is None:
---> 78 return func(*args, **kwargs)
80 # Merge span is parameter is set, but only if no explicit parent are passed
81 if merge_span and not passed_in_parent:
File /usr/local/lib/python3.11/site-packages/azure/search/documents/_search_client.py:641, in SearchClient.index_documents(self, batch, **kwargs)
631 @distributed_trace
632 def index_documents(self, batch: IndexDocumentsBatch, **kwargs: Any) -> List[IndexingResult]:
633 """Specify a document operations to perform as a batch.
...
--> 381 raise error
382 if _is_rest(request):
383 from azure.core.rest._requests_basic import RestRequestsTransportResponse
ServiceRequestError: EOF occurred in violation of protocol (_ssl.c:2427)
Does anyone know how to fix this error so that the code actually pushes the data to Azure Cognitive Search?
Best answer
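Based on the information provided, I reproduced the scenario.
I tested JSON files of several sizes, and the maximum allowed appears to be just under 64 MB and 32,000 documents (index actions per request).
One possible solution is to split the data into smaller chunks before uploading.
Below is a modified version of the upload code that splits the data into chunks of 10,000 documents each: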
import json

# search_client is the SearchClient created in the question's code
with open('data2.json', 'r') as f:
    documents = json.load(f)
# Split the data into chunks of 10,000 documents
chunks = [documents[i:i + 10000] for i in range(0, len(documents), 10000)]
# Upload the data one chunk at a time
for chunk in chunks:
    result = search_client.upload_documents(chunk)
    print(f"Uploaded {len(chunk)} documents")
You can adjust the chunk size in the code above to suit your documents and file size.
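Because documents with embeddings can vary a lot in size, a fixed document count may still produce an oversized request. One way to tune the chunks is to cap each batch by both document count and approximate serialized size. The sketch below is only an illustration: the helper name `batch_by_size` is hypothetical, and the defaults of 1,000 documents and 8 MB are assumed starting points, not limits taken from the service.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batch_by_size(
    documents: Iterable[Dict[str, Any]],
    max_docs: int = 1000,               # assumed default, tune for your data
    max_bytes: int = 8 * 1024 * 1024,   # assumed default, tune for your data
) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches capped by document count and approximate JSON size.

    The size is estimated from json.dumps of each document, so it ignores
    the request envelope. A single document larger than max_bytes is still
    yielded, alone in its own batch.
    """
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch if adding this document would break a cap.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Usage with the client from the question (not executed here):
# for batch in batch_by_size(documents):
#     search_client.upload_documents(batch)
```

Lowering `max_bytes` trades more requests for smaller payloads, which may help if the connection is dropped on large uploads.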
About python - I am trying to push an index to Azure Cognitive Search but get the error ServiceRequestError: EOF occurred in violation of protocol (_ssl.c:2427): we found a similar question on Stack Overflow: https://stackoverflow.com/questions/77088537/