python - Combining Weaviate with Azure Cognitive Search

Here is my scenario:

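The client has an Azure SQL database with a profiles table that contains demographic information.
We created an Azure Cognitive Search service and indexed that database, concatenating all fields into a single field called content, because according to the documentation everything needs to be in one field: https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/azure_cognitive_search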
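
Concatenating a row into a single content field could look like the sketch below. The column names (first_name, last_name, age, hobbies, city) and the sample values are hypothetical, since the real profiles schema isn't given. Writing each value as "column: value" keeps the field labels in the text, so a question such as "how old is Jane Smith" has the word "age" to match against.

```python
def row_to_content(row: dict) -> str:
    """Flatten one profile row into a single text field.

    Each column is written as "column: value" so the field names stay
    searchable; columns whose value is None are skipped.
    """
    parts = [f"{key}: {value}" for key, value in row.items() if value is not None]
    return "; ".join(parts)


# Hypothetical profile row; the real table's columns are not known.
row = {"first_name": "Jane", "last_name": "Smith", "age": 42, "hobbies": "gardening", "city": None}
print(row_to_content(row))
# first_name: Jane; last_name: Smith; age: 42; hobbies: gardening
```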

Now we are using LangChain to build a chatbot where we can ask questions like: Who is John Smith? How old is Jane Smith? Who likes gardening?

The approach I found is described here: https://shweta-lodha.medium.com/integrating-azure-cognitive-search-with-azure-openai-and-langchain-51280d1026f2

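Basically: first query Cognitive Search to get back some documents, then store those documents as vectors in ChromaDB, then query ChromaDB and use LangChain and OpenAI to get the answer in plain English.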
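
The data flow of those steps (retrieve candidate documents, store them as vectors, query that store) can be illustrated without any external service. In this sketch a bag-of-words count vector stands in for the OpenAI embeddings and a plain Python list stands in for ChromaDB; both substitutions are assumptions made only to show the flow, and the final step where the LLM phrases the answer is left out. The two sample documents are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_store(docs: list[str]) -> list[tuple[str, Counter]]:
    """Step 2: embed the retrieved documents into an in-memory 'vector store'."""
    return [(doc, embed(doc)) for doc in docs]


def query_store(store, question: str, k: int = 1) -> list[str]:
    """Step 3: return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Step 1 stand-in: documents a search service might have returned.
retrieved = [
    "name: John Smith; age: 35; hobbies: chess",
    "name: Jane Smith; age: 42; hobbies: gardening",
]
store = build_store(retrieved)
print(query_store(store, "who likes gardening"))
# ['name: Jane Smith; age: 42; hobbies: gardening']
```

Note that every question pays for embedding the retrieved documents again, because the vector store is rebuilt per query in this design.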

But ChromaDB is very slow: this step takes about 50 seconds.

So I wanted to try Weaviate, but then I get very weird errors like:

[ERROR] Batch ConnectionError Exception occurred! Retrying in 2s. [1/3]
{'error': [{'message': "'@search.score' is not a valid property name. Property names in Weaviate are restricted to valid GraphQL names, which must be “/[_A-Za-z][_0-9A-Za-z]*/”., no such prop with name '@search.score' found in class 'LangChain_df32d6b6d10c4bb895db75f88aaabd75' in the schema. Check your schema files for which properties in this class are available"}]}

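My code looks like this:

from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.vectorstores import Weaviate

# timer, get_text, get_relevant_documents and the OPENAI_* / WEAVIATE_URL
# settings are defined elsewhere in the app (not shown).


@timer
def from_documentsWeaviate(docs, embeddings):
    # by_text=False: query with the embedding vectors instead of Weaviate's nearText
    return Weaviate.from_documents(docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False)


memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
embeddings = OpenAIEmbeddings(deployment=OPENAI_EMBEDDING_DEPLOYMENT_NAME, model=OPENAI_EMBEDDING_MODEL_NAME, chunk_size=1)
user_input = get_text()
# The concatenated profile text lives in the index's "content" field
retriever = AzureCognitiveSearchRetriever(content_key="content")

llm = AzureChatOpenAI(
    openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
    openai_api_version=OPENAI_API_VERSION,
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    openai_api_key=OPENAI_API_KEY,
    openai_api_type=OPENAI_API_TYPE,
    model_name=OPENAI_MODEL_NAME,
    temperature=0,
)

docs = get_relevant_documents(retriever, user_input)
# vectorstore = from_documentsChromaDb(docs=docs, embedding=embeddings)
vectorstore = from_documentsWeaviate(docs, embeddings)  # raises the '@search.score' error shown above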
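
The timer decorator used in the code above is not shown in the question. One plausible minimal version, assuming it only prints the wall-clock time of each call, is sketched here; slow_step is a purely illustrative function.

```python
import functools
import time


def timer(func):
    """Print how long each call to func takes, in wall-clock seconds."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returns normally or raises.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.2f}s")
    return wrapper


@timer
def slow_step():
    time.sleep(0.1)
    return "done"


slow_step()
```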

I wonder whether I should first index all rows from the table and skip the Cognitive Search part?

Best answer

but then I get very weird errors like:

The error means your property names are invalid. For example, @search.score is invalid because it does not match this regex:

/[_A-Za-z][_0-9A-Za-z]*/
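
Here the offending key comes from the document metadata: AzureCognitiveSearchRetriever puts every field of a search hit except the content field into doc.metadata, including the @search.score that Azure Cognitive Search adds to each result, and Weaviate.from_documents tries to store each metadata key as a Weaviate property. One workaround is to rename keys that don't match the pattern before indexing. The sketch works on plain dicts, and the renaming scheme (replace invalid character runs with "_", trim underscores, prefix "prop_" when the result is empty or starts with a digit) is an assumption rather than anything Weaviate does for you. Dropping the key instead would work just as well.

```python
import re

# Weaviate property names must match this GraphQL name pattern.
VALID_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def to_valid_name(key: str) -> str:
    """Map an arbitrary key to a GraphQL-safe property name."""
    # Replace each run of disallowed characters with "_", then trim the ends.
    name = re.sub(r"[^_0-9A-Za-z]+", "_", key).strip("_")
    # GraphQL names may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "prop_" + name
    return name


def clean_metadata(metadata: dict) -> dict:
    """Return a copy of metadata whose keys are valid Weaviate property names."""
    cleaned = {}
    for key, value in metadata.items():
        new_key = key if VALID_NAME.fullmatch(key) else to_valid_name(key)
        if new_key in cleaned:
            # Two original keys mapped to the same name; refuse to overwrite.
            raise ValueError(f"{key!r} collides with existing property {new_key!r}")
        cleaned[new_key] = value
    return cleaned


# Hypothetical metadata shaped like what the retriever attaches to a document.
meta = {"@search.score": 4.2, "profile_id": "17"}
print(clean_metadata(meta))
# {'search_score': 4.2, 'profile_id': '17'}
```

With LangChain this would be applied to each document before indexing, for example `for doc in docs: doc.metadata = clean_metadata(doc.metadata)` before calling from_documentsWeaviate(docs, embeddings).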

I wonder if I should first index all rows from the table and avoid the cognitive search part.?

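In my opinion, the Azure Cognitive Search part is overkill for this use case and should be replaced by a pipeline that fetches the rows from Azure SQL, combines each row into a single field, and uploads them.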
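
A minimal sketch of such a pipeline follows, assuming a table named profiles with hypothetical columns and sample rows. sqlite3 stands in for Azure SQL so the example runs anywhere; with Azure SQL you would swap in a DB-API driver such as pyodbc, whose cursors expose the same description attribute and row iteration. The upload is left as a placeholder: each batch is a list of dicts whose content strings and remaining keys could, for example, be passed to Weaviate.from_texts as texts and metadatas. The profile_id key already satisfies Weaviate's property-name pattern.

```python
import sqlite3
from typing import Iterator


def fetch_profiles(conn) -> Iterator[dict]:
    """Yield each row of the (hypothetical) profiles table as a dict."""
    cur = conn.execute("SELECT * FROM profiles")
    columns = [c[0] for c in cur.description]
    for row in cur:
        yield dict(zip(columns, row))


def to_object(profile: dict) -> dict:
    """Combine all columns into one 'content' field, keeping the id as metadata."""
    content = "; ".join(f"{k}: {v}" for k, v in profile.items() if v is not None)
    return {"content": content, "profile_id": profile["id"]}


def batches(items: Iterator[dict], size: int) -> Iterator[list[dict]]:
    """Group objects into lists of at most `size` items for uploading."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# In-memory stand-in for the Azure SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT, age INTEGER, hobbies TEXT)")
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?, ?)",
    [(1, "John Smith", 35, "chess"), (2, "Jane Smith", 42, "gardening"), (3, "Ann Lee", 29, None)],
)

for batch in batches((to_object(p) for p in fetch_profiles(conn)), size=2):
    print(batch)  # placeholder: replace with the actual upload call
```

Because this indexing runs once ahead of time rather than per question, the chat path only has to embed the user's question and query the already populated store.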

For python - Combining Weaviate with Azure Cognitive Search, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76521370/
