c# - How to correctly use the Azure.AI.OpenAI.OpenAIClient.GetChatCompletionsStreamingAsync method

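I am developing a web application that will serve as the help system for one of my company's existing products. One of the features I have implemented is a chatbot backed by an Azure OpenAI instance (using GPT-4). When a user types a prompt into the chat window, the prompt is sent to a Cognitive Search service, and the content that service returns is bundled with the prompt so that the LLM can use it as context when responding.

Overall this works well, but there is a performance problem: a response can take 20 to 30 seconds to come back. I know OpenAI supports a streaming endpoint, so my plan was to use it to see whether the chat could at least feel more responsive while the LLM is generating its response. For context, the application is a React web app with an ASP.NET Core backend, and I am using the prerelease Azure.AI.OpenAI C# library. Based on the resources cited below, I decided to try the GetChatCompletionsStreamingAsync method on the OpenAI client. However, when using it I did not observe any difference in response time compared with the non-streaming GetChatCompletionsAsync method. I expected the streaming version of the API to return sooner than the non-streaming one, since it should return an object that streams the subsequent results. Am I misunderstanding the purpose of the streaming API, and/or am I using it incorrectly?

(I have seen this on several versions; the sample code below was most recently run on 1.0.0-beta.5.)

To help illustrate the problem, I created a .NET console application. Here is the Program.cs file: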

// Program.cs
// See https://aka.ms/new-console-template for more information
using Azure.AI.OpenAI;
using OpenAiTest;

var _openAiPersonaPrompt = "You are Rick from Rick and Morty.";
var _openAiConsumer = new OpenAIConsumer();
var question = "Let's go on a five minute adventure";
await PerformSynchronousQuestion();
await PerformAsynchronousQuestion();


async Task PerformSynchronousQuestion()
{
    var messages = new List<ChatMessage>()
            {
                new ChatMessage(ChatRole.System, _openAiPersonaPrompt),
                new ChatMessage(ChatRole.User, question),
            };
    var startTime = DateTime.Now;
    Console.WriteLine($"#### Starting at: {startTime}####");

    var response = await _openAiConsumer.GenerateText(messages, false);
    var endTime = DateTime.Now;
    Console.WriteLine($"#### Ending at: {endTime}####");
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");
    var completions = response.Value.Choices[0].Message.Content;
    Console.WriteLine(completions);
}

async Task PerformAsynchronousQuestion()
{
    var messages = new List<ChatMessage>()
            {
                new ChatMessage(ChatRole.System, _openAiPersonaPrompt.ToString()),
                new ChatMessage(ChatRole.User, question),
            };
    var startTime = DateTime.Now;
    Console.WriteLine($"#### Starting at: {startTime}####");
    var response = await _openAiConsumer.GenerateTextStreaming(messages, false);

    var endTime = DateTime.Now;
    Console.WriteLine($"#### Ending at: {endTime}####");
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");
    using var streamingChatCompletions = response.Value;
    var cancellationToken = new CancellationToken();
    await foreach (var choice in streamingChatCompletions.GetChoicesStreaming())
    {
        await foreach (var message in choice.GetMessageStreaming())
        {
            if (message.Content == null)
            {
                continue;
            }
            Console.Write(message.Content);
            await Task.Delay(TimeSpan.FromMilliseconds(200));
        }
    }
}

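Here is the OpenAIConsumer wrapper I created. It is lifted from the larger repository of the application I am working on, so it is not strictly necessary for a proof of concept, but I wanted to keep the separation in case that is where the problem lies.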

using Azure.AI.OpenAI;
using Azure;


namespace OpenAiTest
{
    public class OpenAIConsumer
    {
        // Add your own values here to test
        private readonly OpenAIClient _client;
        private readonly string baseOpenAiUrl = "";
        private readonly string openAiApiKey = "";
        private readonly string _model = "";
        public ChatCompletionsOptions Options { get; }

        public OpenAIConsumer()
        {
            var uri = new Uri(baseOpenAiUrl);
            var apiKey = new AzureKeyCredential(openAiApiKey);
            _client = new OpenAIClient(uri, apiKey);

            // Default set of options. We can add more configuration in the future if needed
            Options = new ChatCompletionsOptions()
            {
                MaxTokens = 1500,
                FrequencyPenalty = 0,
                PresencePenalty = 0,
            };


        }

        /// <summary>
        /// Helper function that initializes the messages for the chat completion options
        /// Note that this will clear any existing messages
        /// </summary>
        /// <param name="messages"></param>
        private void InitializeMessages(List<ChatMessage> messages)
        {
            Options.Messages.Clear();
            foreach (var chatMessage in messages)
            {
                Options.Messages.Add(chatMessage);
            }
        }

        /// <summary>
        /// Wrapper around the GetCompletions API from the OpenAI service
        /// </summary>
        /// <param name="messages">List of messages including the user's prompt</param>
        /// <returns>See GetChatCompletionsAsync on the OpenAIClient object</returns>
        public async Task<Response<ChatCompletions>> GenerateText(List<ChatMessage> messages, bool useAzureSearchAsDataSource)
        {
            InitializeMessages(messages);
            var result = await _client.GetChatCompletionsAsync(_model, Options);
            return result;
        }

        public async Task<Response<StreamingChatCompletions>> GenerateTextStreaming(List<ChatMessage> messages, bool useAzureSearchAsDataSource)
        {
            InitializeMessages(messages);
            var result = await _client.GetChatCompletionsStreamingAsync(_model, Options);
            return result;
        }
    }
}

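Based on the code above, my expectation was that the call to _openAiConsumer.GenerateText would take longer to return than the call to _openAiConsumer.GenerateTextStreaming. However, I noticed that they actually return after about the same amount of time, and all the second one does is loop over the response stream, which is already complete by the time the response is received.

Resources I have used while investigating this issue: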

Microsoft documentation on how to use response streaming
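- Note that this article covers using your own Azure Cognitive Search index. My sample code does not do that, but since I see the same behavior there, I assume the problem is a general one with how I am using the streaming code.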
Tutorial on how to use the streaming API

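Edit 10/10/23

I have added an excerpt here that details what I am observing and what is causing the confusion. To clarify, my assumption is that GetChatCompletionsStreamingAsync should return sooner than GetChatCompletionsAsync: the former returns an object (StreamingChatCompletions) that can be used to "stream" the response as OpenAI completes it, while the latter should take longer because it returns OpenAI's actual, complete response. However, I wrote the following method to show what I am observing: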

public async Task CompareMethods(List<ChatMessage> messages)
{
    InitializeMessages(messages);
    var startTime = DateTime.Now;
    Console.WriteLine("### Starting Sync ###");
    await _client.GetChatCompletionsAsync(_model, Options);
    Console.WriteLine("### Ending Sync ###");
    var endTime = DateTime.Now;
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");
    startTime = DateTime.Now;
    Console.WriteLine("### Starting Async ###");
    await _client.GetChatCompletionsStreamingAsync(_model, Options, CancellationToken.None);
    Console.WriteLine("### Ending Async ###");
    endTime = DateTime.Now;
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");
}

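So in the function above I simply call both methods, on the assumption that the call to GetChatCompletionsAsync will take longer than the call to GetChatCompletionsStreamingAsync. However, it did not take longer. Here is the output (obviously the times and the relative difference vary from run to run, but I expected the call to the streaming function to take very little time compared with the non-streaming one):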

### Starting Sync ###
### Ending Sync ###
#### Duration: 00:00:16.6944412
### Starting Async ###
### Ending Async ###
#### Duration: 00:00:14.6443387

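Best answer

Both operations will ultimately take about the same amount of time, because they perform the same work on the OpenAI side. The difference with the streaming method is that you receive chunks of the response as they become available. You have not misunderstood the purpose of the streaming method, but you are not using it correctly.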
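To make the distinction concrete, here is a minimal TypeScript sketch that does not call OpenAI at all. `fakeCompletionStream` is a hypothetical stand-in that yields pieces of text with a delay, and `consumeWithTimings` is an illustrative helper, not part of any library. It records both the time until the first piece arrives and the time until the stream finishes; the first number is what the user perceives as responsiveness, while the second stays roughly the same as in the non-streaming case.

```typescript
// Hypothetical stand-in for a streaming completion: yields pieces with a delay.
async function* fakeCompletionStream(pieces: string[], delayMs: number): AsyncGenerator<string> {
  for (const piece of pieces) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield piece;
  }
}

interface StreamTimings {
  text: string;         // the full reply, assembled from all pieces
  firstPieceMs: number; // time until the first piece arrived (-1 if none arrived)
  totalMs: number;      // time until the stream completed
}

// Forwards every piece to onPiece as soon as it arrives and records both timings.
async function consumeWithTimings(
  stream: AsyncIterable<string>,
  onPiece: (piece: string) => void
): Promise<StreamTimings> {
  const start = Date.now();
  let firstPieceMs = -1;
  let text = "";
  for await (const piece of stream) {
    if (firstPieceMs < 0) {
      firstPieceMs = Date.now() - start;
    }
    text += piece;
    onPiece(piece);
  }
  return { text, firstPieceMs, totalMs: Date.now() - start };
}
```

Timing only the call that opens the stream, or only the complete loop, hides this difference; the benefit shows up only when each piece is handed on (to the console, or to the browser) inside the loop.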

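If you want your React app to receive parts of the message as they become available, you have to stream them to it just as the OpenAI API streams them to you. To do that, you can either expose an HTTP GET endpoint that returns the content type text/event-stream, or use a SignalR hub streaming method.

Here is a simple example that uses the IAsyncEnumerable returned by completions.GetChoicesStreaming().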

[HttpGet]
public async Task StreamTestAsync([FromQuery] string prompt)
{
    Response.Headers.Add("Content-Type", "text/event-stream");
    var writer = new StreamWriter(Response.Body);

    var messages = new List<ChatMessage>()
    {
        new ChatMessage(ChatRole.System, "You are a helpful assistant."),
        new ChatMessage(ChatRole.User, prompt),
    };

    var options = new ChatCompletionsOptions(messages)
    {
        MaxTokens = 1500,
        FrequencyPenalty = 0,
        PresencePenalty = 0,
    };

    try
    {
        var startTime = DateTime.Now;
        Console.WriteLine("### Starting Async ###");

        StreamingChatCompletions completions = await openAIClient.GetChatCompletionsStreamingAsync("gpt-4", options);

        Console.WriteLine("### Ending Async ###");
        Console.WriteLine($"#### Duration: {DateTime.Now.Subtract(startTime)}");

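        // FirstAsync() comes from the System.Linq.Async package.
        var choice = await completions.GetChoicesStreaming().FirstAsync();

        await foreach (var message in choice.GetMessageStreaming())
        {
            // Skip chunks that carry no content, as the question's own loop does.
            if (string.IsNullOrEmpty(message.Content))
            {
                continue;
            }
            await writer.WriteAsync($"data: {message.Content}\n\n");
            await writer.FlushAsync();
        }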
    }
    catch (Exception exception)
    {
        logger.LogError(exception, "Error while generating response.");
        await writer.WriteAsync("event: error\ndata: error\n\n");
    }
    finally
    {
        await writer.FlushAsync();
    }
}
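
One caveat about the framing used above: in the text/event-stream format a blank line ends a message, so a chunk whose content itself contains a newline would be split or truncated if written as a single `data:` line. The sketch below shows the framing rule in TypeScript; `formatSseMessage` is a hypothetical helper name, and the same logic would have to be ported to the C# endpoint before writing to the response.

```typescript
// Frame a payload as one Server-Sent Events message.
// Every line of the payload gets its own "data:" field; a blank line ends the message.
// The receiving EventSource joins the data lines back together with "\n".
function formatSseMessage(data: string, eventName?: string): string {
  const lines: string[] = [];
  if (eventName !== undefined) {
    lines.push(`event: ${eventName}`);
  }
  for (const line of data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}
```

For example, `formatSseMessage("a\nb")` produces two `data:` lines followed by a blank line. An alternative that avoids the issue altogether is to JSON-encode each chunk on the server and parse it on the client.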

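The "await openAIClient.GetChatCompletionsStreamingAsync" line completes in under a second for me (about 500 ms). Most of the work happens in the await foreach loop, which can take 10-20 seconds if you prompt for a long reply.

On the React side, a function like this should get you started: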

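import { EventSourcePolyfill } from "event-source-polyfill";

function createEventSourceConnection(prompt: string) {
    // Encode the prompt so spaces, "&", "?" and similar characters survive the query string.
    const eventSource = new EventSourcePolyfill(`yourAPI/promptstream?prompt=${encodeURIComponent(prompt)}`);

    eventSource.onopen = _ => console.log("EventSource opened.");

    eventSource.onmessage = event => {
        // do something with the event data
    };

    eventSource.onerror = error => {
        console.error("EventSource closed with error.", error);
        // Close explicitly; otherwise EventSource reconnects and the prompt is sent again.
        eventSource.close();
    };

    return eventSource;
}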
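
As for what "do something with the event data" might look like, one option is to append each chunk to the reply currently being displayed. The sketch below is only an illustration: `ChatState`, `appendChunk` and `finishReply` are hypothetical names, and the state shape is an assumption rather than anything prescribed by React or by the API.

```typescript
interface ChatState {
  completed: string[]; // assistant replies that have finished streaming
  pending: string;     // the reply that is currently being streamed
}

// Append one streamed chunk to the in-progress reply without mutating the old state.
function appendChunk(state: ChatState, chunk: string): ChatState {
  return { ...state, pending: state.pending + chunk };
}

// Move the in-progress reply into the completed list once the stream ends.
function finishReply(state: ChatState): ChatState {
  if (state.pending === "") {
    return state;
  }
  return { completed: [...state.completed, state.pending], pending: "" };
}
```

In a component this could be wired up as `setChat(s => appendChunk(s, event.data))` inside `onmessage`, with `finishReply` called when the connection is closed, so the chat window re-renders as each chunk arrives.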

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/77261548/
