I am working on a web application that will serve as the help system for one of my company's existing products. One of the features I have implemented is a chatbot powered by an Azure OpenAI instance (using GPT-4). When a user types a prompt in the chat window, their prompt is sent to a cognitive search service, and the content returned by that service is bundled with the prompt so the LLM can use that context when responding.
Overall this works quite well, but there is a performance issue: responses can take upwards of 20 to 30 seconds. I know that OpenAI supports a streaming endpoint, so my plan was to use that so the chat at least feels more responsive while the LLM is generating the response. For context, the application is a React web application with an ASP.NET Core backend, and I am using the pre-release Azure.AI.OpenAI C# library. Based on the references below, I decided to try the GetChatCompletionsStreamingAsync method on the OpenAI client. However, when using that method I am not observing any difference in response times compared to the non-streaming GetChatCompletionsAsync method. I would expect the streaming version to return faster than the non-streaming one, because it should return an object that streams subsequent results. Am I misunderstanding the purpose of the streaming API, and/or am I using it incorrectly?
(I have seen this issue on multiple versions, the example code I provided most recently was running on 1.0.0-beta.5)
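For reference, the end state I'm aiming for is for the ASP.NET Core backend to relay chunks to the React client as they arrive. Here is a simplified sketch of that endpoint (the route, request shape, and DI wiring are placeholders, not my actual app code; OpenAIConsumer is the wrapper shown further down):

```csharp
// Illustrative sketch only: a minimal-API endpoint that relays streamed chunks
// to the browser as they arrive. Route and wiring are placeholders.
using Azure.AI.OpenAI;
using OpenAiTest;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<OpenAIConsumer>();
var app = builder.Build();

app.MapPost("/api/chat/stream", async (HttpContext context, OpenAIConsumer consumer) =>
{
    // Read the user's prompt from the request body.
    using var reader = new StreamReader(context.Request.Body);
    var prompt = await reader.ReadToEndAsync();

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.User, prompt),
    };

    context.Response.ContentType = "text/plain; charset=utf-8";

    var response = await consumer.GenerateTextStreaming(messages, false);
    using var streaming = response.Value;

    await foreach (var choice in streaming.GetChoicesStreaming())
    {
        await foreach (var message in choice.GetMessageStreaming())
        {
            if (message.Content == null) continue;

            // Flush after each chunk so the client sees tokens incrementally.
            await context.Response.WriteAsync(message.Content);
            await context.Response.Body.FlushAsync();
        }
    }
});

app.Run();
```

The important part is flushing the response after each chunk, but of course none of this helps if the chunks only become available once the whole completion is finished, which is the behavior I'm trying to debug below.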
To help illustrate this problem I have created a .NET Console Application. Here is the Program.cs file:
```csharp
// Program.cs
// See https://aka.ms/new-console-template for more information
using Azure.AI.OpenAI;
using OpenAiTest;

var _openAiPersonaPrompt = "You are Rick from Rick and Morty.";
var _openAiConsumer = new OpenAIConsumer();
var question = "Let's go on a five minute adventure";

await PerformSynchronousQuestion();
await PerformAsynchronousQuestion();

async Task PerformSynchronousQuestion()
{
    var messages = new List<ChatMessage>()
    {
        new ChatMessage(ChatRole.System, _openAiPersonaPrompt),
        new ChatMessage(ChatRole.User, question),
    };

    var startTime = DateTime.Now;
    Console.WriteLine($"#### Starting at: {startTime} ####");

    var response = await _openAiConsumer.GenerateText(messages, false);

    var endTime = DateTime.Now;
    Console.WriteLine($"#### Ending at: {endTime} ####");
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");

    var completions = response.Value.Choices[0].Message.Content;
    Console.WriteLine(completions);
}

async Task PerformAsynchronousQuestion()
{
    var messages = new List<ChatMessage>()
    {
        new ChatMessage(ChatRole.System, _openAiPersonaPrompt),
        new ChatMessage(ChatRole.User, question),
    };

    var startTime = DateTime.Now;
    Console.WriteLine($"#### Starting at: {startTime} ####");

    var response = await _openAiConsumer.GenerateTextStreaming(messages, false);

    var endTime = DateTime.Now;
    Console.WriteLine($"#### Ending at: {endTime} ####");
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");

    using var streamingChatCompletions = response.Value;
    await foreach (var choice in streamingChatCompletions.GetChoicesStreaming())
    {
        await foreach (var message in choice.GetMessageStreaming())
        {
            if (message.Content == null)
            {
                continue;
            }
            Console.Write(message.Content);
            await Task.Delay(TimeSpan.FromMilliseconds(200));
        }
    }
}
```
Here is the OpenAIConsumer wrapper I created. This was pulled out of the larger repo for the app I am working on, so it is unnecessary for this proof of concept, but I wanted to keep the separation in case that was the problem.
```csharp
using Azure.AI.OpenAI;
using Azure;

namespace OpenAiTest
{
    public class OpenAIConsumer
    {
        // Add your own values here to test
        private readonly OpenAIClient _client;
        private readonly string baseOpenAiUrl = "";
        private readonly string openAiApiKey = "";
        private readonly string _model = "";

        public ChatCompletionsOptions Options { get; }

        public OpenAIConsumer()
        {
            var uri = new Uri(baseOpenAiUrl);
            var apiKey = new AzureKeyCredential(openAiApiKey);
            _client = new OpenAIClient(uri, apiKey);

            // Default set of options. We can add more configuration in the future if needed
            Options = new ChatCompletionsOptions()
            {
                MaxTokens = 1500,
                FrequencyPenalty = 0,
                PresencePenalty = 0,
            };
        }

        /// <summary>
        /// Replaces the options' message list with the supplied messages.
        /// </summary>
        private void InitializeMessages(List<ChatMessage> messages)
        {
            Options.Messages.Clear();
            foreach (var chatMessage in messages)
            {
                Options.Messages.Add(chatMessage);
            }
        }

        /// <summary>
        /// See GetChatCompletionsAsync on the OpenAIClient object.
        /// </summary>
        /// <param name="messages">List of messages including the user's prompt</param>
        public async Task<Response<ChatCompletions>> GenerateText(List<ChatMessage> messages, bool useAzureSearchAsDataSource)
        {
            InitializeMessages(messages);
            var result = await _client.GetChatCompletionsAsync(_model, Options);
            return result;
        }

        public async Task<Response<StreamingChatCompletions>> GenerateTextStreaming(List<ChatMessage> messages, bool useAzureSearchAsDataSource)
        {
            InitializeMessages(messages);
            var result = await _client.GetChatCompletionsStreamingAsync(_model, Options);
            return result;
        }
    }
}
```
From the code above, my expectation was that the call to _openAiConsumer.GenerateText would take longer to return than _openAiConsumer.GenerateTextStreaming. However, I am noticing that they effectively return at the same time, and all the second one does is loop over the stream of responses, which is already complete by the time it is received.
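To be more precise about what "return faster" means, here is a sketch of a method I could add to the wrapper that separates the three timestamps I care about: when the streaming call returns, when the first chunk arrives, and when the stream completes (same 1.0.0-beta.5 API surface; _client, _model, and Options as in the wrapper above):

```csharp
// Sketch: separate "call returned", "first chunk arrived", and "stream finished".
// Assumes _client, _model, and Options are configured as in OpenAIConsumer above.
public async Task MeasureStreamingLatency(List<ChatMessage> messages)
{
    InitializeMessages(messages);

    var callStart = DateTime.Now;
    var response = await _client.GetChatCompletionsStreamingAsync(_model, Options);
    Console.WriteLine($"Call returned after:   {DateTime.Now.Subtract(callStart)}");

    using var streaming = response.Value;
    var firstChunkSeen = false;
    await foreach (var choice in streaming.GetChoicesStreaming())
    {
        await foreach (var message in choice.GetMessageStreaming())
        {
            if (!firstChunkSeen && message.Content != null)
            {
                firstChunkSeen = true;
                Console.WriteLine($"First chunk after:     {DateTime.Now.Subtract(callStart)}");
            }
        }
    }
    Console.WriteLine($"Stream complete after: {DateTime.Now.Subtract(callStart)}");
}
```

If streaming behaved the way I expect, the first two timestamps would be a small fraction of the total duration and only "stream complete" would approach the 20-30 second mark; in my runs, the initial call consumes essentially the whole duration and the loop finishes almost immediately afterwards.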
Resources I have already used while investigating this problem:
Edit 10/10/23
I'm adding an excerpt here detailing the behavior that is causing my confusion. To clarify my assumption: GetChatCompletionsStreamingAsync should return faster than GetChatCompletionsAsync, because the former returns an object (StreamingChatCompletions) that can be used to "stream" the response as it is completed by OpenAI, while the latter returns the actual full response from OpenAI. I wrote the following method to show what I'm observing:
```csharp
public async Task CompareMethods(List<ChatMessage> messages)
{
    InitializeMessages(messages);

    var startTime = DateTime.Now;
    Console.WriteLine("### Starting Sync ###");
    await _client.GetChatCompletionsAsync(_model, Options);
    Console.WriteLine("### Ending Sync ###");
    var endTime = DateTime.Now;
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");

    startTime = DateTime.Now;
    Console.WriteLine("### Starting Async ###");
    await _client.GetChatCompletionsStreamingAsync(_model, Options, CancellationToken.None);
    Console.WriteLine("### Ending Async ###");
    endTime = DateTime.Now;
    Console.WriteLine($"#### Duration: {endTime.Subtract(startTime)}");
}
```
So in the above function I am simply calling the two methods, assuming that the call to GetChatCompletionsAsync will take longer than the call to GetChatCompletionsStreamingAsync. However, it does not take longer. Here's the output (the times and relative differences vary from run to run, but I would expect the streaming call to take very little time compared to the non-streaming one):
```
### Starting Sync ###
### Ending Sync ###
#### Duration: 00:00:16.6944412
### Starting Async ###
### Ending Async ###
#### Duration: 00:00:14.6443387
```
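To rule the SDK in or out entirely, the same behavior can be checked against the raw REST endpoint with "stream": true, timestamping each server-sent event as it arrives. A sketch of that check (the resource name, deployment name, API key, and api-version below are placeholders for your own values):

```csharp
// Sketch: bypass the SDK and call the Azure OpenAI REST API directly with
// "stream": true, printing a timestamp as each server-sent event line arrives.
// If data trickles in here but the SDK call blocks until the end, the
// buffering is happening in the SDK layer or something between it and the service.
using var http = new HttpClient();
var request = new HttpRequestMessage(
    HttpMethod.Post,
    "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/chat/completions?api-version=2023-05-15");
request.Headers.Add("api-key", "<your-api-key>");
request.Content = new StringContent(
    "{\"messages\":[{\"role\":\"user\",\"content\":\"Let's go on a five minute adventure\"}],\"stream\":true}",
    System.Text.Encoding.UTF8,
    "application/json");

// ResponseHeadersRead is the important part: the default completion option
// buffers the entire response body before SendAsync returns.
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    if (line.Length > 0)
    {
        Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff} {line}");
    }
}
```

If the data: lines arrive spread across the full response window here, the service is streaming correctly and the buffering is happening client-side; if they all arrive in one burst at the end, the service (or something in between) is the culprit.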