c# - Azure Kubernetes .NET Core app to Azure SQL Database: intermittent error 258

Tags: c# sql-server azure kubernetes timeout

We are running a .NET Core 3.1 application in a Kubernetes cluster. The application connects to an Azure SQL Database using EF Core 3.1.7 and Microsoft.Data.SqlClient 1.1.3.

At seemingly random times, we receive the following error.

 ---> System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.ExecuteScalar()

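Although it looks random, it definitely happens more often under heavier load. From my research, this particular timeout appears to be related to the connection timeout rather than the command timeout, i.e. the client simply cannot establish a connection. It is not a query timing out.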

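Potential root causes we have eliminated:

  • Azure SQL Server capacity: The behavior is observed whether we run on 4 or 16 vCPUs. Azure Support also confirmed that there are no issues in the logs. This includes the number of open connections, which is only around 50. We also ran load tests from other connections, and the server held up fine.
  • Microsoft.Data.SqlClient version: We have been running version 1.1.3 all along, and this behavior only started a week ago (2021-03-16).
  • Network capacity: Our peak at this stage is only about 1-2 MB/s, which is very low.
  • Kubernetes scaling: There is no correlation between when the events occur and when we scale out more pods.
  • Connection string issues: Our system used to run fine, but we nonetheless changed a few settings mentioned in other articles to see whether the issue would go away. MARS is disabled. We cannot disable connection pooling. We set TrustServerCertificate to true. This is the current connection string:
    Server=tcp:***.database.windows.net,1433;Initial Catalog=***;Persist Security Info=False;User ID=***;Password=***;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=60;TrustServerCertificate=True;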

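Update 1: As requested, here are examples of two timeouts that just occurred. Since it is Sunday, traffic is very low. Database utilization (CPU, memory, IO) is between 2% and 6%.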

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
   at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
   at Microsoft.Data.SqlClient.SqlConnection.BeginDbTransaction(IsolationLevel isolationLevel)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransaction(IsolationLevel isolationLevel)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)

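Our database health checker also receives the error when calling: Microsoft.EntityFrameworkCore.Infrastructure.DatabaseFacade.CanConnect()

The stack trace above is the issue we are trying to solve, whereas the stack trace below is a SQL query timing out.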

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)

Best answer

The problem was an infrastructure issue on Azure's side. Azure Support told us:

There is a known issue within Azure Network where the dhcp lease is lost whenever a disk attach/detach happens on some VM fleets. There is a fix rolling out at the moment to regions. I'll check to see when Azure Status update will be published for this.

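The issue has gone away, so the fix appears to have been rolled out globally.

For anyone else who runs into this in the future, you can identify it by opening an SSH connection into the node (not the pod). Run ls -al /var/log/ to identify all syslog files, then run the following grep against each of them.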

cat /var/log/syslog | grep 'carrier'

If the logs contain any "Lost carrier" or "Gained carrier" messages, there was some kind of network issue. In our case, it was the DHCP lease.
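To avoid repeating the grep by hand for every rotated log, the check can be wrapped in a small loop. This is only a minimal sketch: the function name scan_carrier_events is hypothetical, and it assumes the syslog files sit in one directory with names starting with "syslog" and that rotated files may be gzip-compressed. Adjust it to however your node rotates its logs.

```shell
#!/bin/sh
# scan_carrier_events (hypothetical helper): search every syslog* file in a
# directory for lines mentioning "carrier" and print each hit prefixed with
# the name of the file it came from.
# Assumption: rotated logs are named syslog, syslog.1, syslog.2.gz, ...
scan_carrier_events() {
    log_dir="${1:-/var/log}"
    for f in "$log_dir"/syslog*; do
        [ -f "$f" ] || continue            # skip when the glob matched nothing
        case "$f" in
            *.gz) gzip -dc "$f" ;;         # decompress rotated logs on the fly
            *)    cat "$f" ;;
        esac | grep -i 'carrier' | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}
```

Calling scan_carrier_events /var/log on the node prints every matching line together with its file name. No output means no carrier events were logged in the files that were scanned.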


Source: the original question on Stack Overflow, "Azure Kubernetes .NET Core app to Azure SQL Database intermittent error 258": https://stackoverflow.com/questions/66721744/
