本文共 35265 字,大约阅读时间需要 117 分钟。
这篇文章介绍SQL Server的一个典型的应用案例,即如何利用Event Notification与Service Broker技术相结合来实现死锁信息自动收集系统。通过这个系统,我们可以全面把控SQL Server数据库环境中所有实例上发生的死锁详细信息,供我们后期分析和解决死锁场景。
当 SQL Server 中某组资源的两个或多个线程或进程之间存在循环的依赖关系时,但因互相申请被其他进程所占用,而不会释放的资源处于的一种永久等待状态,将会发生死锁。SQL Server服务自动死锁检查进程默认每5分钟跑一次,当死锁发生时,会选择一个代价较小的进程做为死锁牺牲品,以此来避免死锁导致更大范围的影响。被选择做为死锁牺牲品的进程会报告如下错误:
Msg 1205, Level 13, State 51, Line 8Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
如果进程间发生了死锁,对于用户业务系统,乃至整个SQL Server服务健康状况影响很大,轻者系统反应缓慢,服务假死;重者服务挂起,拒绝请求。那么,我们有没有一种方法可以完全自动、无人工干预的方式异步收集SQL Server系统死锁信息并远程保留死锁相关信息呢?这些信息包括但不仅限于:
死锁发生在哪些进程之间
各个进程执行的语句块是什么?死锁时,各个进程在执行哪条语句?
死锁的资源是什么?死锁发生在哪个数据库?哪张表?哪个数据页?哪个索引上?
死锁发生的具体时间点,包含语句块开始时间、语句执行时间等
用户进程使用的登录用户是什么?客户端驱动是什么?
…… 如此的无人值守的自动死锁收集系统,就是我们今天要介绍的应用案例分享:利用SQL Server的Event Notification与Service Broker建立自动死锁信息收集系统。在死锁自动收集系统介绍开始之前,先简要介绍下SQL Server Service Broker和Event Notification技术。
Service Broker是微软至SQL Server 2005开始集成到数据库引擎中的消息通讯组件,为 SQL Server提供队列和可靠的消息传递的能力,可以用来构建基于异步消息通讯为基础的应用程序。Service Broker既可用于单个 SQL Server 实例的应用程序,也可用于在多个实例间进行消息分发工作的应用程序。Service Broker使用TCP/IP端口在实例间交换消息,所包含的功能有助于防止未经授权的网络访问,并可以对通过网络发送的消息进行加密,以此来保证数据安全性。多实例之间使用Service Broker进行异步消息通讯的结构图如下所示(图片来自微软的官方文档):
Event Notification的中文名称叫事件通知,执行事件通知可对各种Transact-SQL数据定义语言(DDL)语句和SQL跟踪事件做出响应,采取的响应方式是将这些事件的相关信息发送到 Service Broker 服务。事件通知可以用来执行以下操作:
记录和检索发生在数据库上的更改或活动。
执行操作以异步方式而不是同步方式响应事件。
可以将事件通知用作替代DDL 触发器和SQL跟踪的编程方法。事件通知的信息媒介是以xml数据类型的信息传递给Service Broker服务,它提供了有关事件的发生时间、受影响的数据库对象、涉及的 Transact-SQL 批处理语句等详细信息。对于SQL Server死锁而言,可以使用Event Notification来跟踪死锁事件,来获取DEADLOCK_GRAPH XML信息,然后通过异步消息组件Service Broker发送到远端的Deadlock Center上的Service Broker队列,完成死锁信息收集到死锁中央服务。
在介绍完Service Broker和Event Notification以后,我们来看看死锁手机系统的整体架构图。在这个系统中,存在两种类型角色:我们定义为死锁客户端(Deadlock Client)和死锁中央服务(Deadlock Center)。死锁客户端发生死锁后,首先会将Deadlock Graph XML通过Service Broker发送给死锁中央服务,死锁中央服务获取到Service Broker消息以后,解析这个XML就可以拿到客户端的死锁相关信息,最后存放到本地日志表中,供终端客户查询和分析使用。最终的死锁收集系统架构图如下所示:
详细的死锁信息收集过程介绍如下:死锁客户端通过本地SQL Server的Event Notification捕获发生在该实例上的Deadlock事件,并在死锁发生以后将Deadlock Graph XML数据存放到Event Notification绑定的队列中,然后通过绑定在该队列上的存储过程自动触发将Deadlock Graph XML通过Service Broker异步消息通讯的方式发送到死锁中央服务。中央服务在接收到Service Broker消息以后,首先放入Deadlock Center Service Broker队列中,该队列绑定了消息自动处理存储过程,用来解析Deadlock Graph XML信息,并将死锁相关的详细信息存入到Deadlock Center的Log Table中。最后,终端用户可以直接对Log Table来查询和分析所有Deadlock Client上发生的死锁信息。通过这系列的过程,最终达到了死锁信息的自动远程存储、收集,以提供后期死锁场景还原和复盘,达到死锁信息可追溯,及时监控,及时发现的目的。
系统架构设计完毕后,接下来是系统的配置和搭建过程,首先看看Service Broker的配置。这个配置还是相对比较繁琐的,包含了以下步骤:
创建Service Broker数据库(假设数据库名为DDLCenter)并开启Service Broker选项
创建Service Broker队列的激活存储过程和相关表对象
创建Master数据库下的Master Key
创建传输层本地和远程证书
创建基于证书的用户登录
创建Service Broker端口并授权用户连接
创建DDLCenter数据库下的Master Key
创建会话层本地及远程证书
创建Service Broker组件所需要的对象,包括:Message Type、Contact、Queue、Service、Remote Service Binding、Route
以下的配置请在Deadlock Client SQL Server实例上操作。
-- Run script on client server to gather deadlock graph xmlUSE masterGO-- Create DatabaseIF DB_ID('DDLCenter') IS NULL CREATE DATABASE [DDLCenter];GO-- Change datbase to simple recovery modelALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAITGO-- Enable Service BrokerALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ONGO-- Change database Owner to saALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]GO
表[DDLCollector].[Deadlock_Traced_Records]:从Event Notification队里接收的消息会记录到该表中。
表[DDLCollector].[Send_Records]:Deadlock Client成功发送Service Broker消息记录 表[DDLCollector].[Error_Records]:记录发生异常情况时的信息。 存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]:Deadlock Client绑定到队里的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。 存储过程[DDLCollector].[UP_SendDeadlockMsg]:Deadlock Client发送异步消息给Deadlock Center,这个存储过程会被上面的激活存储过程调用。-- Run on Client InstanceUSE [DDLCenter]GO-- Create SchemaIF NOT EXISTS( SELECT TOP 1 * FROM sys.schemas WHERE name = 'DDLCollector')BEGIN EXEC('CREATE SCHEMA DDLCollector');ENDGO-- Create table to log Traced Deadlock RecordsIF OBJECT_ID('DDLCollector.Deadlock_Traced_Records', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Deadlock_Traced_Records]GOCREATE TABLE [DDLCollector].[Deadlock_Traced_Records]( [RowId] [BIGINT] IDENTITY(1,1) NOT NULL, [Processed_Msg] [xml] NULL, [Processed_Msg_CheckSum] INT, [Record_Time] [datetime] NOT NULL CONSTRAINT DF_Deadlock_Traced_Records_Record_Time DEFAULT(GETDATE()), CONSTRAINT PK_Deadlock_Traced_Records_RowId PRIMARY KEY (RowId ASC)) ON [PRIMARY]GO-- Create table to record deadlock graph xml sent successfully logIF OBJECT_ID('DDLCollector.Send_Records', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Send_Records]GOCREATE TABLE [DDLCollector].[Send_Records]( [RowId] [BIGINT] IDENTITY(1,1) NOT NULL, [Send_Msg] [xml] NULL, [Send_Msg_CheckSum] INT, [Record_Time] [datetime] NOT NULL CONSTRAINT DF_Send_Records_Record_Time DEFAULT(GETDATE()), CONSTRAINT PK_Send_Records_RowId PRIMARY KEY (RowId ASC)) ON [PRIMARY]GO-- Create table to record error info when exception occursIF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Error_Records]GOCREATE TABLE [DDLCollector].[Error_Records]( [RowId] [int] IDENTITY(1,1) NOT NULL, [Msg_Body] [xml] NULL, [Conversation_handle] [uniqueidentifier] NULL, [Message_Type] SYSNAME NULL, [Service_Name] SYSNAME NULL, [Contact_Name] SYSNAME NULL, [Record_Time] [datetime] NOT NULL CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()), [Error_Details] [nvarchar](4000) NULL, CONSTRAINT PK_Error_Records_RowId PRIMARY KEY (RowId ASC)) ON [PRIMARY]GOUSE [DDLCenter]GO-- Create Store Procedure to Send Deadlock Graph xml to Center ServerIF OBJECT_ID('DDLCollector.UP_SendDeadlockMsg', 'P') IS NOT NULL DROP PROC [DDLCollector].[UP_SendDeadlockMsg]GOCREATE PROCEDURE [DDLCollector].[UP_SendDeadlockMsg]( @DeadlockMsg XML)AS BEGIN SET NOCOUNT ON; DECLARE @handle UNIQUEIDENTIFIER ,@Proc_Name SYSNAME ,@Error_Details VARCHAR(2000) ; -- get the store procedure name SELECT @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) + '.' + QUOTENAME(OBJECT_NAME(@@PROCID)),'') FROM sys.procedures WHERE OBJECT_ID = @@PROCID ; BEGIN TRY -- Begin Dialog BEGIN DIALOG CONVERSATION @handle FROM SERVICE [http://soa/deadlock/service/ClientService] TO Service 'http://soa/deadlock/service/CenterService' ON CONTRACT [http://soa/deadlock/contract/CheckContract] ; -- Send deadlock graph xml as the message to Center Server SEND ON CONVERSATION @handle MESSAGE TYPE [http://soa/deadlock/MsgType/Request] (@DeadlockMsg); -- Log it successfully INSERT INTO [DDLCollector].[Send_Records]([Send_Msg], [Send_Msg_CheckSum]) VALUES( @DeadlockMsg, CHECKSUM(CAST(@DeadlockMsg as NVARCHAR(MAX)))) END TRY BEGIN CATCH -- Record the error info when exception occurs SET @Error_Details= ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ' Error Message : ' + ERROR_MESSAGE() + ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + ' Exception Proc: ' + @Proc_Name ; -- record into table INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@DeadlockMsg, @handle, 'http://soa/deadlock/MsgType/Request', 'http://soa/deadlock/service/ClientService', 'http://soa/deadlock/contract/CheckContract', @Error_Details); END CATCHENDGO-- Create Store Procedure for Queue: when extend event notification queue message-- this store procedure will be called.IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockEventMsg', 'P') IS NOT NULL DROP PROC [DDLCollector].[UP_ProcessDeadlockEventMsg]GOCREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockEventMsg]AS/*SELECT * FROM [DDLCollector].[Deadlock_Traced_Records]SELECT * FROM [DDLCollector].[Send_Records]SELECT * FROM [DDLCollector].[Error_Records]*/BEGIN SET NOCOUNT ON; DECLARE @handle UNIQUEIDENTIFIER , @Message_Type SYSNAME , @Service_Name SYSNAME , @Contact_Name SYSNAME , @Error_Details VARCHAR(2000) , @Message_Body XML , @Proc_Name SYSNAME ; -- Store Procedure Name SELECT @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) + '.' + QUOTENAME(OBJECT_NAME(@@PROCID)),'') FROM sys.procedures WHERE OBJECT_ID = @@PROCID ; BEGIN TRY -- Receive message from queue WAITFOR(RECEIVE TOP(1) @handle = conversation_handle , @Message_Type = message_type_name , @Service_Name = service_name , @Contact_Name = service_contract_name , @Message_Body = message_body FROM dbo.[http://soa/deadlock/queue/ClientQueue]),Timeout 500 ; -- just return if there is no message needed to process IF(@@Rowcount=0) BEGIN RETURN END -- Get data from message queue ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification' BEGIN -- Record message log first INSERT INTO [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum]) VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX)))) -- BE NOTED HERE: PLEASE DO'T END CONVERSATION, OR ELSE EXCEPTION WILL BE THROWN OUTPUT /* Error: 17001, Severity: 16, State: 1. Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{67419386-7C34-E711-A709-001C42099969}'. Error Code = '8429'. Error: 17005, Severity: 16, State: 1. Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active. */ --END CONVERSATION @handle --Here call another Store Procedure to send deadlock graph info to center server EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body; END --End Diaglog Message Type, that means we should end this conversation ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog' BEGIN END CONVERSATION @handle; END -- Konwn Service Broker Errors by System. ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' BEGIN END CONVERSATION @handle INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name); END ELSE -- unknown Message Types. BEGIN END CONVERSATION @handle INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name); -- unexpected message type RAISERROR (N' Received unknown message type: %s', 16, 1, @Message_Type) WITH LOG; END END TRY BEGIN CATCH BEGIN SET @Error_Details= ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ' Error Details : ' + ERROR_MESSAGE() + ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + ' Exception Proc: ' + @Proc_Name ; INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); END END CATCH END GO
USE masterGO-- If the master key is not available, create it. IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name LIKE '%MS_DatabaseMasterKey%') BEGIN CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ClientMasterKey*'; END GO
这里请注意证书的开始生效时间要略微早于当前时间,并设置合适的证书过期日期,我这里是设置的过期日期为9999年12月30号。
USE masterGO-- Crete Transport Layer CertificationCREATE CERTIFICATE TrpCert_ClientLocalAUTHORIZATION dboWITH SUBJECT = 'TrpCert_ClientLocal',START_DATE = '05/07/2017',EXPIRY_DATE = '12/30/9999'GO-- then backup it up to local path-- and after that copy it to Center serverBACKUP CERTIFICATE TrpCert_ClientLocalTO FILE = 'C:\Temp\TrpCert_ClientLocal.cer';GO
这里的证书是通过证书文件来创建的,这个证书文件来自于远程通讯的另一端Deadlock Center SQL Server的证书文件的一份拷贝。
USE masterGO-- Create certification came from Center Server.CREATE CERTIFICATE TrpCert_RemoteCenter FROM FILE = 'C:\Temp\TrpCert_RemoteCenter.cer'GO
这里也可以创建带密码的常规用户登录,但是为了规避安全风险,这里最好创建基于证书文件的用户登录。
USE masterGO-- Create user loginIF NOT EXISTS(SELECT * FROM sys.syslogins WHERE name='SSBDbo')BEGIN CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_ClientLocal;ENDGO
这里需要注意的是,端口授权的证书一定本地实例创建的证书,而不是来自于远程服务器的那个证书。比如代码中的AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal部分。
USE masterGO --Creaet Tcp endpoint for SSB comunication and grant connect to users. CREATE ENDPOINT EP_SSB_ClientLocalSTATE = STARTED AS TCP ( LISTENER_PORT = 4022 ) FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal, ENCRYPTION = REQUIRED ) GO -- Grant Connect on Endpoint to User SSBDboGRANT CONNECT ON ENDPOINT::EP_SSB_ClientLocal TO SSBDbo GO
-- Now, let's go inside to conversation databaseUSE DDLCenterGO-- Create Master KeyIF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name LIKE '%MS_DatabaseMasterKey%')BEGIN CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';ENDGO
USE DDLCenterGO-- Create conversation layer certificationCREATE CERTIFICATE DlgCert_ClientLocalAUTHORIZATION dboWITH SUBJECT = 'DlgCert_ClientLocal',START_DATE = '05/07/2017',EXPIRY_DATE = '12/30/9999'GO-- backup it up to local path-- and then copy it to remote Center serverBACKUP CERTIFICATE DlgCert_ClientLocalTO FILE = 'C:\Temp\DlgCert_ClientLocal.cer';GO
USE DDLCenterGO-- Create User for login under conversation databaseIF NOT EXISTS( SELECT TOP 1 * FROM sys.database_principals WHERE name = 'SSBDbo')BEGIN CREATE USER SSBDbo WITHOUT LOGIN;ENDGO
USE DDLCenterGO-- Create converstaion layer certification came from remote Center server.CREATE CERTIFICATE DlgCert_RemoteCenterAUTHORIZATION SSBDboFROM FILE='C:\Temp\DlgCert_RemoteCenter.cer'GOGRANT CONNECT TO SSBDbo;
Deadlock Client与Deadlock Center在创建Service Broker组件对象时存在差异:第一个差异是创建Service的时候,需要包含Event Notification的Contract,名称为
http://schemas.microsoft.com/SQL/Notifications/PostEventNotification;第二个差异是需要多创建一个指向本地服务的路由http://soa/deadlock/route/LocalRoute。USE DDLCenterGO-- Create Message TypeCREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request] VALIDATION = WELL_FORMED_XML;CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response] VALIDATION = WELL_FORMED_XML;GO-- Create ContactCREATE CONTRACT [http://soa/deadlock/contract/CheckContract]( [http://soa/deadlock/MsgType/Request] SENT BY INITIATOR, [http://soa/deadlock/MsgType/Response] SENT BY TARGET);GO-- Create QueueCREATE QUEUE dbo.[http://soa/deadlock/queue/ClientQueue] WITH STATUS = ON, RETENTION = OFF, ACTIVATION (STATUS = ON , PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockEventMsg] , MAX_QUEUE_READERS = 2 , EXECUTE AS N'dbo') GO-- Create Service-- Here is very import, we have to create service for both contacts-- to get extend event notification and SSB work.CREATE SERVICE [http://soa/deadlock/service/ClientService]ON QUEUE [http://soa/deadlock/queue/ClientQueue]( [http://soa/deadlock/contract/CheckContract], [http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);GO-- Grant Send on serviceGRANT SEND ON SERVICE::[http://soa/deadlock/service/ClientService] to SSBDbo;GO-- Create Remote Service BingdingCREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/CenterRSB]TO SERVICE 'http://soa/deadlock/service/CenterService' WITH USER = [SSBDbo],ANONYMOUS=OffGO-- Create RouteCREATE ROUTE [http://soa/deadlock/route/CenterRoute]WITH SERVICE_NAME = 'http://soa/deadlock/service/CenterService',ADDRESS = 'TCP://10.211.55.3:4024';GO-- Create route for the DeadlockNotificationSvcCREATE ROUTE [http://soa/deadlock/route/LocalRoute]WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',ADDRESS = 'LOCAL';GO
-- Run script on center server to receive client deadlock xmlUSE masterGO-- Create DatabaseIF DB_ID('DDLCenter') IS NULL CREATE DATABASE [DDLCenter];GO-- Change datbase to simple recovery modelALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAITGO-- Enable Service BrokerALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ONGO-- Change database Owner to saALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]GO
表[DDLCollector].[Collect_Records]:Deadlock Center成功接收到的Service Broker消息。
表[DDLCollector].[Error_Records]:记录发生异常情况的详细信息。 表[DDLCollector].[Deadlock_Info]:记录所有Deadlock Client端发生的Deadlock详细信息。 存储过程[DDLCollector].[UP_ProcessDeadlockGraphEventMsg]:Deadlock Center上绑定到队列的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。 存储过程[DDLCollector].[UP_ParseDeadlockGraphEventMsg]:Deadlock Center上解析Deadlock Graph XML的存储过程对象,这个存储过程会被上面的激活存储过程调用来解析XML,然后放入表[DDLCollector].[Deadlock_Info]中。USE [DDLCenter]GO-- Create SchemaIF NOT EXISTS( SELECT TOP 1 * FROM sys.schemas WHERE name = 'DDLCollector')BEGIN EXEC('CREATE SCHEMA DDLCollector');ENDGO-- Create table to log the received messageIF OBJECT_ID('DDLCollector.Collect_Records', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Collect_Records]GOCREATE TABLE [DDLCollector].[Collect_Records]( [RowId] [BIGINT] IDENTITY(1,1) NOT NULL, [Deadlock_Graph_Msg] [xml] NULL, [Deadlock_Graph_Msg_CheckSum] INT, [Record_Time] [datetime] NOT NULL CONSTRAINT DF_Collect_Records_Record_Time DEFAULT(GETDATE()), CONSTRAINT PK_Collect_Records_RowId PRIMARY KEY (RowId ASC)) ON [PRIMARY]GO-- create table to record the exception when error occursIF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Error_Records]GOCREATE TABLE [DDLCollector].[Error_Records]( [RowId] [int] IDENTITY(1,1) NOT NULL, [Msg_Body] [xml] NULL, [Conversation_handle] [uniqueidentifier] NULL, [Message_Type] SYSNAME NULL, [Service_Name] SYSNAME NULL, [Contact_Name] SYSNAME NULL, [Record_Time] [datetime] NOT NULL CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()), [Error_Details] [nvarchar](4000) NULL, CONSTRAINT PK_Error_Records_RowId PRIMARY KEY (RowId ASC)) ON [PRIMARY]GO-- create business table to record deadlock analysised infoIF OBJECT_ID('DDLCollector.Deadlock_Info', 'U') IS NOT NULL DROP TABLE [DDLCollector].[Deadlock_Info]GOCREATE TABLE [DDLCollector].[Deadlock_Info]( RowId INT IDENTITY(1,1) NOT NULL ,SQLInstance sysname NULL ,SPid INT NULL ,is_Vitim BIT NULL ,DeadlockGraph XML NULL ,DeadlockGraphCheckSum INT NULL ,lasttranstarted DATETIME NULL ,lastbatchstarted DATETIME NULL ,lastbatchcompleted DATETIME NULL ,procname SYSNAME NULL ,Code NVARCHAR(max) NULL ,LockMode sysname NULL ,Indexname sysname NULL ,KeylockObject sysname NULL ,IndexLockMode sysname NULL ,Inputbuf NVARCHAR(max) NULL ,LoginName sysname NULL ,Clientapp sysname NULL ,Action varchar(1000) NULL ,status varchar(10) NULL ,[Record_Time] [datetime] NOT NULL CONSTRAINT DF_Deadlock_Info_Record_Time DEFAULT(GETDATE()), CONSTRAINT PK_Deadlock_Info_RowId PRIMARY KEY (RowId ASC))GOUSE [DDLCenter]GO-- Create store procedure to analysis deadlock graph xml-- and log into business tableIF OBJECT_ID('DDLCollector.UP_ParseDeadlockGraphEventMsg', 'P') IS NOT NULL DROP PROC [DDLCollector].[UP_ParseDeadlockGraphEventMsg]GOCREATE PROCEDURE [DDLCollector].[UP_ParseDeadlockGraphEventMsg]( @DeadlockGraph_Msg XML)AS BEGIN SET NOCOUNT ON; ;WITH deadlock AS ( SELECT OwnerID = T.C.value('@id', 'varchar(50)') ,SPid = T.C.value('(./@spid)[1]','int') ,status = T.C.value('(./@status)[1]','varchar(10)') ,Victim = case when T.C.value('@id', 'varchar(50)') = T.C.value('./../../@victim','varchar(50)') then 1 else 0 end ,LockMode = T.C.value('@lockMode', 'sysname') ,Inputbuf = T.C.value('(./inputbuf/text())[1]','nvarchar(max)') ,Code = T.C.value('(./executionStack/frame/text())[1]','nvarchar(max)') ,SPName = T.C.value('(./executionStack/frame/@procname)[1]','sysname') ,Hostname = T.C.value('(./@hostname)[1]','sysname') ,Clientapp = T.C.value('(./@clientapp)[1]','varchar(1000)') ,lasttranstarted = T.C.value('(./@lasttranstarted)[1]','datetime') ,lastbatchstarted = T.C.value('(./@lastbatchstarted)[1]','datetime') ,lastbatchcompleted = T.C.value('(./@lastbatchcompleted)[1]','datetime') ,LoginName = T.C.value('@loginname', 'sysname') ,Action = T.C.value('(./@transactionname)[1]','varchar(1000)') FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/process-list/process') AS T(C) ) , keylock AS ( SELECT OwnerID = T.C.value('./owner[1]/@id', 'varchar(50)') ,KeylockObject = T.C.value('./../@objectname', 'sysname') ,Indexname = T.C.value('./../@indexname', 'sysname') ,IndexLockMode = T.C.value('./../@mode', 'sysname') FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/resource-list/keylock/owner-list') AS T(C) ) SELECT SQLInstance = A.Hostname ,A.SPid ,is_Vitim = A.Victim ,DeadlockGraph = @DeadlockGraph_Msg.query('EVENT_INSTANCE/TextData/deadlock-list') ,DeadlockGraphCheckSum = CHECKSUM(CAST(@DeadlockGraph_Msg AS NVARCHAR(MAX))) ,A.lasttranstarted ,A.lastbatchstarted ,A.lastbatchcompleted ,A.SPName ,A.Code ,A.LockMode ,B.Indexname ,B.KeylockObject ,B.IndexLockMode ,A.Inputbuf ,A.LoginName ,A.Clientapp ,A.Action ,status ,[Record_Time] = GETDATE() FROM deadlock AS A LEFT JOIN keylock AS B ON A.OwnerID = B.OwnerID ORDER BY A.SPid, A.Victim ;ENDGO-- Create store Procedure for Center server service queue to process deadlock xml-- when message sending from client server.IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockGraphEventMsg', 'P') IS NOT NULL DROP PROC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]GOCREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]AS/*EXEC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]SELECT * FROM [DDLCollector].[Collect_Records]SELECT * FROM [DDLCollector].[Error_Records]SELECT * FROM [DDLCollector].[Deadlock_Info]*/BEGIN SET NOCOUNT ON; DECLARE @handle UNIQUEIDENTIFIER , @Message_Type SYSNAME , @Service_Name SYSNAME , @Contact_Name SYSNAME , @Error_Details VARCHAR(2000) , @Message_Body XML , @Proc_Name SYSNAME ; -- Store Procedure name SELECT @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) + '.' + QUOTENAME(OBJECT_NAME(@@PROCID)),'') FROM sys.procedures WHERE OBJECT_ID = @@PROCID ; BEGIN TRY -- Receive deadlock message from service queue WAITFOR(RECEIVE TOP(1) @handle = conversation_handle , @Message_Type = message_type_name , @Service_Name = service_name , @Contact_Name = service_contract_name , @Message_Body = message_body FROM dbo.[http://soa/deadlock/queue/CenterQueue]),Timeout 500 ; IF(@@Rowcount=0) BEGIN RETURN END -- Message type is the very correct one ELSE IF @Message_Type = N'http://soa/deadlock/MsgType/Request' BEGIN -- Record message log first INSERT INTO [DDLCollector].[Collect_Records](Deadlock_Graph_Msg, [Deadlock_Graph_Msg_CheckSum]) VALUES(@Message_Body, CHECKSUM(cast(@Message_Body as NVARCHAR(MAX)))) END CONVERSATION @handle --Here call another Store Procedure to process our message to record deadlock relation info INSERT INTO [DDLCollector].[Deadlock_Info] EXEC [DDLCollector].[UP_ParseDeadlockGraphEventMsg] @Message_Body; END --End Diaglog Message Type, that means we should end this conversation ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog' BEGIN END CONVERSATION @handle; END -- Konwn Service Broker Errors by System. ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' BEGIN END CONVERSATION @handle INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name); END ELSE -- unknown Message Types. BEGIN END CONVERSATION @handle INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name); -- unexpected message type RAISERROR (N' Received unexpected message type: %s', 16, 1, @Message_Type) WITH LOG; END END TRY BEGIN CATCH BEGIN -- record exception record SET @Error_Details= ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ' Error Message : ' + ERROR_MESSAGE() + ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) + ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + ' Exception Proc: ' + @Proc_Name ; INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details]) VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); END END CATCH END GO
USE masterGO-- If the master key is not available, create it. IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name LIKE '%MS_DatabaseMasterKey%') BEGIN CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'CenterMasterKey*'; END GO
USE masterGO-- Crete Transport Layer CertificationCREATE CERTIFICATE TrpCert_RemoteCenterAUTHORIZATION dboWITH SUBJECT = 'TrpCert_RemoteCenter',START_DATE = '05/07/2017',EXPIRY_DATE = '12/30/9999'GO-- then backup it up to local path-- and after that copy it to Client serverBACKUP CERTIFICATE TrpCert_RemoteCenterTO FILE = 'C:\Temp\TrpCert_RemoteCenter.cer';GO
USE masterGO-- Create certification came from client Server.CREATE CERTIFICATE TrpCert_ClientLocal FROM FILE = 'C:\Temp\TrpCert_ClientLocal.cer'GO
USE masterGO-- Create user loginIF NOT EXISTS(SELECT * FROM sys.syslogins WHERE name='SSBDbo')BEGIN CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_RemoteCenter;ENDGO
USE masterGO-- Creaet Tcp endpoint for SSB comunication and grant connect to users. CREATE ENDPOINT EP_SSB_RemoteCenterSTATE = STARTED AS TCP ( LISTENER_PORT = 4024) FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_RemoteCenter, ENCRYPTION = REQUIRED ) GO -- Grant Connect on Endpoint to User SSBDboGRANT CONNECT ON ENDPOINT::EP_SSB_RemoteCenter TO SSBDbo GO
-- Now, let's go inside to conversation databaseUSE DDLCenterGO-- Create Master KeyIF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name LIKE '%MS_DatabaseMasterKey%')BEGIN CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';ENDGO
USE DDLCenterGO-- Create conversation layer certificationCREATE CERTIFICATE DlgCert_RemoteCenterAUTHORIZATION dboWITH SUBJECT = 'DlgCert_RemoteCenter',START_DATE = '05/07/2017',EXPIRY_DATE = '12/30/9999'GO-- backup it up to local path-- and then copy it to remote client serverBACKUP CERTIFICATE DlgCert_RemoteCenterTO FILE = 'C:\Temp\DlgCert_RemoteCenter.cer';GO
USE DDLCenterGO-- Create User for login under conversation databaseIF NOT EXISTS( SELECT TOP 1 * FROM sys.database_principals WHERE name = 'SSBDbo')BEGIN --CREATE USER SSBDbo FOR LOGIN SSBDbo; CREATE USER SSBDbo WITHOUT LOGIN;ENDGO
USE DDLCenterGO-- Create converstaion layer certification came from remote client server.CREATE CERTIFICATE DlgCert_ClientLocalAUTHORIZATION SSBDboFROM FILE='C:\Temp\DlgCert_ClientLocal.cer'GOGRANT CONNECT TO SSBDbo;
USE DDLCenterGO-- Create Message TypeCREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request] VALIDATION = WELL_FORMED_XML;CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response] VALIDATION = WELL_FORMED_XML;GO-- Create ContactCREATE CONTRACT [http://soa/deadlock/contract/CheckContract]( [http://soa/deadlock/MsgType/Request] SENT BY INITIATOR, [http://soa/deadlock/MsgType/Response] SENT BY TARGET);GO-- Create QueueCREATE QUEUE [dbo].[http://soa/deadlock/queue/CenterQueue] WITH STATUS = ON , RETENTION = OFF, ACTIVATION (STATUS = ON , PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockGraphEventMsg] , MAX_QUEUE_READERS = 3 , EXECUTE AS N'dbo') GO-- Create ServiceCREATE SERVICE [http://soa/deadlock/service/CenterService]ON QUEUE [http://soa/deadlock/queue/CenterQueue]( [http://soa/deadlock/contract/CheckContract]);GO-- Grant Send on service to User SSBDboGRANT SEND ON SERVICE::[http://soa/deadlock/service/CenterService] to SSBDbo;GO-- Create Remote Service BingdingCREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/ClientRSB]TO SERVICE 'http://soa/deadlock/service/ClientService' WITH USER = SSBDbo,ANONYMOUS=OffGO-- Create RouteCREATE ROUTE [http://soa/deadlock/route/ClientRoute]WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',ADDRESS = 'TCP://10.211.55.3:4022';GO
Event Notification只需要在Deadlock Client Server创建即可,因为只需要在Deadlock Client上跟踪死锁事件。在为Deadlock Client 配置Service Broker章节,我们已经为Event Notification创建了队列、服务和路由。因此,在这里我们只需要创建Event Notification对象即可。方法参见如下的代码:
USE DDLCenterGO-- Create Event Notification for the deadlock_graph event.IF EXISTS( SELECT * FROM sys.server_event_notifications WHERE name = 'DeadLockNotificationEvent')BEGIN DROP EVENT NOTIFICATION DeadLockNotificationEvent ON SERVER;ENDGOCREATE EVENT NOTIFICATION DeadLockNotificationEventON SERVERWITH FAN_INFOR DEADLOCK_GRAPHTO SERVICE 'http://soa/deadlock/service/ClientService', 'current database'GO
至此为止,所有对象和准备工作已经准备完成,万事俱备只欠东风,让我们在Deadlock Client实例上模拟死锁场景。首先,我们在Test数据库下创建两个测试表,表名分别为:dbo.test_deadlock1和dbo.test_deadlock2,代码如下:
IF DB_ID('Test') IS NULL CREATE DATABASE Test;GOUSE TestGO-- create two test tablesIF OBJECT_ID('dbo.test_deadlock1','u') IS NOT NULL DROP TABLE dbo.test_deadlock1GOCREATE TABLE dbo.test_deadlock1(id INT IDENTITY(1,1) not null PRIMARY KEY,name VARCHAR(20) null);IF OBJECT_ID('dbo.test_deadlock2','u') IS NOT NULL DROP TABLE dbo.test_deadlock2GOCREATE TABLE dbo.test_deadlock2(id INT IDENTITY(1,1) not null PRIMARY KEY,name VARCHAR(20) null);INSERT INTO dbo.test_deadlock1SELECT 'AA'UNION ALLSELECT 'BB';INSERT INTO dbo.test_deadlock2SELECT 'AA'UNION ALLSELECT 'BB';GO
接下来,我们使用SSMS打开一个新的连接,我们假设叫session 1,执行如下语句:
--session 1USE TestGOBEGIN TRAN UPDATE dbo.test_deadlock1SET name = 'CC'WHERE id = 1;WAITFOR DELAY '00:00:05'UPDATE dbo.test_deadlock2SET name = 'CC'WHERE id = 1;ROLLBACK
紧接着,我们使用SSMS打开第二个连接,假设叫Session 2,执行下面的语句:
--session 2USE TestGOBEGIN TRAN UPDATE dbo.test_deadlock2SET name = 'CC'WHERE id = 1;UPDATE dbo.test_deadlock1SET name = 'CC'WHERE id = 1;COMMIT
等待一会儿功夫以后,死锁发生,并且Session 2做为了死锁的牺牲品,我们会在Session 2的SSMS信息窗口中看到如下的死锁信息:
Msg 1205, Level 13, State 51, Line 8Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
根据上面的模拟死锁小节,说明死锁已经真真切切的发生了,那么,死锁信息到底有没有被捕获到呢?如果终端用户想要查看和分析所有客户端的死锁信息,只需要连接Deadlock Center SQL Server,执行下面的语句:
-- Run on Deadlock Center ServerUSE DDLCenterGOSELECT * FROM [DDLCollector].[Deadlock_Info]
由于结果集宽度太宽,人为将查询结果分两段截图,第一段结果集展示如下:
第二段结果集截图如下:
从这个结果集,我们可以清楚的看到Deadlock Client发生死锁的详细信息,包含:
死锁发生的Deadlock Client实例名称:CHERISH-PC
被死锁进程号60,死锁进程57号
死锁相关进程的事务开始时间,最后一个Batch开始执行时间和完成时间
死锁进程执行的代码和Batch语句
死锁发生时锁的类型
表和索引名称
死锁相关进程的登录用户
……
等等。当Deadlock Client 上SQL Server发生两次或者两次以上的Deadlock事件以后,自建的Event Notification对象(名为:DeadLockNotificationEvent)会被SQL Server系统自动删除,从而导致整个死锁收集系统无法工作。
SQL Server在错误日志中会抛出如下4个错误信息:两个错误编号为17004,一个编号为17001的错误,最后是一个编号为17005错误,其中17005明确说明了,Event Notification对象被删除了。如下:
Error: 17004, Severity: 16, State: 1.Event notification conversation on dialog handle '{4A6A0FBD-7A34-E711-A709-001C42099969}' closed without an error.Error: 17004, Severity: 16, State: 1.Event notification conversation on dialog handle '{476A0FBD-7A34-E711-A709-001C42099969}' closed without an error.Error: 17001, Severity: 16, State: 1.Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{F711A404-7934-E711-A709-001C42099969}'. Error Code = '8429'.Error: 17005, Severity: 16, State: 1.Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active.
错误日志截图如下:
从错误提示信息due to send time service broker errors来看,最开始花了很长时间来排查Service Broker方面的问题,在长达数小时的问题排查无果后,静下心来仔细想想:如果是Service Broker有问题的话,我们不可能完成第一、第二条死锁信息的收集,所以问题应该与Service Broker没有直接关系。于是,注意到了错误提示信息的后半部分Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active,再次以可以成功收集两条deadlock错误信息为由,排除Contact和Service的问题可能性,所以最有可能出问题的地方猜测应该是conversation handle,继续排查与conversation handle相关操作的地方,发现存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]的中的代码:
...ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification' BEGIN -- Record message log first INSERT INTO [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum]) VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX)))) END CONVERSATION @handle --Here call another Store Procedure to send deadlock graph info to center server EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body; END...
这个逻辑分支不应该有End Conversation的操作,因为这里是与Event Notification相关的Message Type操作,而不是Service Broker相关的Message Type操作。
问题分析清楚了,解决方法就非常简单了,注释掉这条语句END CONVERSATION @handle后,重新创建存储过程。再多次模拟死锁操作,再也没有出现Event Notification被系统自动删除的情况了,说明这个问题已经被彻底解决,坑已经被填上了。
解决问题的代码修改和注释如下截图,以此纪念下踩过的这个坑:以下是关于SQL Server死锁相关的系列文章,可以帮助我们全面了解、分析和解决死锁问题,其中第一个是这篇文章的视频演示。
使用Event Notification和Service Broker自动跟踪SQL Server Deadlock的Youku视频:
使用DBCC捕获死锁:
使用Profiler捕获死锁:
自动部署Profiler捕获死锁:
利用Service Broker事件通知捕获死锁:
利用Extended Events获取死锁信息:
这篇文章是一个完整的SQL Server死锁收集系统典型案例介绍,你甚至可以很轻松简单的将这个方案应用到你的产品环境,来收集产品环境所有SQL Server实例发生死锁的详细信息,并根据该系统收集到的场景来改进和改善死锁发生的概率,从而降低死应用发生异常错误的可能性。因此这篇文章有着非常重要的现实价值和意义。
转载地址:http://ljbgx.baihongyu.com/