Availability Group Heath Monitoring

blog-img1
28-06-2024
Written by Rob Risetto

Availability Group Heath Monitoring

Like most monitoring systems these days the SQLRDS monitor has a number of availability group (AG) health checks and alerts built in. However, what if you don’t have a monitoring system in place but you have deployed an availability group or BAG (if running Standard Edition).

Checking the health of the AG on a continuous basis is very important, otherwise, you may find your environment is not in a state to failover at that crucial moment.

You can still get alerts about your AG health and status, even if you are not using a full feature monitoring system like SQLRDS Monitor. This simple solution is based on the Database Mail component to send out alerts that are generated by :-

  • An SQL Agent job that runs an AG health check script every 15 minutes. The script will alert if the AG is not healthy or if the Log Send Queue is greater than a threshold (20000 KB).
  • SQL Server Alerts that checks for the change of AG roles i.e. a failover has occurred.

AG Health Script

Create a SQL Agent job on each server of the Availability Group configuration and run the job every 15 minutes. Configure the AG Health Check script below as a TSQL Job Step.

— Configure values

Declare @ToMail varchar(max) = ‘<Your Email Address>

Declare @DBMailprofile varchar(128) = ‘<Your Database Mail Profile Name>

— Standard Declares

Declare @ErrorMsg varchar(8000) = ”

Declare @QueryStr nvarchar(max) = ”

Declare @ag_replica_server varchar(128) = ”

Declare @database_name varchar(128) = ”

set nocount on

— Get Health Data

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,

ag_replica_role = CASE

WHEN ar_state.role_desc IS NULL THEN N’DISCONNECTED’

ELSE ar_state.role_desc

END, db_name(dr_state.database_id) as database_name,

dr_state.synchronization_state_desc, dr_state.is_suspended, dr_state.synchronization_health_desc,

dr_state.suspend_reason_desc, isnull(dr_state.secondary_lag_seconds,0) secondary_lag_seconds,

isnull(dr_state.log_send_queue_size,0) log_send_queue_size, isnull(dr_state.redo_queue_size,0) redo_queue_size

into #AGStatus

FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )

JOIN sys.dm_hadr_availability_replica_states AS ar_state  ON ar.replica_id = ar_state.replica_id)

JOIN sys.dm_hadr_database_replica_states dr_state on

ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id

where ar_state.is_local = 1 — check local server only

— used to put output in DBMail alert

Set @QueryStr= ‘SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,

ag_replica_role = CASE WHEN ar_state.role_desc IS NULL THEN N”DISCONNECTED” ELSE ar_state.role_desc END,

db_name(dr_state.database_id) as database_name, dr_state.synchronization_state_desc, dr_state.is_suspended,

dr_state.synchronization_health_desc, dr_state.suspend_reason_desc, isnull(dr_state.secondary_lag_seconds,0) secondary_lag_seconds,

isnull(dr_state.log_send_queue_size,0) log_send_queue_size, isnull(dr_state.redo_queue_size,0) redo_queue_size

FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )

JOIN sys.dm_hadr_availability_replica_states AS ar_state  ON ar.replica_id = ar_state.replica_id)

JOIN sys.dm_hadr_database_replica_states dr_state on ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id where ar_state.is_local = 1′

If exists (select * from #AGStatus where synchronization_health_desc <> ‘HEALTHY’ or synchronization_state_desc <> ‘SYNCHRONIZED’ or is_suspended = 1)

begin

   select top 1 @database_name = [database_name],  @ag_replica_server =  ag_replica_server

              from #AGStatus

   Set @ErrorMsg = ‘AG Issue Found – Databases on ‘ + @ag_replica_server + ‘ are NOT Healthy’

   EXEC msdb.dbo.sp_send_dbmail @profile_name = @DBMailprofile, @recipients = @ToMail, @subject = @ErrorMsg, @Query = @QueryStr, @attach_query_result_as_file = 0, @query_result_width = 1024

end

— Check log send queue size – if bigger than 20MB then alert

If exists (select * from #AGStatus where log_send_queue_size >= 20000  )

begin

   select top 1 @database_name = [database_name],  @ag_replica_server =  ag_replica_server

              from #AGStatus

   Set @ErrorMsg = ‘AG Issue Found – Databases on ‘ + @ag_replica_server + ‘ have a LOG SEND QUEUE >= 20000’

   EXEC msdb.dbo.sp_send_dbmail @profile_name = @DBMailprofile, @recipients = @ToMail, @subject = @ErrorMsg, @Query = @QueryStr , @attach_query_result_as_file = 0, @query_result_width = 1024

end

— End AG Health Check

SQL Server Alert Definitions

The following script will define 3 x SQL Server Alerts to determine if an AG role change has occurred or if data movement was suspended or resumed. The script also adds the Alerts to the specified SQL Operator for email notifications.

Note Database Mail, the Alerts and SQL Operator need to be configured on each server of the AG configuration.

— 1480 – AG Role Change (failover)

EXEC msdb.dbo.sp_add_alert

       @name = N’AG Role Change’,

       @message_id = 1480,

    @severity = 0,

    @enabled = 1,

    @delay_between_responses = 60,

    @include_event_description_in = 1;

GO

EXEC msdb.dbo.sp_add_notification

       @alert_name = N’AG Role Change’,

       @operator_name = N’<YourSQLOperator>‘,

       @notification_method = 1;

GO

— 35264 – AG Data Movement – Suspended

EXEC msdb.dbo.sp_add_alert

       @name = N’AG Data Movement – Suspended’,

       @message_id = 35264,

    @severity = 0,

    @enabled = 1,

   @delay_between_responses = 60,

    @include_event_description_in = 1;

GO

EXEC msdb.dbo.sp_add_notification

       @alert_name = N’AG Data Movement – Suspended’,

       @operator_name = N’<YourSQLOperator>‘,

       @notification_method = 1;

GO

— 35265 – AG Data Movement – Resumed

EXEC msdb.dbo.sp_add_alert

       @name = N’AG Data Movement – Resumed’,

       @message_id = 35265,

    @severity = 0,

    @enabled = 1,

    @delay_between_responses = 60,

    @include_event_description_in = 1;

GO

EXEC msdb.dbo.sp_add_notification

       @alert_name = N’AG Data Movement – Resumed’,

       @operator_name = N’<YourSQLOperator>‘,

       @notification_method = 1;

GO

Testimonials

Kind words from our valued customers

Testimonials Icon “We found the DBInsight Monitor tool to be a highly effective way of monitoring the performance of our mission critical servers. In particular it has provided us with detailed information about our SQL environment, including internal SQL memory pressure, blocking locks root cause and VM Snapshot backup failures due to overruns. This has allowed us to take a more proactive view to problem solving, and therefore a saving in time and other important resources.”

Jim Gordon, IT Manager, Deaf Services Queensland

Testimonials Icon “DB Insight has been utilised by our group of companies for several years. I cannot recommend them highly enough. They are fast, competent and professional. They have solved every problem we have encountered and the service is outstanding. We simply cannot speak highly enough of them. They communicate in laymen’s terms (Plain English), explain what they have done and their charges are very fair and reasonable. We have had no hesitation in recommending Rob and the team to other organisations in our industry. We wish Rob and the team all the best! Thank you sincerely for all your efforts.”

Craig Young, Director, Impact Financial Services, Sydney, NSW, Australia

Testimonials Icon “When we engaged DBInsight to perform a comprehensive SQL Server health check, we had great expectations. But then we got so much more than we expected. Expertise and attention to detail is second to none. Our system is so much better for it now.”

Martin Kammann, Owner, Autag Pty Ltd, Brisbane, QLD, Australia

Testimonials Icon “Thank the 7 gods we found this bloke. We have had a pile of crap dumped on our lap many times thanks to other incompetent IT providers and Rob has been the one to sort it out. In fact, he is considered the Grand Wizard of SQL by us. Rob, despite being a QLD’r, clearly does not sleep. We have had alerts at ridiculous hours and Rob has fixed them. No issue. If only we could sprinkle Rob Dust on all of our IT providers. Now nobody likes IT people because they speak a different language and send you big bills that you understand even less. Rob is unique in that he speaks English (IT perspective) and his bills are modest. In short, we like Rob. We like DB Insight. Use them.”

Craig Young, Director, Impact Financial Services, Sydney, NSW, Australia

Testimonials Icon “As a tech startup with a global market, we had to ensure that our database was scalable, secure and highly available. We engaged DBinsight as our virtual DBA and Rob has been absolutely brilliant. Using SQLRDS Monitoring tools, Rob and team are alerted to and respond to any issues before we are aware of them. I couldn’t recommend DBInsight more highly.
Whether you are a startup or a mature business, if you value your data, then you need to ensure that the right expertise is at hand always. Rob is the best SQL DBA technically, speaks in plain English (not tech speak) and is a pleasure to work with. Do your business a favour and call DBInsight.”

Aidan Gani, Chief Technology Officer, WorkApp Pty Ltd

Testimonials Icon “We have been relying on Rob from DBInsight to help with our Azure SQL Server database since September 2015.We initially engaged him when our database was at 60GB and we were experiencing high CPU/IO, whilst on the highest pricing tier offered by Azure SQL (at the time it was P3).
He quickly got our queries under control, but more remarkably, with his continued help we were able to remain on the same pricing tier as our client base more than doubled and our database grew to over 800GB.
Recently, just as it looked like we were going to have to scale up due to database size, Rob helped out again and reduced the database to less than half – thus allowing us to remain on the same pricing tier for the foreseeable future.
Rob has also provided SQL training to my team which was great in raising the overall awareness of what to consider when writing application code that will invariably require access to the database.
So Rob has saved us money – no doubt about it – but there is something more important and valuable to us – Rob provides us peace of mind – knowing that he is always there to help fix the database when things go wrong – to me, that is priceless.
Having Rob help out when required is a great solution for us, and in my opinion more effective and cost effective than an inhouse DBA. Rob gains experience from helping out a bunch of clients which means that he can always recommend new and improved ways that we could/should do things.
'To Rob, I say, thanks heaps for all your help over the years – you have kept us running.
Not surprisingly, I definitely and unequivocally recommend Rob and DBInsight to any organization with a database – be it, on-prem or cloud.”

Mike Van Radars, Inspect Real Estate

Testimonials Icon “Using the DBInsight monitoring service, has given us invaluable insights to our HADR environment. Along with the support provided by DBInsight it has given us the comfort and confidence to focus on further expanding our business.”

Rob Camilleri, CIO, Money3

Testimonials Icon “Going the extra mile”
“No matter how big or small the task, DBInsight have always shown a high level of professionalism and commitment to doing the task right. DBInsight address more than just the task at hand: going the extra mile to document and plan against possible issues that may arise in the future.”

Roy Adams, RACS Director

Testimonials Icon "The SQLRDS Plus service provides peace of mind that the fundamentals of our mission critical database services are expertly monitored and managed. Beyond this, access to a highly experienced team of DBA consultants has helped us improve application performance, maximise the use of our cloud resources, and enabled us to deliver a more responsive service to our customers."

David Perske, Director, Resolution Technology