Why did my SQL Server database just go offline?

Note: This post contains a video clip where I show quirky behaviour of the scripted offline task. Personally I rarely use SQL Server Management Studio GUI to do day-to-day tasks, I rather just load a window and start typing stuff (I learn better that way). Recently I became lazy and just loaded up the tool and found something quite quirky, let me explain.

Continue reading

Advertisements

TDE: Looking inside your database and backups

TDE (Transparent Data Encryption) encrypts the data files at rest but don’t forget that it also encrypts your backup file too. I fancied looking inside the MDF and a backup of my database before enabling TDE and comparing it to when enabled.

Continue reading

SQL Server Target vs Total Memory

For this blog post I want to discuss the meaning behind SQL Server: Memory Manager\Target Server Memory (KB) and SQL Server: Memory Manager \Total memory (KB) Perfmon counters.  I will mention how under different situations and configuration settings their behaviour changes.

Continue reading

Lightweight Statistics Profiling (with a potential bug)

The great thing about new versions of SQL Server is the fact that they are packed full of new features but there some that don’t really get talked about much. I found out about a concept called lightweight query execution statistics profiling infrastructure – think of this as a way of capturing your “in-flight” query execution information, I did however encounter an issue which you will see later on.

Continue reading

Interactive query experience for SQL databases

This is a new feature where you can issue TSQL commands via a TSQL editor straight from the Azure portal, why this is great is because sometimes you do not want to keep switching between your SSMS (SQL Server Management Studio) connection and the Azure portal to issue basic commands and that assumes that you have the right version of tools installed and correct ports opened locally.

This is in public preview and you can do the following:

  • Query dynamic management views for real-time workload insights (Which is what I will be doing).
  • Issue ad-hoc queries.
  • Manage your user authentication.

To access the editor do the following: Go to your SQL database select the Tools button and then select the Query editor button.

tools1.JPG

Here you will see the Query editor option.

tool2.JPG

You will need to login, I have only SQL authentication setup but if you use Active Directory then you have the choice to use the other options.

tools3.JPG]

If you fail you will obviously be told.

error1.JPG

You want to see a blue authentication image!

auth.JPG

Now let’s issue a query – instead of switching over to my local SSMS connection. How about some database level wait stats? I really need to look into the IO_completion wait type!

waits101.JPG

Then if you really want you can save your query (which saves it to your local machine as a SQL file):

open.JPG

Saved locally as per below.

local.JPG

Or you can open a SQL file locally (Open query option) and let that be imported into your Azure portal session, for example you could have the waits stats script pre-written and just open it up whenever you like.

It will not create a new window, it basically overwrites the contents of the current open query window, I can work with that.

 

 

Using Extended Events Telemetry Session

You are probably wondering what this is, quite honestly I was wondering too. Well it is an extended events session known as telemetry_xevents that is shipped with SQL Server 2016. This is not the same as the system_health session that we have come to know and love, it captures very different information.

There are a lot of queries within this blog post but I think you will find them really useful because getting practical with it is the best way to show everyone what it is all about.

You can see the telemetry_xevents session via SSMS (SQL Server Management Studio).

tele

The script definition of telemetry_xevents is massive but I think it is important to show what it captures.

CREATE EVENT SESSION [telemetry_xevents] ON SERVER
ADD EVENT qds.query_store_db_diagnostics,
ADD EVENT sqlserver.always_encrypted_query_count,
ADD EVENT sqlserver.auto_stats,
ADD EVENT sqlserver.column_store_index_build_low_memory,
ADD EVENT sqlserver.column_store_index_build_throttle,
ADD EVENT sqlserver.columnstore_delete_buffer_flush_failed,
ADD EVENT sqlserver.columnstore_delta_rowgroup_closed,
ADD EVENT sqlserver.columnstore_index_reorg_failed,
ADD EVENT sqlserver.columnstore_log_exception,
ADD EVENT sqlserver.columnstore_rowgroup_merge_failed,
ADD EVENT sqlserver.columnstore_tuple_mover_delete_buffer_truncate_timed_out,
ADD EVENT sqlserver.columnstore_tuple_mover_end_compress,
ADD EVENT sqlserver.data_masking_ddl_column_definition,
ADD EVENT sqlserver.data_masking_traffic,
ADD EVENT sqlserver.data_masking_traffic_masked_only,
ADD EVENT sqlserver.database_cmptlevel_change,
ADD EVENT sqlserver.database_created,
ADD EVENT sqlserver.database_dropped,
ADD EVENT sqlserver.error_reported(
WHERE ([severity]>=(16) OR ([error_number]=(18456) OR [error_number]=(17803) OR [error_number]=(701) OR [error_number]=(802) OR [error_number]=(8645) OR [error_number]=(8651) OR [error_number]=(8657) OR [error_number]=(8902) OR [error_number]=(41354) OR [error_number]=(41355) OR [error_number]=(41367) OR [error_number]=(41384) OR [error_number]=(41336) OR [error_number]=(41309) OR [error_number]=(41312) OR [error_number]=(41313)))),
ADD EVENT sqlserver.json_function_compiled(
ACTION(sqlserver.database_id)),
ADD EVENT sqlserver.missing_column_statistics,
ADD EVENT sqlserver.missing_join_predicate,
ADD EVENT sqlserver.natively_compiled_module_inefficiency_detected,
ADD EVENT sqlserver.natively_compiled_proc_slow_parameter_passing,
ADD EVENT sqlserver.query_memory_grant_blocking,
ADD EVENT sqlserver.reason_many_foreign_keys_operator_not_used,
ADD EVENT sqlserver.rls_query_count,
ADD EVENT sqlserver.sequence_function_used(
ACTION(sqlserver.database_id)),
ADD EVENT sqlserver.server_memory_change,
ADD EVENT sqlserver.server_start_stop,
ADD EVENT sqlserver.stretch_database_disable_completed,
ADD EVENT sqlserver.stretch_database_enable_completed,
ADD EVENT sqlserver.stretch_database_reauthorize_completed,
ADD EVENT sqlserver.stretch_index_reconciliation_codegen_completed,
ADD EVENT sqlserver.stretch_remote_column_execution_completed,
ADD EVENT sqlserver.stretch_remote_column_reconciliation_codegen_completed,
ADD EVENT sqlserver.stretch_remote_index_execution_completed,
ADD EVENT sqlserver.stretch_table_alter_ddl,
ADD EVENT sqlserver.stretch_table_codegen_completed,
ADD EVENT sqlserver.stretch_table_create_ddl,
ADD EVENT sqlserver.stretch_table_data_reconciliation_results_event,
ADD EVENT sqlserver.stretch_table_hinted_admin_delete_event,
ADD EVENT sqlserver.stretch_table_hinted_admin_update_event,
ADD EVENT sqlserver.stretch_table_predicate_not_specified,
ADD EVENT sqlserver.stretch_table_predicate_specified,
ADD EVENT sqlserver.stretch_table_query_error,
ADD EVENT sqlserver.stretch_table_remote_creation_completed,
ADD EVENT sqlserver.stretch_table_row_migration_results_event,
ADD EVENT sqlserver.stretch_table_row_unmigration_results_event,
ADD EVENT sqlserver.stretch_table_unprovision_completed,
ADD EVENT sqlserver.stretch_table_validation_error,
ADD EVENT sqlserver.string_escape_compiled(
ACTION(sqlserver.database_id)),
ADD EVENT sqlserver.temporal_ddl_period_add,
ADD EVENT sqlserver.temporal_ddl_period_drop,
ADD EVENT sqlserver.temporal_ddl_schema_check_fail,
ADD EVENT sqlserver.temporal_ddl_system_versioning,
ADD EVENT sqlserver.temporal_dml_transaction_fail,
ADD EVENT sqlserver.window_function_used(
ACTION(sqlserver.database_id)),
ADD EVENT sqlserver.xtp_alter_table,
ADD EVENT sqlserver.xtp_db_delete_only_mode_updatedhktrimlsn,
ADD EVENT sqlserver.xtp_stgif_container_added,
ADD EVENT sqlserver.xtp_stgif_container_deleted,
ADD EVENT XtpCompile.cl_duration,
ADD EVENT XtpEngine.parallel_alter_stats,
ADD EVENT XtpEngine.serial_alter_stats,
ADD EVENT XtpEngine.xtp_db_delete_only_mode_enter,
ADD EVENT XtpEngine.xtp_db_delete_only_mode_exit,
ADD EVENT XtpEngine.xtp_db_delete_only_mode_update,
ADD EVENT XtpEngine.xtp_physical_db_restarted
ADD TARGET package0.ring_buffer(SET occurrence_number=(100))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=120 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

As you can see it is very different when comparing it to the system health session and more importantly there is no XEL file target, querying the ring buffer is quite different.

To see things at a high level you would run the following basic query.


DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
qp.query('.') AS event_data
FROM @ShredMe.nodes('RingBufferTarget/event') AS q(qp);

tele1

It’s much better if we can shred the XML to be more specific. I gave it a go using XQuery because I had a lot of the auto_stats events.

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp logged],
QP.value('(data[@name="database_id"]/value)[1]', 'INT') as [DatabaseID],
QP.value('(data[@name="object_id"]/value)[1]', 'INT') as [ObjectID],
QP.value('(data[@name="index_id"]/value)[1]', 'INT') as [IndexID],
QP.value('(data[@name="job_type"]/text)[1]', 'VARCHAR(MAX)') as [Job Type],
QP.value('(data[@name="statistics_list"]/value)[1]', 'VARCHAR(MAX)') as [Stats List]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''auto_stats'']') AS q(QP);
GO

statsxe

Looking at the above you can see that you have the ability to get a time when a stats object has been created,loaded and/or updated.

I think there is great information available here, I started to dig in more.

Example 1

I am going to find database creation times. I created couple of databases and then issued the following query:

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="database_name"]/value)[1]', 'varchar(20)') as [DatabaseName],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp created]

FROM @ShredMe.nodes('RingBufferTarget/event[@name=''database_created'']') AS q(QP);

xedb

Example 2

How about finding when the compatibility level of a database changed?

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="database_id"]/value)[1]', 'INT') as [DatabaseID],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp of change],
QP.value('(data[@name="previous_value"]/value)[1]', 'INT') as [Previous cmptlevel],
QP.value('(data[@name="new_value"]/value)[1]', 'INT') as [Newcmptlevel]

FROM @ShredMe.nodes('RingBufferTarget/event[@name=''database_cmptlevel_change'']') AS q(QP);

teledb1

Example 3

Finding the times when a database was dropped.


DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="database_name"]/value)[1]', 'varchar(20)') as [DatabaseName],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp when dropped]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''database_dropped'']') AS q(QP);

tele2

Example 4

Memory change tracking.

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="memory_change"]/text)[1]', 'varchar(256)') as [Memory Change Desc],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp changed],
QP.value('(data[@name="new_memory_size_mb"]/value)[1]', 'int') as [New Memory size(MB)]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''server_memory_change'']') AS q(QP);

2016ram

After playing with the memory settings I then received the below:

ram2.JPG

Example 5

Checking if someone created a temporal table, then you can use the ID (which I labelled it as TableId) and find it in sys.tables.

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="database_id"]/value)[1]', 'INT') as [DatabaseID],
QP.value('(data[@name="table_id"]/value)[1]', 'BIGINT') as [TableID],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp created temporal DDL],
QP.value('(data[@name="period_type"]/text)[1]', 'varchar(20)') as [Type]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''temporal_ddl_period_add'']') AS q(QP);

versions.JPG

As mentioned you can then confirm the name of the table via the below query:

select * from sys.tables
where object_id = 645577338

temptable

Example 6

Finally I check for error messages greater than level 16.

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.address
WHERE s.name = N'telemetry_xevents';

SELECT
QP.value('(data[@name="severity"]/value)[1]', 'INT') as [severity level],
QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp logged],
QP.value('(data[@name="message"]/value)[1]', 'VARCHAR(MAX)') as [Message]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''error_reported'']') AS q(QP);
GO

errors.JPG

The queries you can build to get information are endless.

As a side not I did get missing join predicate warnings but the XML didn’t really return anything for me to query.

Do not forget about the drawbacks. The ring buffer isn’t perfect, if anything I found it quite annoying.  It has a maximum amount of memory in kilobytes (KB) to use that being 4096 and 100 occurrences which means the preferred number of events of each type to keep in the buffer. After a restart the information will be lost.

Looking at wait stats differently with SQL 2016

Wait statistics is a great place to start looking into performance issues. Normally people would issue commands/queries based on  sys.dm_os_wait_stats and/or sys.dm_os_waiting_tasks. Did you know from SQL 2016 you have the ability to work with wait stats at a session level? This is done via sys.dm_exec_session_waits_stats. As stated by Microsoft documentation the “DMV resets the information for a session when the session is opened or when the session is reset” .

This is what is returned. The key bit here is the session_ID.

waitsdmv1

So let’s try and put this to use. Please note I use SQLskills queries written by Paul Randal for my wait stats analysis as shown here http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/, probably because that is where I learnt about the topic.

So I run a specific workload such as

DBCC CHECKDB ('AdventureWorks2016')
GO

Then I use the query from the above link which is based on  sys.dm_os_wait_stats  to check wait stats returning the following snippet (I did not clear anything out beforehand):

waitsmain

This tells me about the waits since my last reboot or since a manual reset of the stats. It’s probably why you should do at least time-based analysis or reset the wait stats before starting, that is if you are interested in something time specific or if you want to understand certain workloads at a given time.

So the other option is that you could go down the session level route. With the session based analysis I took the query and changed it slightly to query sys.dm_exec_session_wait_stats and also pull back the session_id that I am interested in.

First thing is that I used the following to get an ID of interest. This was actually the connection for when I ran the CHECKDB command – SPID 56.


SELECT @@SPID AS [Current SPID]

spid56

I then use this ID within the WHERE clause of the query so I could track it to get session level wait stats.

checkwait

As you would expect it is very specific and quite relevant for the workload (CHECKDB) and for that moment in time only. Its definitely not a replacement technique for the other traditional ways but it could be something that you could use if you want to track a session ID real time.

Here is another example, this time tracking session_id 53 which is a big SELECT statement.

select.JPG

Session level waits:

rbar

Using the DMV in its most simplest form would return the following:

SELECT * FROM sys.dm_exec_session_wait_stats
WHERE session_id = 54

simple

 

I guess you could even use sys.dm_exec_session_waits_stats to get some further details when/if you query sys.dm_exec_sessions or anywhere else where you can join onto the session_id, such as:

SELECT
s.program_name,s.login_time,s.database_id, cws.wait_type,cws.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS cws
JOIN sys.dm_exec_sessions AS s
ON cws.session_id = s.session_id
WHERE cws.session_id = 54
ORDER BY wait_time_ms DESC

Snippet:

waits2016_2

Checking DBCC CHECKDB

I hope you run consistency checks on your databases, if you are not currently doing this you probably will want to.

For this post I want to show you how you can check whether or not CHECKDB has been successfully executed.

There are a couple of ways to check this but I will be using DBCC DBINFO for this check.

DBCC DBINFO WITH TABLERESULTS

checkdb

The field of interest here is dbi_dbccLastKnownGood where VALUE of 1900-01-01 00:00:00.000 means that DBCC CHECKDB has never been executed (successfully).

So let’s run CHECKDB and see the value get updated.

DBCC CHECKDB ('ZoraDB')
GO

DBCC DBINFO WITH TABLERESULTS

checkdb2.JPG

The next question I want answering is whether or not this value gets updated if I run CHECKDB with the physical_only option (I created a fresh copy of the database and performed a re-check)

USE [master]
GO

DROP DATABASE [ZoraDB]
GO

CREATE DATABASE [ZoraDB]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'ZoraDB', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\ZoraDB.mdf' , SIZE = 5120KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'ZoraDB_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\ZoraDB_log.ldf' , SIZE = 2048KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

DBCC DBINFO WITH TABLERESULTS

Below confirms that we are back to a newly created database.

checkdb3.JPG

So we now run it with the PHYSICAL_ONLY option.

DBCC CHECKDB ('ZoraDB') WITH PHYSICAL_ONLY
GO
USE [ZoraDB]
GO
DBCC DBINFO WITH TABLERESULTS

So, stating physical_only updates the value too.

physicalonly.JPG

What about separate checks?

Again I dropped and re-created the database to get back to level playing field.

Let’s work through the DBCC check commands, after a command I will check dbi_dbccLastKnownGood to see if it gets updated.

DBCC CHECKALLOC
GO
DBCC DBINFO WITH TABLERESULTS

checkdb4.JPG

DBCC CHECKCATALOG
GO

DBCC DBINFO WITH TABLERESULTS

checkdb4.JPG

DBCC CHECKTABLE ('dbo.people')
GO
DBCC DBINFO WITH TABLERESULTS

checkdb4.JPG

So checking the database via separate checks has no impact on dbi_dbccLastKnownGood.

What about if the database is corrupted? I re-created the database and corrupted it:

DBCC CHECKDB ('ZoraDB')
GO

CHECKDB found 0 allocation errors and 2 consistency errors in database ‘ZoraDB’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (ZoraDB).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC DBINFO WITH TABLERESULTS

checkdb4

Naturally it will not get updated because it was unsuccessful – If we fixed the corruption and ran CHECKDB it would then get updated with the current timestamp.

DO NOT RUN REPAIR WITH THIS OPTION – THIS IS AN EXAMPLE ONLY! It deleted my data!! THIS IS A LAST RESORT – WARNING.

ALTER DATABASE [ZoraDB] SET SINGLE_USER;
GO

DBCC CHECKDB    ('ZoraDB'  , REPAIR_ALLOW_DATA_LOSS )

ALTER DATABASE [ZoraDB] SET MULTI_USER;
GO

DBCC CHECKDB ('ZoraDB')
GO

CHECKDB found 0 allocation errors and 0 consistency errors in database ‘ZoraDB’.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

checkdb6.JPG

There you have it – the updated value.

Don’t blink you might READPAST it

Following on from my previous post on NOLOCK (https://blobeater.blog/2016/12/09/nolock/) I want to talk about another hint called READPAST. This hint tells the database engine not to read rows that are locked by other transactions.

It is best explained with an example.

First let’s look at the basics of locking under the default isolation level with no hints. In my first SSMS (SQL Server Management Studio) window I issue the following.

BEGIN TRAN
UPDATE [HumanResources].[Department]
SET Name = 'Global'
WHERE DepartmentID = 16

Under a second SSMS connection I run a select statement – yes naturally I am blocked.

SELECT * FROM HumanResources.Department

readpast

I have circled in red the resource_description which can be used as input into the undocumented function  %%lockres%% to understand/confirm what is actually being locked.

select *,%%lockres%%
from    HumanResources.Department (nolock)
where    %%lockres%% IN('(62fb4a4c0e9e)')

Just to confirm it is the DepartmentID = 16 row as per my update statement.

readpast1

Anyways, I issue a rollback and the blocking stops.

readpast3

So now that we understand what “normally” happens let’s play with the READPAST hint.

SSMS window 1

 BEGIN TRAN
UPDATE [HumanResources].[Department]
SET Name = 'Global'
WHERE DepartmentID = 16

SSMS window 2

 SELECT * FROM HumanResources.Department (READPAST)

readpst5.JPG

Totally skips the row where DepartmentID =16.

If I rollback the update statement you should see the DepartmentID = 16 come back.

 ROLLBACK

Then

SELECT * FROM HumanResources.Department (READPAST)

reradp2.JPG

So, as the name of the hint suggests, it literally does “readpast” it.