Always Encrypted Column Encryption

I have a need to encrypt a column within my SQL Database (Azure). I decided to use Always Encrypted. This feature essentially uses a column encryption key that is used to encrypt data in an encrypted column and a column master key  that encrypts one or more column encryption keys.

It is paramout that you install an Always Encrypted-enabled driver on the client as the driver encrypts the data in sensitive columns before passing the data to the database engine.

Your application connection string will be something similar to:

string connectionString = "Data Source=yourserver; Initial Catalog=mydb;
Integrated Security=true; Column Encryption Setting=enabled";
SqlConnection connection = new SqlConnection(connectionString);

Doing this is important because if you have the setting disabled and you have a query with parameters targeting an encrypted column it will fail. The following table from Books on Line summaries this (last column).

ae1

There is a lot to digest for this feature it’s not just the database you need to consider; a picture speaks a thousand words so have a look how everything sits together.

ae2

Setup via the wizard

Navigate to your column of interest and click the Encrypt Columns option

ae3

The wizard arrives

ae4

Next is column selection – This will be my salary column (ok it may not be a great example but it’s the idea I want to convey)

Always Encrypted uses AEAD_AES_256_CBC_HMAC_SHA_256 algorithm. There are 2 variations available, this being deterministic and randomized.

ae5

Deterministic – This is the weaker option of the 2. It uses a method which always generates the same encrypted value for any given plain text value. It allows for grouping, filtering by equality, and joining tables based on encrypted values. Be aware that Deterministic encryption must use a column collation with a binary2 sort order for character columns.

Randomised – Less predictable than the above, it does not allow for grouping and filtering on the encrypted column.

You have to make a choice here; both options have pros and cons.  If you know that you need to do grouping, indexing or joins on your encrypted column then you don’t really have a choice and will have to use deterministic.

For my example I am going with randomised. If you try and group/filter on a column with randomised you will get:

Msg 33299, Level 16, State 2, Line 8 – Encryption scheme mismatch for columns/variables ‘salary’. The encryption scheme for the columns/variables is (encryption_type = ‘RANDOMIZED’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘CEK_Auto1’, column_encryption_key_database_name = yourdb) and the expression near line ‘8’ expects it to be (encryption_type = ‘DETERMINISTIC’) (or weaker).

Next is master key configuration. Ideally I would love to setup the Azure Key Vault provider but I think that would make a great blog post (one day), so here I opted for Windows certificate store.

ae6

Save the PowerShell script if you desire.

ae7

ae8

Issue your query now:

ae9

As mentioned before this is only 1 piece of the puzzle, you need to make application changes to handle this encryption. I suggest reading through https://msdn.microsoft.com/en-us/library/mt757097.aspx as this shows you how to build your queries (the documentation is in .NET) the last thing you want is to see something like –

System.Data.SqlClient.SqlException (0x80131904): Operand type clash: varchar is incompatible with varchar(8000) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘CEK_Auto1’, column_encryption_key_database_name = ‘mydb’) collation_name = ‘SQL_Latin1_General_CP1_CI_AS’

I also suggest reading the Feature detail section found here https://msdn.microsoft.com/en-us/library/mt163865.aspx

SQL Server Backup Compression

I thought that it would be handy to capture some Perfmon graphs of what happens to backup throughput and CPU on my SQL Server when I have backup compression on vs off against my 35GB database.

No compression

BACKUP DATABASE TATA TO DISK = 'd:\mylocation\DBNOTOCOMP.BAK' WITH COPY_ONLY,
STATS = 10

I will be using data collectors to show the perfmon counters(below) for the duration of the backup command – it just paints a better picture than running real time perfmon graphs.

% Processor Time

perf

Backup Throughput/sec

perf1

Overall Time & size

5.5 minutes to do:

perf3

With Compression

BACKUP DATABASE TATA TO DISK = ' d:\mylocation\DByesCOMP.BAK'
WITH COPY_ONLY, STATS = 10, COMPRESSION

Again Data collectors paint a nice picture for CPU and Backup Throughput.

% Processor Time

perf5

Backup Throughput/sec

perf6

Overall Time & size

2.5 minutes to do:

perf7

Findings

Some of the findings maybe obvious to you, having said that here is what I found:

  • Using backup compression uses more CPU. (My Average for %Processor Time went from 5.05 to 18.93)
  • A compressed backup is much smaller hence backup speed is much faster (nearly reduced the time taken by 50%).
  • Backup Throughput more than doubled when compressing.
  • Compression ratio does depend on many factors but my ratio was 2.88.

Trace Flag 3042

You can change the behaviour of the compressed backup. By default the Database Engine uses a pre-allocation algorithm for the backup file. For my example it pre-defined to the size of DByesCOMP.BAK = 4646286KB.

Using the trace flag you can see the change in behaviour. Instead it starts of from a low size and works its way up to the true final size.

DBCC TRACEON (3042)
BACKUP DATABASE TATA TO DISK = ' d:\mylocation\DByesCOMP1.BAK'
WITH COPY_ONLY, STATS = 10, COMPRESSION

perf8.JPG

I did find that using this trace flag made the backup operation slightly longer with a reduced throughput rate. I personally have no use for this trace flag but it was a new behaviour for me so I thought I would share it too.

perf10

SSMS 2016 Bug

I have been using SSMS (SQL Server Management Studio) 2016 build number 13.0.15900.1 and noticed that if you wanted to manage compression you couldn’t as it is greyed out.

Microsoft SQL Server Management Studio 13.0.15900.1

ssms

The actual message that you are presented states: Tables with sparse columns cannot be compressed – I have no sparse columns.

I never had this issue with SQL Server 2014 Management Studio:

Microsoft SQL Server Management Studio           12.0.4100.1

ssms1

The next release of SSMS is now available build number 13.0.16000.28 – see link https://msdn.microsoft.com/en-us/library/mt238290.aspx

I am happy because it addresses this issue, item number 4 from the above link states: Fixed the issue that “Manage Compression” menu item is disabled for the user table nodes in object explorer tree.

 

 

More bytes for Row versioning

I like row versioning– see this link for more details: https://technet.microsoft.com/en-us/library/ms189122(v=sql.105).aspx

If your database is enabled for one of the isolation levels that uses row versioning and a row is updated it will have 14 bytes added to it.

Before anything is enabled:

SELECT name,is_read_committed_snapshot_on, snapshot_isolation_state_desc,
snapshot_isolation_state
 FROM sys.databases WHERE database_id = 18

iq1

Let’s check a table before enabling anything.

select
index_id, index_type_desc,alloc_unit_type_desc,index_depth,max_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorks2018'),
object_id('AWBuildVersion'), null, null, 'DETAILED')

dfsf

Switch it on:

ALTER DATABASE [AdventureWorks2018]
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
GO

ALTER DATABASE [AdventureWorks2018]
SET ALLOW_SNAPSHOT_ISOLATION ON 
GO

SELECT name,is_read_committed_snapshot_on, snapshot_isolation_state_desc,
snapshot_isolation_state
 FROM sys.databases WHERE database_id = 18

jgjkgtjkgkgk

Let’s do some work (UPDATE STATEMENT)

UPDATE [dbo].[AWBuildVersion]
SET ModifiedDate = GETDATE()
GO

select
index_id, index_type_desc,alloc_unit_type_desc,index_depth,max_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorks2018'),
object_id('AWBuildVersion'), null, null, 'DETAILED')

hjjjjj

56 +14 = 70, meaning we have a 14 byte overhead. I did some research on this and I ended up going to my book self and dug out the SQL Server 2012 Internals text book, the 14 byte overhead is needed for “the actual pointer to the file, page, and row in tempdb which needs 8 bytes and 6 bytes are needed to store the XSN” (Chapter 13. Page 823)

What happens if I disable row versioning and run an update?

ALTER DATABASE [AdventureWorks2018]
SET READ_COMMITTED_SNAPSHOT OFF
GO

ALTER DATABASE [AdventureWorks2018]
SET ALLOW_SNAPSHOT_ISOLATION OFF
GO

UPDATE [dbo].[AWBuildVersion]
SET ModifiedDate = '2009-10-13 00:00:00.000'
GO


select
index_id, index_type_desc,alloc_unit_type_desc,index_depth,max_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorks2018'),
object_id('AWBuildVersion'), null, null, 'DETAILED')

There is no longer the 14 byte overhead – due to the fact that row versioning is no longer on.

lllll

The dark side of Fn_dump_dblog

I want to show you what happens when you keep using the undocumented fn_dump_dblog function. (For example if you are consistently dumping the contents into a table for analysis)

The below 2 queries shows the current state of my system

SELECT * FROM sys.dm_os_schedulers

SELECT COUNT(*) FROM sys.dm_os_threads

dark.JPG

Now let’s just set the environment up and use fn_dump_dblog and see the effect of using it.

ALTER DATABASE [AdventureWorks2012DR] SET RECOVERY FULL

BACKUP DATABASE [AdventureWorks2012DR] TO DISK = 'C:\SQLSERVER\AdventureWorks2012DR.BAK'

GO
BACKUP LOG [AdventureWorks2012DR] TO DISK = 'C:\SQLSERVER\AdventureWorks2012DRLog.BAK'

GO

SELECT
    [Current LSN],
    [Operation],
    [Context],
    [Transaction ID],
    [Description]
FROM
    fn_dump_dblog (
        NULL, NULL, N'DISK', 1, N'C:\SQLSERVER\AdventureWorks2012DRLog.BAK',
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)

Ok so it returns some information – Great.

dark2

Let’s keep running the code (Movement into a table for future analysis). Look at the state of my system. After couple of hours:

dark3

Using this command creates more threads and hidden schedulers (these will only go after a restart). Depending on what version of SQL Server you are on and what Service Pack you may or may not have this issue. It was fixed in SQL 2012 SP2 onwards. So be on the cautious side when running these sorts of commands.

Also I noticed Memory bloat for the sqlservr.exe. Nothing else was running on this server, just my fn_dump_dblog script.  Threads need memory too. (Each thread will use 0.5MB on x86 and 2MB on x64)

capturesdd