Azure SQL DW – Query Labels

Using a query label in Azure SQL DW (Data Warehouse) can be a really handy technique to track queries via DMVs. You might want to do this to see what problematic queries are doing under the covers.

Let’s check out an example. First I will show you how things would look without using a query label. I connect to SQL DW and issue the following basic example query.

SELECT sum(p.StandardCost),ps.[EnglishProductSubcategoryName]
FROM [dbo].[DimProduct] p
JOIN [dbo].[DimProductSubcategory] ps on P.ProductSubcategoryKey = ps.ProductSubcategoryKey
GROUP BY p.standardCost, ps.[EnglishProductSubcategoryName]
ORDER BY ps.[EnglishProductSubcategoryName]

So what query would you use to see what is going on within this MPP environment? You would normally use sys.dm_pdw_exec_requests and/or sys.dm_pdw_request_steps. Something like:

 SELECT r.session_id, r.start_time, r.end_time,rs.location_type, rs.status,rs.command,rs.step_index, rs.operation_type,rs.request_id
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_request_steps rs
ON r.request_id = rs.request_id

Output:

whatiswhat
However you probably would not want to manually go through the output to try and piece the output together, this is where labels are handy. The original code now becomes:

SELECT sum(p.StandardCost),ps.[EnglishProductSubcategoryName]
FROM [dbo].[DimProduct] p
JOIN [dbo].[DimProductSubcategory] ps on P.ProductSubcategoryKey = ps.ProductSubcategoryKey
GROUP BY p.standardCost, ps.[EnglishProductSubcategoryName]
ORDER BY ps.[EnglishProductSubcategoryName]
OPTION (LABEL = 'QUERY: MySalesCodeSum');

We then use this LABEL within the DMV based query.

 SELECT r.session_id, r.start_time, r.end_time,rs.location_type, rs.status,rs.command,rs.step_index, rs.operation_type,rs.request_id
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_request_steps rs
ON r.request_id = rs.request_id WHERE r.[label] = 'QUERY: MySalesCodeSum'
ORDER BY step_index

Now you will have the output relevant to the original query only.
justmylabel

High level SQL DW is creating temp tables such as:

 CREATE TABLE [tempdb].[dbo].[TEMP_ID_16] ([ProductSubcategoryKey]
INT, [StandardCost] MONEY ) WITH(DATA_COMPRESSION=PAGE);

DMS does the famous ShuffleMoveOperation then the compute node actually ends up running:

 SELECT   [T1_1].[col] AS [col],          
 [T1_1].[EnglishProductSubcategoryName] AS [EnglishProductSubcategoryName] 
  FROM     (SELECT   SUM([T2_2].[StandardCost]) AS [col],                   
    [T2_1].[EnglishProductSubcategoryName] AS [EnglishProductSubcategoryName]        
	    FROM     [tempdb].[dbo].[TEMP_ID_15] AS T2_1                   
		  INNER JOIN                   
		    [tempdb].[dbo].[TEMP_ID_16] AS T2_2                 
			    ON ([T2_1].[ProductSubcategoryKey] = [T2_2].[ProductSubcategoryKey])      
				      GROUP BY [T2_2].[StandardCost], [T2_1].[EnglishProductSubcategoryName]) 
AS T1_1  ORDER BY [T1_1].[EnglishProductSubcategoryName] ASC

Quite cool right?

 

Azure SQL Data Warehouse and Backups

Just because the cloud movement is strong doesn’t mean the end of “DBA’s”, it does mean a change in skills and no doubt you will (one day) create Azure SQL Data Warehouse (DW) in Azure. If you are from an operational background like me then backups will be on your mind for this product. The question is how are backups done with Azure SQL DW?

Continue reading

Azure SQL DW – Let’s Shuffle?

Initially, the main focus of this post was going to be quick and about using the latest version of SSMS (SQL Server Management Studio) to check out execution plans for Azure SQL Data warehouse (DW). I got a little side tracked by a certain operation called – SHUFFLE, because, I like the name.

Continue reading

Azure SQL Data warehouse

For this blog post I want to show you the steps required to setup an Azure SQL Data warehouse and how to connect to it via Power BI to allow for advanced reporting.

Please note that this is not a tutorial on creating reports via Power BI. (I am not the right person for that)

Then we will use TSQL to look at some DMVs (Dynamic Management Views) to gain further insight as to what it happening under the covers.

Creating  Azue SQL DWH

So, login to your Azure account and under “Data + Storage” navigate to SQL Data Warehouse as shown below.

dwh1

dwh2

Just work your way through the setup wizard.

dwh3

dwh4

So once deployment has finished we will be going to PowerBI to connect to the newly created SQL DWH which will become our data source.

This can be found using the following link : https://powerbi.microsoft.com/en-us/. Once you have signed in you will be presented with a welcome screen.

dwh5

The connection section is straightforward, all you will need is your server , database name, username and password. You will then be authenticated and taken to the main interface.

dwh7

Again, I will stress the point here that this is not a lesson on BI and analytics but to show you setup and configuration of Azure SQL DWH. However, after many attempts I did manage to build a report that shows the geographical location of my customer base.

dwh8

Now to the interesting section, using TSQL to look under the covers.

TSQL

Using the more recent version of SSMS connect to the logical server. Then issue the following command.

You can download the latest version from https://msdn.microsoft.com/en-us/library/mt238290.aspx. Last updated September 20 2016., release number 13.0.15800.18.

SELECT * FROM  sys.dm_pdw_nodes

dwh9

Here you will see 1 control node and multiple compute nodes. The control node is the brains whereas the compute nodes run the commands given to them from the Control node. Communication between these nodes are done by DMS (Data Management Service)

SELECT * FROM sys.dm_pdw_exec_requests
WHERE resource_class ='smallrc'
ORDER BY start_time

If you look at the command column you will see the activity that had occurred, the Initial phase being the creation of all the tables.

dwh10

This DMV also holds the query text that occurs for when creating the actual reports. For example at 2016-06-15 07:59:47.247 – row 41 we basically issued:

dwh11

SELECT
(COUNT_BIG(DISTINCT t8.BirthDate) + MAX(CASE WHEN t8.BirthDate IS NULL THEN 1 ELSE 0 END))
AS a0
FROM
(
(SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,LastName,NameStyle,Cast(BirthDate as DateTime)
as BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation,EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,AddressLine1,AddressLine2,Phone,Cast(DateFirstPurchase as DateTime) as DateFirstPurchase,CommuteDistance
From dbo.DimCustomer)
)
AS t8

Now let’s say you want more insight into the actual query plan- You would use the EXPLAIN Command.

EXPLAIN SELECT
(COUNT_BIG(DISTINCT t8.BirthDate) + MAX(CASE WHEN t8.BirthDate IS NULL THEN 1 ELSE 0 END))
AS a0
FROM
(
(SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,LastName,NameStyle,Cast(BirthDate as DateTime)
as BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,TotalChildren,NumberChildrenAtHome, EnglishEducation,SpanishEducation,FrenchEducation,EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,AddressLine1,AddressLine2,
Phone,Cast(DateFirstPurchase as DateTime) as DateFirstPurchase,CommuteDistance
From dbo.DimCustomer)
)
AS t8

The information shown here is the DSQL (Distributed SQL) plan – When you send a SQL query to SQL Data Warehouse, the Control node processes a query and converts the code to DSQL then the Control node sends the command to run in each of the compute nodes.

The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.