Azure SQL Data warehouse

For this blog post I want to show you the steps required to setup an Azure SQL Data warehouse and how to connect to it via Power BI to allow for advanced reporting.

Please note that this is not a tutorial on creating reports via Power BI. (I am not the right person for that)

Then we will use TSQL to look at some DMVs (Dynamic Management Views) to gain further insight as to what it happening under the covers.

Creating  Azue SQL DWH

So, login to your Azure account and under “Data + Storage” navigate to SQL Data Warehouse as shown below.

dwh1

dwh2

Just work your way through the setup wizard.

dwh3

dwh4

So once deployment has finished we will be going to PowerBI to connect to the newly created SQL DWH which will become our data source.

This can be found using the following link : https://powerbi.microsoft.com/en-us/. Once you have signed in you will be presented with a welcome screen.

dwh5

The connection section is straightforward, all you will need is your server , database name, username and password. You will then be authenticated and taken to the main interface.

dwh7

Again, I will stress the point here that this is not a lesson on BI and analytics but to show you setup and configuration of Azure SQL DWH. However, after many attempts I did manage to build a report that shows the geographical location of my customer base.

dwh8

Now to the interesting section, using TSQL to look under the covers.

TSQL

Using the more recent version of SSMS connect to the logical server. Then issue the following command.

You can download the latest version from https://msdn.microsoft.com/en-us/library/mt238290.aspx. Last updated September 20 2016., release number 13.0.15800.18.

SELECT * FROM  sys.dm_pdw_nodes

dwh9

Here you will see 1 control node and multiple compute nodes. The control node is the brains whereas the compute nodes run the commands given to them from the Control node. Communication between these nodes are done by DMS (Data Management Service)

SELECT * FROM sys.dm_pdw_exec_requests
WHERE resource_class ='smallrc'
ORDER BY start_time

If you look at the command column you will see the activity that had occurred, the Initial phase being the creation of all the tables.

dwh10

This DMV also holds the query text that occurs for when creating the actual reports. For example at 2016-06-15 07:59:47.247 – row 41 we basically issued:

dwh11

SELECT
(COUNT_BIG(DISTINCT t8.BirthDate) + MAX(CASE WHEN t8.BirthDate IS NULL THEN 1 ELSE 0 END))
AS a0
FROM
(
(SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,LastName,NameStyle,Cast(BirthDate as DateTime)
as BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation,EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,AddressLine1,AddressLine2,Phone,Cast(DateFirstPurchase as DateTime) as DateFirstPurchase,CommuteDistance
From dbo.DimCustomer)
)
AS t8

Now let’s say you want more insight into the actual query plan- You would use the EXPLAIN Command.

EXPLAIN SELECT
(COUNT_BIG(DISTINCT t8.BirthDate) + MAX(CASE WHEN t8.BirthDate IS NULL THEN 1 ELSE 0 END))
AS a0
FROM
(
(SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,LastName,NameStyle,Cast(BirthDate as DateTime)
as BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,TotalChildren,NumberChildrenAtHome, EnglishEducation,SpanishEducation,FrenchEducation,EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,AddressLine1,AddressLine2,
Phone,Cast(DateFirstPurchase as DateTime) as DateFirstPurchase,CommuteDistance
From dbo.DimCustomer)
)
AS t8

The information shown here is the DSQL (Distributed SQL) plan – When you send a SQL query to SQL Data Warehouse, the Control node processes a query and converts the code to DSQL then the Control node sends the command to run in each of the compute nodes.

The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.