Fun With Set Based Operations

Ok, maybe the title is a little misleading, perhaps fun is not quite the right word, perhaps it’s better to say ‘Something Interesting About Set Based Operations’, but it just doesn’t seem that catchy. Winking smile

You’ve probably worked with variables before, perhaps you have used them in stored procedures or in functions. Perhaps you’ve used them in the occasional ad-hoc query, perhaps in a cursor. Whilst you’ve been working with SET based operators, did you notice that they are not transactional?

Let’s do a little test, first we’ll create a temporary table that we can add a couple of rows into.

SET NOCOUNT OFF
GO

CREATE TABLE #TestTable
(
ATestColumn INT
)
GO

INSERT INTO #TestTable
(ATestColumn)
VALUES(1);

INSERT INTO #TestTable
(ATestColumn)
VALUES(2);

Ok, we execute the script and we see confirmation that 2 rows have been inserted into the table. Now we test if a transaction will affect a SET operation with a modification to our script.

DECLARE @TestVariable INT

SET @TestVariable = 0 --we start our variable at 0.

BEGIN TRANSACTION

INSERT INTO #TestTable --insert a new row into our temp table
(ATestColumn)
VALUES(3)

SELECT @TestVariable = @TestVariable + ATestColumn FROM #TestTable
--Select all rows from our temp table and add them to the variable.

ROLLBACK

SELECT @TestVariable

Select the variable again, the value will still be 6 because although the transaction was rolled back, the variable retains it value.

So, now that you’ve seen it in action, I’ll show you why this can be extremely useful. Open-mouthed smile

We’ll start by creating a new temporary table and insert some dummy data in there. The table we’re creating will represent a line item table, each line item is assigned to an order and each line item is also assigned to a product, with the amount that the product was sold for. To make the table a little more realistic, we’re going to give it a primary key too. The primary key will be on the OrderNumber and LineID columns. This will allow us to add more than 1 of the same product to an order, for example 2 x Tooth brush would be represented as 2 distinct line items on our order detail.

SET NOCOUNT OFF
GO

--Create our test table
CREATE TABLE #TestTable
(
LineID INT IDENTITY(1,1) NOT NULL,
OrderNumber INT NOT NULL
, ProductID INT NOT NULL
, Amount MONEY NOT NULL
, RollingBalance MONEY NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
LineID
, OrderNumber
, ProductID
)
)
GO

--Insert some test data
INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(0,0,500);

INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(0,9,383);

INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(0,3,90);

INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(1,1,500);

INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(2,3,400);

INSERT INTO #TestTable
(
OrderNumber
, ProductID
, Amount
)
VALUES(2,7,1);

Now that we have our little table with some data in it, lets update our RollingBalance column. We’re going to update the column, grouping by OrderNumber to create a rolling balance per order. We could easily change this requirement to group by financial year or something along those lines, but for now, let’s do it by OrderNumber.

If you had been tasked with updating the RollingBalance column, how would you do it? You know that you cannot add 1 row to another in a single update statement, so does that leave a cursor? Or perhaps a sub-query? Perfectly valid answers, except maybe the cursor – you should only use those when you have no other option that perform better.

I’m going to show you how to do the update in 1 statement, with no sub-queries, joins or where clauses. Here is the code: –

DECLARE @OrderNumber INT
DECLARE @MyTestVariable MONEY
SET @MyTestVariable = 0

UPDATE #TestTable
SET
     @MyTestVariable = RollingBalance =
           CASE
                 WHEN @OrderNumber != OrderNumber THEN Amount
                 ELSE @MyTestVariable + Amount
           END
     , @OrderNumber = OrderNumber

So, what do you think? Let’s break it down a little. The first variable @OrderNumber is used to help us determine if we are still on the same OrderNumber or not, the second variable is used to create our rolling balance; and because the variables are not included in a transaction, it is possible to use the variable to create a running total. The only issue that you’re left with is how do you group the totals by OrderNumber or financial year? That’s why we have our second part of the update statement setting the variable @OrderNumber to the row value, so that we can use it to determine if we need to start from 0 again for the next group. The syntax that is used for the RollingBalance column is a little strange, but perfectly valid T-SQL; and it’s not an undocumented feature. Here is the article in msdn http://msdn.microsoft.com/en-us/library/ms177523.aspx.

Let’s look at the execution plan for the statement: –

Execution Plan

A thing of beauty! The update statement went straight down our clustered index statement! There are no where clauses or joins to be concerned about; and for that reason, it is very fast, for example, I used a statement similar to this to calculate a rolling balance on a table of 33 million rows – it completed in 6 seconds. Open-mouthed smile

About Phil

I'm a database administrator with over 10 years experience working in both administration and development using SQL Server 2000 and onwards. I love Terry Prattchet books, movies, music and; of course, my wife Sol - my inspiration and my shelter. "Although all answers are replies, not all replies are answers."
This entry was posted in T-SQL Code Examples and tagged , , . Bookmark the permalink.

Leave a comment