I have come across a situation a number of times in the past that seems to be one of those things that are so obvious when you see the solution, but can’t see them before the penny has dropped.
Imagine the following scenario:
You want to find the total number of orders that have the Order Status ‘A’ and the number of orders with an Order Status of ‘B’. This sounds like a simple enough request, that I’m sure you have heard of before.
Lets start off with some test data.
--Test Structure USE master go IF DB_ID('Sandbox') IS NULL BEGIN CREATE DATABASE Sandbox END GO USE Sandbox GO IF OBJECT_ID('dbo.CountExample') IS NOT NULL BEGIN DROP TABLE dbo.CountExample END GO IF OBJECT_ID('dbo.Nums') IS NOT NULL BEGIN DROP FUNCTION dbo.Nums END GO -- Test Function to allow fast test data creation CREATE FUNCTION [dbo].[Nums] (@m AS bigint) RETURNS TABLE AS RETURN WITH t0 AS (SELECT n = 1 UNION ALL SELECT n = 1), t1 AS (SELECT n = 1 FROM t0 AS a, t0 AS b), t2 AS (SELECT n = 1 FROM t1 AS a, t1 AS b), t3 AS (SELECT n = 1 FROM t2 AS a, t2 AS b), t4 AS (SELECT n = 1 FROM t3 AS a, t3 AS b), t5 AS (SELECT n = 1 FROM t4 AS a, t4 AS b), results AS (SELECT ROW_NUMBER() OVER (ORDER BY n) AS n FROM t5) SELECT n FROM results WHERE n <= @m GO CREATE TABLE dbo.CountExample (OrderId int NOT NULL, OrderStatus char(1) NOT NULL) GO --Test data INSERT INTO dbo.CountExample (OrderId, OrderStatus) SELECT n, CHAR(n % 27 + 64) FROM dbo.Nums (1000) AS N GO
Now that we have some test data and tables, we can take a look at what solutions are possible.
Solution 1:
The solution that I have seen come from a lot of people has been to basically run two queries, one for each Order Stautus and then collect these together returning the result.
Something along the lines of:
SELECT (SELECT COUNT(*) CountA FROM dbo.CountExample AS CE WHERE OrderStatus = 'A') CountA, (SELECT COUNT(*) CountB FROM dbo.CountExample AS CE WHERE OrderStatus = 'B') CountB
This delivers the correct result, but causes two separate queries to be run (one for each Order Status). There are variations of this solution, using sub-queries or CTEs, but I hope you get the idea that a separate COUNT() is required for each total that you want to calculate.
Solution 2:
The best way, that I know of, to achieve this would be to change the logic from a COUNT() to a SUM(). This sounds wrong at first, especially because the column Order Status is a char(1) and not an integer!
Take a look at how I have solved the problem with SUM():
SELECT SUM(CASE WHEN OrderStatus = 'A' THEN 1 ELSE 0 END) CountA, SUM(CASE WHEN OrderStatus = 'B' THEN 1 ELSE 0 END) CountB FROM dbo.CountExample AS CE
Looking at the code, we can see that I have not just used SUM(), but also a CASE statement. CASE is one of my favourite constructs in T-SQL, as it allows you to perform logical processing of an entire set or only part of a set without filtering using a WHERE clause.
If you take a look at the execution plan, you will also see that the table is accessed once. This is an instant improvement over the “standard” solution of COUNT()-ing per Order Status and has the added bonus of never being noticeably more expensive, regardless of how many different Order Status totals are required.
So there you go. COUNT() isn’t always the best way to count data in SQL Server.