Tuesday, November 25, 2008

Advanced Null Handling using the GROUP BY operator

As you may know,

select count(criteria) from junk
where criteria is null


and

select count(criteria) from junk
where criteria is not null


will always give you the same number. 10, in this case, irrespective of how many null values there exist in Column Criteria.

Therefore the problem would become clear if we needed to find the ratio of the number of null values for the following queries grouped by column 'FY' as below.

SELECT distinct count(case when decal is null then 0 else 1 end) as nulldecal,
Case
When [Date of Violation] Between '7/1/2005' AND '6/30/2006 23:59:59' then 'FY06'
When [Date of Violation]Between '7/1/2006' AND '6/30/2007 23:59:59' then 'FY07'
When [Date of Violation] Between '7/1/2007' AND '6/30/2008 23:59:59' then 'FY08'
else 'nulldate'
end as FY
from [ECB Violations]
where decal is null
group by
Case
When [Date of Violation] Between '7/1/2005' AND '6/30/2006 23:59:59' then 'FY06'
When [Date of Violation]Between '7/1/2006' AND '6/30/2007 23:59:59' then 'FY07'
When [Date of Violation] Between '7/1/2007' AND '6/30/2008 23:59:59' then 'FY08'
else 'nulldate'
end

Concatenating the Results of a Grouped SQL Query



Goal: To concatenate, associated columns of grouped rows, as shown below.

SQL Queries allow for the GROUP BY operator, in instances where you group tuples (rows in a SQL result set) for some aggregate function, more commonly appropriate for numeric values like SUM, AVERAGE etcetera. In the case of String, or Character values, the aggregate functions are limited in scope.

Solving the above problem would require such a function, similar in functionality to a say, CONCAT (criteria), but none such exists. The work around involves the creation of a temporary table, and a dbo function with an output variable. A recursive solution would be more elegant, and any contributions would be appreciated.

Solution

1) Create a temporary Table called Temp as following:

CREATE TABLE [dbo].[junk]( [ID] [varchar](50) , [Column_1] [varchar](50))

Insert into this the first two columns, the first column being the column that is grouped by and the second, the one that needs to be aggregated of the query in question

2) Define a new Scalar Function in MS Sql Server as follow:

CREATE FUNCTION[dbo].[StringAggregate](@ID varchar(10))
returns varchar(5000)
as
begin
declare @out varchar(5000)
select @out = coalesce(@out + ' , ' + convert(varchar,Column_1), convert(varchar,Column_1))
from junk
where ID = @ID
return @out
end

3) Invoke the function as an Aggregate in the following: (Note the use of the output variable Criteria)


select ID, dbo.StringAggregate(ID) Criteria
from
(
select ID
from junk
group by ID
) a