Tuesday, December 8, 2009

T-SQL Tuesday #001 – Dates and Times

T-SQL Tuesday was started by Adam Machanic (@AdamMachanic)  on his blog to encourage SQL Bloggers to share their tips and tricks about a specific topic once a month.  A great idea and a great way to get a topic to blog about!
I’ve been working with SQL Server since 1999 and it wasn’t until the last couple of years that I finally learned a better way to retrieve a date range.  I used to do:
SELECT
columns
FROM
TABLE
WHERE
date_column BETWEEN start_date AND end_date;

I’m guessing that there are people reading this that are saying, “I do that all the time and it works fine for me, so what’s the problem?”.  I understand where they are coming from, but I finally really understood that, until the specific DATE data type (date only) in SQL Server 2008, datetime and smalldatetime columns ALWAYS have a time part.  This can and does affect date range queries.  Here’s an example, albeit slightly contrived, but I’ve seen it happen in the real word:

DECLARE @sales TABLE (sale_id INT IDENTITY(1,1) PRIMARY KEY, sale_date DATETIME, sale_amt FLOAT);

WITH cteNums AS 
(
SELECT TOP 50
ROW_NUMBER() OVER (ORDER BY AC.NAME) AS N
FROM
sys.all_columns AS AC 
)
INSERT INTO @sales (
sale_date,
sale_amt
)
SELECT
DATEADD(DAY, -N, GETDATE()) AS sales_date,
N * ABS(CHECKSUM(NEWID()))/10000.00 AS sale_amt 
FROM
cteNums 

/* 
Setup for contrived example.
Make sure the first sale of the month is at midnight 
*/     
UPDATE @sales
SET sale_date = DATEADD(DAY, DATEDIFF(DAY, 0, sale_date), 0)
WHERE
DATEPART(DAY, sale_date) = 1

/*
Task is to get sales for the previous full month
*/
/*
Set up variables for the first and last day of th month
*/
DECLARE @start_date DATETIME,
@end_date DATETIME

/*
Set the variables to the first of last month and the last day of 
last month at the time of writing '2009-11-01' and '2009-11-30' See this blog post by Lynn Pettis for why I am using DATEADD and DATEPART with 0
*/        
SELECT
@start_date = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0),
@end_date = DATEADD(DAY, -1, DATEADD(MONTH, 1, @start_date))

SELECT @start_date, @end_date, DATEADD(DAY, 1, @end_date)

/*
My old method
*/
SELECT
COUNT(*) AS sales,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale,
SUM(sale_amt) AS total_sales 
FROM
@sales
WHERE
sale_date BETWEEN @start_date AND @end_date 

/*
My new method - accurate
*/    
SELECT
COUNT(*) AS sales,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale,
SUM(sale_amt) AS total_sales 
FROM
@sales
WHERE
sale_date >= @start_date AND 
/*First of the next month */
sale_date < DATEADD(DAY, 1, @end_date)

Now I'm sure someone out there will say, “Hey in your last example you are using the first day of the next month and you could do that with between.” Well yes I could and here is the query:

SELECT
COUNT(*) AS sales,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale,
SUM(sale_amt) AS total_sales 
FROM
@sales
WHERE
sale_date BETWEEN @start_date AND DATEADD(DAY, 1, @end_date)

But this does not return the correct results either.  Here are the results for each query:


Count
First Sale
Last Sale
Total
1st Between
29
2009-11-01 00:00:00.000
2009-11-29 11:09:30.107
86362080.49
>= and <
30
2009-11-01 00:00:00.000
2009-11-30 11:09:30.107
86362080.49
2nd Between
31
2009-11-01 00:00:00.000
2009-12-01 00:00:00.000
86362080.49

All that to show that I believe you are better off using >= and < instead of BETWEEN when comparing dates.

4 comments:

  1. Definitely a "wise man". Great post, Jack. This is a concept that some "experts" as well as neophytes simply can't get their head around. Your example which explains the overage you get with BETWEEN explains the most common error people make with dates.

    --Jeff Moden

    ReplyDelete
  2. Thanks, Jeff. I learned the error of my ways in an SSC forum from a post you made. It makes sense when you really think about it.

    ReplyDelete
  3. I usually convert date to remove time-part and then use between... Works just fine.
    MGrape

    ReplyDelete
  4. >>I usually convert date to remove time-part and then use between... Works just fine.

    "It Depends". If you convert the column of data, it's going to be slow and non-SARGABLE. If the data has times in it and you convert the Start And Enddate variables, you could be either missing data you want or including data you didn't want.

    You need to be very careful when using BETWEEN with dates and times. ;-)

    ReplyDelete

So what do you think I am?