Wednesday, September 30, 2009

On vs. Where

Does it matter if you put your criteria in the ON clause or the WHERE clause?  Well, as with most things SQL the answer is, “It depends”.  If you are dealing with INNER JOIN’s then it really doesn’t matter because the query optimizer is smart enough to come up with the same execution plan for both.  For example these 2 queries evaluate to the same execution plan:

SELECT
SOD.SalesOrderDetailID
,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH
JOIN
Sales.SalesOrderDetail AS SOD
ON
SOH.SalesOrderID = SOD.SalesOrderID
WHERE
SOH.OrderDate >= '7/1/2004'
AND
SOH.OrderDate <
'8/1/2004'

SELECT
SOD.SalesOrderDetailID
,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH
,
Sales.SalesOrderDetail AS SOD
WHERE
SOH.SalesOrderID = SOD.SalesOrderID
AND
SOH.OrderDate >= '7/1/2004'
AND
SOH.OrderDate < '8/1/2004'



Execution Plan



The old SQL 6.5 OUTER JOIN syntax (*= and =*) has been discontinued beginning with SQL Server 2005, so you have to do the JOIN for OUTER JOIN’s in the ON as demonstrated in this code:



SELECT

  
SOD.SalesOrderDetailID
,

  
SOH.SalesOrderID


FROM

  
Sales.SalesOrderHeader AS SOH
LEFT JOIN

  
Sales.SalesOrderDetail AS SOD


        ON SOH.SalesOrderID = SOD.SalesOrderID


WHERE

  
SOH.OrderDate >='7/1/2004'
AND

  
SOH.OrderDate <'8/1/2004'





Now let’s create a sandbox to play in.



If OBJECT_ID('sales.orders', 'U') Is Not Null
Begin
Drop Table
sales.orders;
End;

If OBJECT_ID('sales.order_items', 'U') Is Not Null
Begin
Drop Table
sales.order_items;
End;

If SCHEMA_ID('sales') Is Not Null
Begin
Drop Schema
sales;
End;

Go

Create Schema
sales;

Go
/*
Tables to hold sample data
*/
Create Table sales.orders
(
order_id INT IDENTITY(1,1)PRIMARY KEY,
customer_id INT
);

Create Table sales.order_items
(
order_detail_id INT IDENTITY(1, 1)PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT
)

/*
Load Sample data
*/
INSERT INTO sales.orders (customer_id)
SELECT TOP 5
AO.[object_id]
FROM
sys.all_objects AS AO;

INSERT INTO sales.order_items
(
order_id,
product_id,
quantity
)
SELECT
1,
1,
7
Union ALL
Select
2,
1,
4
Union ALL
Select
3,
2,
6
Union ALL
Select
4,
2,
11
Union ALL
Select
5,
3,
1;



Now we want to return all the customers who have placed an order, but we only want to return the items where the quantity is greater than 5.  Here is method 1:


Select
O.customer_id,
OI.order_id,
OI.product_id,
OI.quantity
From
sales.orders AS O LEFT OUTER JOIN
sales.order_items AS OI ON
O.order_id = OI.order_id
Where
OI.quantity > 5;



This returns:



customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3 1 1 7
7 3 2 6
8 4 2 11


Hmmm, we know we have orders from five customers, but this only returns the three rows.  Let’s look at the execution plan:



image



What’s that nest loops (inner join) operator?  Well, by putting the criteria for the RIGHT (second) table in the WHERE clause we essentially converted our LEFT OUTER JOIN to an INNER JOIN.  The correct way to get the data we want would be this way:



SELECT

   
O.customer_id,

   
OI.order_id,

   
OI.product_id,

   
OI.quantity

FROM

   
sales.orders AS O LEFT OUTER JOIN

   
sales.order_items AS OI ON

       
O.order_id = OI.order_id AND

       
OI.quantity > 5;



This returns what we would expect to see:





customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3 1 1 7
5 NULL NULL NULL
7 3 2 6
8 4 2 11
17 NULL NULL NULL



And here is the execution plan:




image



Where you can see the Nested Loops (left outer join) operator.




So yes, it does matter where you put your criteria when dealing with OUTER JOINS.

No comments:

Post a Comment

So what do you think I am?