Does it matter if you put your criteria in the ON clause or the WHERE clause? Well, as with most things SQL the answer is, “It depends”. If you are dealing with INNER JOIN’s then it really doesn’t matter because the query optimizer is smart enough to come up with the same execution plan for both. For example these 2 queries evaluate to the same execution plan:
SELECT
SOD.SalesOrderDetailID,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH JOIN
Sales.SalesOrderDetail AS SOD ON
SOH.SalesOrderID = SOD.SalesOrderID
WHERE
SOH.OrderDate >= '7/1/2004' AND
SOH.OrderDate < '8/1/2004'
SELECT
SOD.SalesOrderDetailID,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH,
Sales.SalesOrderDetail AS SOD
WHERE
SOH.SalesOrderID = SOD.SalesOrderID AND
SOH.OrderDate >= '7/1/2004' AND
SOH.OrderDate < '8/1/2004'
The old SQL 6.5 OUTER JOIN syntax (*= and =*) has been discontinued beginning with SQL Server 2005, so you have to do the JOIN for OUTER JOIN’s in the ON as demonstrated in this code:
SELECT
SOD.SalesOrderDetailID,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH LEFT JOIN
Sales.SalesOrderDetail AS SOD
ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE
SOH.OrderDate >='7/1/2004' AND
SOH.OrderDate <'8/1/2004'
Now let’s create a sandbox to play in.
If OBJECT_ID('sales.orders', 'U') Is Not Null
Begin
Drop Table sales.orders;
End;
If OBJECT_ID('sales.order_items', 'U') Is Not Null
Begin
Drop Table sales.order_items;
End;
If SCHEMA_ID('sales') Is Not Null
Begin
Drop Schema sales;
End;
Go
Create Schema sales;
Go
/*
Tables to hold sample data
*/
Create Table sales.orders
(
order_id INT IDENTITY(1,1)PRIMARY KEY,
customer_id INT
);
Create Table sales.order_items
(
order_detail_id INT IDENTITY(1, 1)PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT
)
/*
Load Sample data
*/
INSERT INTO sales.orders (customer_id)
SELECT TOP 5
AO.[object_id]
FROM
sys.all_objects AS AO;
INSERT INTO sales.order_items
(
order_id,
product_id,
quantity
)
SELECT
1,
1,
7
Union ALL
Select
2,
1,
4
Union ALL
Select
3,
2,
6
Union ALL
Select
4,
2,
11
Union ALL
Select
5,
3,
1;
Now we want to return all the customers who have placed an order, but we only want to return the items where the quantity is greater than 5. Here is method 1:
Select
O.customer_id,
OI.order_id,
OI.product_id,
OI.quantity
From
sales.orders AS O LEFT OUTER JOIN
sales.order_items AS OI ON
O.order_id = OI.order_id
Where
OI.quantity > 5;
customer_id order_id product_id quantity
----------- ----------- ----------- -----------
3 1 1 7
7 3 2 6
8 4 2 11
Hmmm, we know we have orders from five customers, but this only returns the three rows. Let’s look at the execution plan:
What’s that nest loops (inner join) operator? Well, by putting the criteria for the RIGHT (second) table in the WHERE clause we essentially converted our LEFT OUTER JOIN to an INNER JOIN. The correct way to get the data we want would be this way:
SELECT
O.customer_id,
OI.order_id,
OI.product_id,
OI.quantity
FROM
sales.orders AS O LEFT OUTER JOIN
sales.order_items AS OI ON
O.order_id = OI.order_id AND
OI.quantity > 5;
This returns what we would expect to see:
customer_id order_id product_id quantity
----------- ----------- ----------- -----------
3 1 1 7
5 NULL NULL NULL
7 3 2 6
8 4 2 11
17 NULL NULL NULL
And here is the execution plan:
Where you can see the Nested Loops (left outer join) operator.
So yes, it does matter where you put your criteria when dealing with OUTER JOINS.
No comments:
Post a Comment
So what do you think I am?