Monday, March 8, 2010

More Fun With SQL – Sub-Selects & Using Temp Tables to Make Sub-Selects Faster

In an earlier post I mentioned that I do a lot of SQL work at my present job. In the course of this work I often have to form complex queries using sub selects in joins. Subs-selects have lots of uses but they come with their own issues.

Consider the following:

SELECT c.custaccount
, c.custaddress
, o.ordid
FROM customers c
JOIN (SELECT
MAX(o.orderid) ordid, o.custaccount
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount) lastorder ON lastorder.custaccount = c. custaccount

This is a pretty standard query, basically we're pulling all customers who had an order between Jan-1-2010 and March-1-2010 and further we're getting the orderid of the customers last order in that time frame. This is a useful query, but isn't the best performing. In cases where the number of records in either table are fairly small there's no real reason to re-write this query.

BUT if the customer table has millions of records with lots of orders per customer this query will take considerable time to run.  what can be done in this case?  Keep in mind that in queries with sub-selects each select through the customer table (in this example) will require the sub-select query to select over the entire order table grouping by the custaccount. This means the query will re-query the orders table over and over again. How can this be done better?

Consider this:

SELECT
MAX(o.orderid) ordid, o.custaccount
INTO #t1
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount

SELECT c.custaccount
, c.custaddress
, o.ordid
FROM customers c
JOIN #t1 t on t.custaccount = c. custaccount

DROP TABLE #t1

This query is much faster by virtue of the fact that we first pull the MAX order id, grouping by the account between '01-01-2010' AND '03-01-2010' into a temporary table, THEN we perform the select. This SQL is much faster because we are not constantly re-querying the same table over and over again. We can clean this up even more by using a CTE. Use of the CTE removes the need to drop the temp table.

WITH maxorder (OrderId, CustomerAccount)
(
SELECT
MAX(o.orderid) ordid, o.custaccount
INTO #t1
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount
)
SELECT c.custaccount
, c.custaddress
, o. OrderId
FROM customers c
JOIN maxorder o on o. CustomerAccount = c. CustomerAccount

No comments:

Post a Comment