Monday, March 8, 2010

More Fun With SQL – Sub-Selects & Using Temp Tables to Make Sub-Selects Faster

In an earlier post I mentioned that I do a lot of SQL work at my present job. In the course of this work I often have to form complex queries using sub selects in joins. Subs-selects have lots of uses but they come with their own issues.

Consider the following:

SELECT c.custaccount
, c.custaddress
, o.ordid
FROM customers c
JOIN (SELECT
MAX(o.orderid) ordid, o.custaccount
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount) lastorder ON lastorder.custaccount = c. custaccount

This is a pretty standard query, basically we're pulling all customers who had an order between Jan-1-2010 and March-1-2010 and further we're getting the orderid of the customers last order in that time frame. This is a useful query, but isn't the best performing. In cases where the number of records in either table are fairly small there's no real reason to re-write this query.

BUT if the customer table has millions of records with lots of orders per customer this query will take considerable time to run.  what can be done in this case?  Keep in mind that in queries with sub-selects each select through the customer table (in this example) will require the sub-select query to select over the entire order table grouping by the custaccount. This means the query will re-query the orders table over and over again. How can this be done better?

Consider this:

SELECT
MAX(o.orderid) ordid, o.custaccount
INTO #t1
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount

SELECT c.custaccount
, c.custaddress
, o.ordid
FROM customers c
JOIN #t1 t on t.custaccount = c. custaccount

DROP TABLE #t1

This query is much faster by virtue of the fact that we first pull the MAX order id, grouping by the account between '01-01-2010' AND '03-01-2010' into a temporary table, THEN we perform the select. This SQL is much faster because we are not constantly re-querying the same table over and over again. We can clean this up even more by using a CTE. Use of the CTE removes the need to drop the temp table.

WITH maxorder (OrderId, CustomerAccount)
(
SELECT
MAX(o.orderid) ordid, o.custaccount
INTO #t1
FROM Orders o
WHERE o.orderdate BETWEEN
'01-01-2010' AND '03-01-2010'
GROUP BY o.custaccount
)
SELECT c.custaccount
, c.custaddress
, o. OrderId
FROM customers c
JOIN maxorder o on o. CustomerAccount = c. CustomerAccount

Sunday, March 7, 2010

Great Article using IQueryable to Create a Custom Provider Similar to LINQ to SQL For Other DataBase types.

I found this while doing some LINQ to XML work recently for Vanderbilt in preparation for an article on some work I did there.


Its a great article about using the IQueryable interface to do a custom provider similar to LINQ-to-SQL for CRUD operations for other databases.  Check it out, its a good read and comes with examples.

This approach would be useful for crafting a generic repository pattern for your projects.

Slightly Off Topic - Foxit Reader: A Great PDF Viewer

Check out Foxit Reader.  I have just updated my system to Windows 7 Ultimate and have been re-installing updates to my favorite utilities.  I downloaded Adobe PDF reader, but while looking for my other favorites, I found a reference to Foxit.  It runs rings around Adobe speed wise, is much slower and is free to boot.

Check it out here: http://www.foxitsoftware.com/pdf/reader/