Wednesday, February 24, 2010

Three T-SQL Tricks Courtesy of Some Friends

If you do any business development at all odds are you have to create, read, update and delete data (CRUD) out of a database of some type.

At my present job we write a lot of T-SQL. In my two years there I've been lucky enough to learn some nice tricks from some really talented SQL coders who have graciously shared their SQL insights with me. In this post I'd like to cover three simple tricks to keep in mind when writing SQL queries.

I) Queries with both inner and outer joins

Consider:

SELECT * FROM tblCustomer c
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

This is a pretty standard query when you want to find all (in this case) customers in Ohio without phone numbers. This query will work, but its not very well written. Consider the order of the joins - in this query we have two joins - an outer and an inner one with the outer join first in the sequence. What will happen is the following.

So say tblCustomer has 1000 customers, tblPhones has 2000 phone records, we end up pulling a lot of data up front before we start limiting the size of the dataset. Its not until we inner join to tblAddress, with 300 addresses that we start trimming down the data. The criteria of the where clause further limits the dataset to the 50 addresses in Ohio and then to the 10 customers in Ohio don't have a phone.

Consider this version:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

So now tblCustomer still has 1000 customers, but the inner join with tblAddress now immediately restricts the data set to 300, pulling a lot less data up front.  The outer join still pulls 2000 phones and the where clause will then further restrict the data set to 10. As you can see this query is much better because we pull less data up front. By moving all inner joins above any outer joins will greatly decrease query time and make the query much more efficient.

II) Move Your Criteria Out of the WHERE clause.

Consider again:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

This is better than where we started from, but we can do better. Note that in the where condition we have the following: AND a.State = 'OH'. This criteria targets the tblAddress table, but does it have to be in the WHERE statement?

Consider now:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID AND a.State = 'OH'
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL

By moving the State criteria to the join the following happens. tblCustomer still has 1000 customers, but now the inner join and the State = 'OH' criteria with tblAddress now restricts the data set to 10 instead of 300. Again this query is much better because we pull much less data up front. By moving what criteria you can out of the where clause you will greatly decrease query time.

III) Do you really need to SELECT * from those tables?

Many developers will write SELECT * on select queries when they really need only a few fields from the tables they are querying. If you limit SELECTs to just that data you want to retrieve, you can increase your query's performance, particularly when you query tables whit a lot of columns.

So to recap: 
-Inner Joins before Outer Joins.
-Move as much of your queries criteria to the joins as possible.
-No SELECT * if you can help it.

Simple tricks, but applying them can make a big difference.

My thanks to my friends Alan Huffman and John Anderson who have been unstinting with SQL knowledge.  Check out Alan's Blog Southern BITS of Technical Goodness.

No comments:

Post a Comment