Wednesday, February 24, 2010

Three T-SQL Tricks Courtesy of Some Friends

If you do any business development at all odds are you have to create, read, update and delete data (CRUD) out of a database of some type.

At my present job we write a lot of T-SQL. In my two years there I've been lucky enough to learn some nice tricks from some really talented SQL coders who have graciously shared their SQL insights with me. In this post I'd like to cover three simple tricks to keep in mind when writing SQL queries.

I) Queries with both inner and outer joins

Consider:

SELECT * FROM tblCustomer c
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

This is a pretty standard query when you want to find all (in this case) customers in Ohio without phone numbers. This query will work, but its not very well written. Consider the order of the joins - in this query we have two joins - an outer and an inner one with the outer join first in the sequence. What will happen is the following.

So say tblCustomer has 1000 customers, tblPhones has 2000 phone records, we end up pulling a lot of data up front before we start limiting the size of the dataset. Its not until we inner join to tblAddress, with 300 addresses that we start trimming down the data. The criteria of the where clause further limits the dataset to the 50 addresses in Ohio and then to the 10 customers in Ohio don't have a phone.

Consider this version:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

So now tblCustomer still has 1000 customers, but the inner join with tblAddress now immediately restricts the data set to 300, pulling a lot less data up front.  The outer join still pulls 2000 phones and the where clause will then further restrict the data set to 10. As you can see this query is much better because we pull less data up front. By moving all inner joins above any outer joins will greatly decrease query time and make the query much more efficient.

II) Move Your Criteria Out of the WHERE clause.

Consider again:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL AND a.State = 'OH'

This is better than where we started from, but we can do better. Note that in the where condition we have the following: AND a.State = 'OH'. This criteria targets the tblAddress table, but does it have to be in the WHERE statement?

Consider now:

SELECT * FROM tblCustomer c
INNER JOIN tblAddress a ON c.CustomerID = a.CustomerID AND a.State = 'OH'
LEFT OUTER JOIN tblPhones p ON p.CustomerID = a.CustomerID
WHERE p.PhoneNumber IS NULL

By moving the State criteria to the join the following happens. tblCustomer still has 1000 customers, but now the inner join and the State = 'OH' criteria with tblAddress now restricts the data set to 10 instead of 300. Again this query is much better because we pull much less data up front. By moving what criteria you can out of the where clause you will greatly decrease query time.

III) Do you really need to SELECT * from those tables?

Many developers will write SELECT * on select queries when they really need only a few fields from the tables they are querying. If you limit SELECTs to just that data you want to retrieve, you can increase your query's performance, particularly when you query tables whit a lot of columns.

So to recap: 
-Inner Joins before Outer Joins.
-Move as much of your queries criteria to the joins as possible.
-No SELECT * if you can help it.

Simple tricks, but applying them can make a big difference.

My thanks to my friends Alan Huffman and John Anderson who have been unstinting with SQL knowledge.  Check out Alan's Blog Southern BITS of Technical Goodness.

Mono Develop - a GTK IDE for .NET development

I did a post a couple of days ago on SharpDevelop, an alternitive IDE for c# development on Windows.  Alan responded to the post by mentioning Mono Develop, an IDE for .NET Linux development using the GNOME tool kit.

Sounded like a greate idea so here's a link to the Mono Develop site.  I was able to use Mono Develop version 1.0 while at Vanderbilt.  It was a bit twitchy, but work well for a version 1.0 release.  I hear that the 2.2 version works well and is usable on Linux, Mac and Windows.  I plan to try a project using Mono Develop.

Monday, February 22, 2010

Design Patterns for the Pattern Challenged: The Repository Pattern

Martin Fowler writes:
"A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes. Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers."

Of all the design patterns I have tried to use at work I like the repository pattern the best, perhaps because I had the chance to work on an application that used the repository pattern for a recent work project.

The project I was working on was an typical of those may of us face on a day to day basis. A former colleague had done some work on a WPF-based scripting engine prototype that was to be used in a phone survey project that project tanked, but it was thought that the scripting prototype would be useful in a big compliance project I was leading. The original developer was no longer available, so it was up to me to re-factor the code for the new project.

I faced of number interesting challenges on this project. One was that I had to wrap the engine in a COM+ wrapper for use by an older VB6 type project. Ugly (I hate COM), but pretty straight forward.

Another issue was that the engine was done using WPF and my WPF experience is (was) minimal at best. I had hoped that I could just use what had already been done, but while re-factoring I found out that the WPF part of the prototype was not as complete as I had hoped, so I had an interesting hack and slash session with the WPF code, changing it to be more robust and functional. This was an interesting journey in and of itself that I'll write about later.

The next issue I encountered (and the one this post is about) is that the original project's CRUD stored script responses in a snow-flake scheme SQL database that at the time was thought best for the survey project the engine was originally designed for. I found that the original scheme was over-kill for what I needed, so I designed a straight three table relational scheme to hold the script responses that I thought much handier for what I needed for this project.

Fortunately for me the original developer on the project had the foresight to use a repository pattern for the project (he used other patterns as well, but more on that later), implemented using interfaces. Its use made it very easy for me to replace the original CRUD class, with one of my own. All I had to do was write my class, implementing the existing interfaces and about 5 minutes later I was done. In my implementation I used LINQ to SQL, someone else could come in behind me and use Entity Framework & SQL, another could use ADO.NET or someone could write a class that implements an Oracle back end.

This was a great real-world example of a bunch of things that we all hear about, but sometimes don't appreciate until we see it for real. For instance the use of a repository pattern made it much easier for me to replace the snow flake scheme with a better one. The big lesson I took out of this is that the use of design patterns allows developers to develop in a much more disciplined and thus more maintainable way.

Sunday, February 21, 2010

Sharp Develop .NET IDE

If you've been looking for an alternative IDE for .NET development try Sharp Develop.  I used it at Vanderbilt on a few projects back when it was version 2.2 for the 2.0 version of the framework.

Now version 3.2 is out for frameworks 2.0, 3.0 and 3.5 and support a whole raft of options including C# Code Completion, Windows Forms Designer, WPF and F#.  Check out the Sharp Develop's Info page.  I'll be attempting a personal WFP project with Sharp Develop and will post more on using Sharp Develop later.