Friday, July 31, 2009

Object Models and Relational Joins -- Endless Confusion

Check out this list of questions from Stack Overflow: [Django] join.

These are all folks trying to do joins or outer joins even though they have objects fetched through the ORM.

How does this confusion arise? Easy. Folks work with SQL as if the relational world-view is Important and Universal. It isn't. SQL isn't even a programming language, per se.

Here's the important thing for Django developers to know: SQL is a Hack; Leave it Behind.

The bad news is that all those years spent mastering the ins and outs of the SELECT statement doesn't have as much enduring value as I'd hoped it would have. [Yes, I was a DBA in Ingres and Oracle. I know my SQL.]

The good news is that Object Navigation replaces much of the hideousness of SQL. To an extent. Let's look at some cases.

Joins in General

SQL SELECT statements are an algebraic specification of a result set. The database is free to use any algorithm to build the required set.

SQL imposes the Join hack because SQL is a completely consistent set algebra system. A simple SELECT returns a row-column set of data. A join between tables has to construct a fake row-column set so that everything is consistent.

A Join is nothing more than navigation from an object to associated objects. In OO world, this is simply object containment; the navigation is simply the name of a related object. Nothing more.

Master-Detail (1:m) Joins

A master-detail join in SQL works with a foreign key reference on the children.

In Django, this has to be declared in a SQL-friendly way so that the ORM will work.

class Master( models.Model ):

class Detail( models.Model ):
master= models.ForeignKey( Master )

The "Join" query is simply this. The "detail_set" name is deduced by Django from the class that contains the foreign key.

for m in Master.objects.filter():
process m
for d in m.detail_set.all():
process d

"But wait!" the SQL purist cries, "isn't that inefficient?" The answer is "rarely". It's possible that the RDBMS, doing a "merge-join" algorithm to build the entire result set might be quicker than this.

As practical matter, however, the rest of the web transaction -- including the painfully slow download -- will dominate the timeline.

Association (m:m) Joins

An association in SQL requires an intermediate table to carry the combinations of foreign keys.

In Django, this has to be declared in a SQL-friendly way so that the ORM will work.

class This( models.Model ):

class That( models.Model ):
these = models.ManyToManyField( This )

The navigation, however, is simply following the relationships. There's no complicated SQL join required.

for this in This.objects.filter():
for that in this.that_set.all():
process this and that

Here's the other side of the navigation.

for that in That.objects.filter():
for this in that.these:
process this and that

Outer Joins

An Outer Join is a "Join with Null for Missing Relationships". It's navigation with an if-statement or an exception clause.

for that in That.objects.filter():
try:
this = that.this_set.get()
except This.DoesNotExist:
this = None
process this and that

There isn't any "join" in object-oriented programming. The ORM layer removes the need.