Bio and Publications

Thursday, April 29, 2010

NULL Confusion

The SQL database offers a domain-independent NULL value. This is a terrible thing, and should be treated with a depth of respect and fear.

Before using NULL values in a database, read things like "Null Values in Fuzzy Datasets" and "Null Values Revisited in Prospect of Data Integration".

See this question, and the answers -- people are very, very confused by NULL. The issue is that the SQL NULL conflates several separate and unrelated concepts.
  • Not Available Now. We expect that the data will be discovered later. That is, this NULL is a work-around for a process issue.
  • Not Applicable (or Optional). This means that disjoint subtypes have been unified into a single table with optional or not-applicable attributes. This is an optimization choice. This also be due to state changes: the attribite is not used in one state, but will be used in another state.
With two meanings for a single value, hilarity always ensues.

Further Confusion

The NULL value doesn't participate in comparisons or indexes. This is -- apparently -- shocking to some people. Here's a nice summary in "Why NULL never compares false to anything in SQL". Also, some notes for Oracle users in "Understand Oracle null values" and "Oracle conditions and how they handle NULL"

Because of this "NULL doesn't compare" problem, people get baffled by the use of NVL (or IFNULL) functions.

The Rules of NULL

The first rule of NULL is "Don't." Don't define a data model that depends on nullability. Define a model where each class is distinct and all the attributes are required. This will lead to a number of focused, distinct class definitions. A large number. Get over it. Don't pre-optimize a design to reduce your number of classes or tables.

Once you have a model that makes sense -- one that you can prove will work -- one that precisely matches the semantics of your problem -- you can optimize. But don't start out by pre-optimizing or taking "obvious" short-cuts.

The second rule of NULL is "Don't conflate Availability with Applicability." If you have data that is not available, you may have serious issues in the process you're trying to automate. Often, this is because you have multiple views of a single entity. If you have data that's not applicable, you've done your design wrong -- you put disjoint subtypes into a single class definition. Factor them (for now) into class definitions where all attributes are required.

If you have inapplicable or unavailable data, you must factor things into pieces where all attributes are required. You will then find that your "thing with optional attributes" is either "thing that changes state" or "thing with multiple subsets of attributes that must be joined together." Later, you can think about optimizing.

"But," you object, "it's a single entity, I can't meaningfully decompose it."

Consider the root causes for missing data before you take this position too seriously.

Let's take a concrete example. We're doing econometric analysis of businesses. We have a "Business" entity that has various kinds of measurements from various sources. Not all sources of information have data on all businesses. There's a lot of "not available" data. Also, depending on the type of business, there may be a certain amount of "not applicable" data. (For example, not-for-profit corporations, privately held companies and public companies all have different kinds of reporting requirements.)

What you have is a core "Business Name" entity with the minimal information. Name; maybe other identifying information. But often, the name is all you know.

You have a "Business Address" entity with mailing address. Small businesses may have only one of these. Large businesses will have more than one.

You have "Econometric Scoring From Source X" which may have a normalized name, a different address and other scores.

Conceptually, this is a single entity, a business. But from three different points of view. Initially, this is not a single class definition. It may be optimized into a single table with NULLs. That's a lousy design, but very popular. There are multiple addresses; scores change over time. Implementing this as a "single-table-with-nulls" seems to be a bad policy.

The third rule of NULL is "Only as an optimization." If you can prove that a particular join is nearly one-to-one, and you can prove that the cost of the join is too high, then you can consider pre-joining and using NULLs.

Common Mistakes

There are two common NULL-join mistakes: optional joins and date ranges.

One common mistake is attempting to join on an "optional" attribute. You wind up with NVL functions in the WHERE clause. RED ALERT.

If you have NVL functions in a WHERE clause then (1) you've defeated all indexing and (2) you've reinvented the wheel.

An NVL to a where clause is stand-in for a UNION. When you think you need an NVL, you've got two subsets that you're tying to put together: the subset with a non-NULL value and the subset with a NULL value. This is a UNION, and the UNION will probably be faster than your NVL construct. (If it isn't, good on you for benchmarking.)

Another common mistake is date ranges. Folks insist on having "open-ended" date ranges where the "end-of-time" is represented as a NULL. RED ALERT. NULL already means not applicable or not available. The end of time is both applicable and available. Don't add a new meaning to NULL.

Coding the end of time as NULL is simply wrong. The end of time (for now) is 12-21-2112. It's an easy date to remember. It's a cooler date in Europe than the US.

"But," the deeply argumentative claim, "I can't have my application dependent on a mystery constant like that". Lighten up. First, your app won't be running in 2112. Second, your app is full of mystery constants like that. You've got codes, your users have codes they think are important. Your localtime offset from GMT is a mystery constant. The start and end dates for daylight savings time are mystery constants. Please. You have lots of mystery constants. If you want to be "transparent", make it a configuration parameter.

Just code the end-of-time as a real date not a NULL and use ordinary date-range logic with no silly NVL business.

Tuesday, April 27, 2010

Yet More Praise for Unit Tests

I can't say enough good things about TDD.

But I'll try.

Due to an epic failure to read the documentation (this, specifically) I couldn't get our RESTful web services to work in Apache.

The entire application system has pretty good test coverage. I use the Python unittest to do integration testing. A test module spins up a Django test server; each TestCase uses the RESTful API library access the web servers through a variety of use cases.

However. This integration isn't done through Apache and mod_wsgi. It's done using Django's stand-alone testserver capability.

As I noted recently, Apache doesn't like to give up the HTTP Authorization header. So, the real deployment on our real servers didn't really work.

The Blame Game

At this point there are lots of things we can blame. Let's start blaming the process.
  1. TDD didn't help. By now it should be obvious that TDD is a complete waste of time because it didn't uncover this obvious integration issue. There's no justification for TDD.
  2. The Unit Testing framework didn't help. It's a completely blown unit. Unit testing is oversold as a technology.
  3. Reliance on "testing" is stupid. There's no point in even attempting to "test" software, since it still broke when we tried to deploy it. Testing simply doesn't uncover enough problems.
Clearly, we need a Bold New Process to solve and prevent problems like this.

Seriously

Search Stack Overflow for "Justification of TDD" or "ROI of Unit Testing" and those kinds of loaded questions and you'll find folks that are angry that software development is hard and TDD or Unit Testing or a slick IDE or a Dynamic Language or REST or SOAP or something didn't make software easy.

There is no Pixie Dust. You've been told. Stop searching for it. Software is hard. Unit Testing helps, but doesn't make it less hard.

Unit Testing to the Rescue

Our code coverage is -- at best -- middlin'. I don't have counts, nor do I actually care what the lines of code number is. Code coverage can devolve to numerosity. The method function coverage and use case coverage is more interesting. A "logic path coverage" might be helpful. But I'm sure our coverage is far from complete.

So there we were.
  1. Hundreds of unit tests pass.
  2. A suite of a half-dozen "integration" scripts (over a dozen TestCases) pass.
  3. Real Apache deployment fails because I couldn't figure out how to get mod_wsgi to pass the HTTP Authorization header. Even though it's clearly and simply documented. [I was busy focusing on Apache; mod_wsgi solves the problem handily.]
What I did was copy a page from AWS and put the digested authentication information in a query string. In one sense, this is a huge change to the API's -- it's visible. In another sense this is a minor tweak to the application.

The RESTful web services all rely on an authenticator object. The change amounted to a new subclass of this authenticator. Plus some refactoring to locate the digest in the query string. This is a tightly focused change in authentication and the client library. About two days of work to subclass and refactor the auth.rest module.

Success Factors

Because of TDD and a suite of unit tests, many things went really, really well.
  1. I could extend the test script for the auth.rest module to include the new authentication-via-query-string mechanism. Having tests that failed made is really easy to refactor and subclass until the tests passed. Then I could refactor some more to simplify the resulting modules.
  2. I could rerun the unittest suite, including the various "integration" tests (tests that had everything by Apache) to be sure everything still worked. Believe it or not, there were actual problems uncovered by this. Specifically, some tests didn't properly use the web services API library. The library had changed, but was mostly backwards compatible, so the tests had continued to work. The latest round of changes broke backwards compatibility, and some tests now failed.
  3. Despair did not set in. There were issues: sales folks were in total panic because the whole "house of cards" architecture had collapsed. A working test suite makes a compelling case that the application -- generally -- is still sound. We're just stumbling on an Apache deployment issue. In one sense, it's a "show stopper", but in another sense it's just a Visible But Minor (VBM™) hurdle.

Friday, April 23, 2010

REST and HTTP Digest Authentication

It seems so simple: use the HTTP Digest Authorization with the Quality of Protection set to "auth".

It's an easy algorithm. A nonce that encodes a timestamp can be used to be sure no one is attempting to cache credentials. It's potentially very, very nice.

Except for one thing: Apache.

Apache absorbs the Authorization header. And that's the end of that. It seems so simple, but I think I've been burned by it twice, now. I write unit tests that work with simplified Python wsgiref (or similar) servers. And I believe that those unit tests are equivalent to integration tests.

Ouch.

There's another reason why HTTP Digest authentication for RESTful services is a poor idea.
It involves too much traffic. HTTP authentication is usually a two-step dance to establish a session. Two steps in one too many, and RESTful services don't usually have any kind of session.


The comments on this post are almost as helpful as the post itself.

The three points are straight-forward.
  1. Use SSL. Always.
  2. Multiple Key/Secret credentials. Read this as username/password if that's helpful. We store hashes of "username:realm:password" as part of a RFC 2167 Digest Authentication. We plan to continue to use those hashes. This is a bit touchy, but we think we can handle this by a slight change to our user profile table.
  3. The "signed query" principle requires some thought. We don't make heavy use of query strings. Indeed, we make almost no use of the query strings. So the hand-wringing over this is a bit silly. We simply ignore any query string when signing the request.
I just wish I did integration testing with Apache sooner, not later. Sigh.

Tuesday, April 20, 2010

Working Definitions of Complexity

Software developers get so used to their Culture of Complexity, they hardly notice it.

See Asshole-Driven Development for more thoughts on this. The comments add lots and lots of examples of dysfunctional development. Many of these are additional examples of complexity run amok.

My personal addition is Next Year's Dollars are Cheaper (NYDC): next year's dollars are less valuable than this year's dollars. So technical debt can be accrued without any long-term consequences. Dumb, bad design can be forced into production because maintenance is cheaper than getting something done by a fantasy-based deadline date. Never mind the fact that maintenance goes on -- effectively -- forever, and technical debt accrues real cost at an exponential rate. Complexity is free? Apparently so.

Recently I heard of the "DIMY" development. DIMY is Do It My Way. The specific war story was a PL/SQL stored procedure that was somehow "too complex" because all it did was call 7 other stored procedures. The business process had 7 steps; the parallelism between procedure and use case was an important part of the design. Yet, since some folks would prefer a Monolithic Stored Procedure (MSP), they saw fit to complain.

Asserting that a MSP is "less complex" is a mirror image our normal understanding of complexity creating cost. It's a mirror-image because the debits and credits are reversed. In the DIMY world, measurable complexity is valued as an asset and real simplicity is viewed as a cost.

Working Definitions

Based in the war story, we can identify several aspects of this working definition of complexity.

First, it appears that a monolithic piece of software (no matter how poorly it matches the use case) is "less complex" than a relatively simple piece of software that better matches the use case. So software that actually matches the use case is "complex".

It also appears that a simple count of stored procedure names is a measure of complexity. So any effort at doing any meaningful, "high-level" chunking of meaning is accused of creating complexity. So chunking is "complex".

[I am forced to agree that "more names" is a larger number; I can't agree that more names is more complex, because I find that chunking works. Note that refusing to engage in mental chunking is absurd. Claiming that multiple named stored procedures "obscures the details" is silly. Why stop at named procedures? Why not claim that PL/SQL -- by it's very nature -- obscures the details? Why not claim that an RDBMS obscures the details? Why not claim that the OS API's obscure details? Why not claim that compiled languages obscure the details? Chunking is essential.]

Finally, it appears that real measure of complexity, like Cyclomatic Complexity are irrelevant. So a monolith, with lots of loops and ifs is somehow less "complex" and more desirable.

Perfect Code

Clearly, then, for some folks, high quality code involves (1) no match against the use case, (2) a single name, and (3) no limit on the loops and if-statements. In order to achieve this, we need a very simple use case (real simplicity -- low cyclomatic complexity -- a sequence of decision-free steps) for which we can write an immense, possibly irrelevant pile of code.

What's wrong with the MSP?

Given a monolithic piece of code that doesn't match the use case sequence of steps, how could we construct unit tests? I don't see how. Since we can't decompose the problem into meaningful chunks, we can't test the thing in pieces. All we can do is write overall end-to-end functional tests. And hope.

Given this MSP, how would we debug problems? I don't see how. We'd be confronted with "it's broken" almost every time something went wrong. Pin-pointing the root cause seems like it would be impossible.

DIMY development -- and the associated complexity -- does mean one thing. It means job security: no one will ever be able to understand, maintain or adapt this software. Write once, maintain forever. If you're patient, you have a job for life. At some point, managers will realize it's too expensive to maintain and -- because you're the only expert -- you can rewrite it, continuing the cycle of complexity.

Wednesday, April 14, 2010

Ways to Complicate Use Case Analysis

I sat through a great use case analysis session recently.

"Great" because I saw lots of ways to derail a simple process. Eventually, we did identify a couple of actors and a couple of use cases. But it took hours and hours.

Bonus: this was the third go-round on these exact use cases.

Requirements Gathering

The first go-round of requirements gathering was a conference call. We produced some nice notes. Very good stuff.

The various whiteboarding tools available with Skype work pretty well. We could sketch stuff, and collect notes, and digest the conversation down to a tidy document.

The actual work went pretty well. Stuff got built. The principal users need -- of course -- to review the preliminary stuff. There's a first sprint to build stuff, followed by some chance to comment, followed by a second sprint to finalize.
Focus was lost two ways. One, the technical folks were derailed by other, more important projects. Customer pilots, reinstallation on a new host, and an unrelated project threw down trump cards. Second -- and more important -- the principal users simply could not find time or interest to review the preliminary stuff.

Trying Again

For no sensible reason, we had a second go-round of requirements gathering. The core problem is that the users simply won't take the time to try something out on their own. They're sales folks, without an actual customer in the room, they seem incapable of doing anything. This kind of world view takes some getting used to.

Instead of previewing what was available, they insisted on more requirements gathering. What we got was a random document that purported to describe what the actors might do. It was intended to repeat the initial phone call with more focus. Instead it had lots of "We'll need to talk about this" parenthetical comments.

In short, it was impossible for them to even set down a coherent idea on paper. All they could do was talk about it. There was no alternative to a conversation.

Round Three

In order to make some progress, our Adobe FLEX developers were brought in to create something a little snappier than the simplistic HTML interface we had been working on. We redid the entire requirements gathering for them.

The principle users -- sales folks -- did not want any of the previous requirements gathering results brought in. We had to have a "conversation", repeating the entire previous effort from the ground up.

And -- of course -- all of the previous dead-ends, bad ideas, logical impossibilities and business impossibilities had to be repeated yet again. Data that's never been available was still spoken of as "required". The conversation on why the data cannot possibly exist had to be rehashed.

Derailing

What's derailing the process is simple. The sales folks cannot work independently. Each interaction must be a hands-on, guided tour of the software in which the sales folks say random things that must be ignored.

At some point in the future, there's a remote possibility that someone will login on their own and actually run the demo that's been in place for months.

As developers, we have been remiss in not catering to their learning style. They cannot think without talking, and they cannot take action unless they're influencing someone. Asking them to test the demo site is unproductive because they simply wont. Asking them for "comments" can be troublesome because they're job is influence, not simply provide a simplified "ok/not ok" feedback on specific features of the implementation of their use case.

Broad Not Deep

Additionally, we have folks that keep trying to define the requirements in broad, sweeping terms. They're uncomfortable with an end-to-end use case. At each step in the interaction they want to define all the possible future courses of events and interactions and consequences of each alternative course.

Folks with the big-picture view have a hard time writing a use case that describes a task from beginning to the end. Even basic concepts like "Creating Business Value" can be elusive since the possibilities are limitless.

As developers, we were successful at focusing down on a complete use case. With some effort we got from the user's initial interaction to the final bit of business value.

Monday, April 12, 2010

Learning Styles -- The Astonishment Response

We're not really talking about "Learning Styles" as much as "Denial Styles". This is a list of responses to "Astonishment" I've seen.

We're not talking about the Kübler-Ross model of grief, although this is similar.
However, the response to astonishment isn't a "progression" toward acceptance. Some folks simply don't like to learn and are perfectly capable of arguing down the facts because they don't fit with assumptions and preconceived notions.

When faced with new information, some folks seem to have a consistent response to astonishment. Other folks seem to jump around among a few preferred responses. Additionally there are people who seem to prefer to escalate things to a crisis level because learning seems to require adrenaline.

Oh. Classic acceptance. Many folks start here; which is pleasant. It saves a lot of email traffic. When astonished, they assimilate the information without really fighting against it.

That Can't Be True. Classic denial. It's surprising how often this happens. Even when confronted with facts supplied by the learner themself.

Example. The DBA says stored procedures are a maintenance problem. You say, "Correct, perhaps they shouldn't be used to heavily".

The DBA asks "Why reduce dependence on stored procedures?" You say that, amongst other things, "they're a maintenance nightmare."

And the DBA says, "That can't be true; it's just a management problem." WTF -- Wasn't That Funny -- the DBA is going to deny their own facts in order to avoid learning something knew.

I Wasn't Told. This is a kind of grudging, negotiated acceptance. "What you say about bubble sort being inefficient may be true, but I wasn't told." Okay. You weren't told. Does that mean that I have to email all of Don Knuth to you so that you will have been "told"?

I'll have to see it. This is really just a basic denial wrapped in the terms of settlement. In short, the learner is saying, "I still disagree with all your facts." I'm not sure what "I'll have to see it" means when we have working implementations of something "new" or "different".

Example. A: "RESTful web services are simpler." B: "No." A: "No SOAP, no WSDL; seems simpler." B: "Perhaps, but I'll have to see it." See what? How can you "see" the absence of complex WSDL?

This project is out of control. This is a somewhere near grudging acceptance. It might also be a form of reneging or repudiation of acceptance. It's hard to say.

Example. Manager: "The Ontology has thousands of objects with dozens of properties and the SPARQL processing is slow." Architect. "Replace it with a relational database derived from the ontology." Manager: "Okay".

Four Weeks Later. Manager: "This Project is Out of Control."

Right. We're making a disruptive change to the architecture. What did you expect? Non-disruptive change? How is it change if it doesn't disrupt something?

Does Everyone Know This? This is a form of "I wasn't told". It's my favorite because it projects one's own knowledge-gathering onto a mysterious "everyone". I'm not sure why some folks say this. To me, it seems a pretty bold statement about the mental states of other folks on the team.

That's Non-Standard. More properly this should be "That's atypical" or "That's unconventional". This is another negotiated, grudging acceptance. But it's a pretty complex deal. The first part is to establish a convention. Sometimes a legacy usage needs to be elevated to "typical" or "conventional"; other times legacy usage already is conventional. The second part is to realize that the new thing is different from the convention. The third part -- which is subtle -- is to deprecate something new because it is unconventional.

Example. Architect: "You should use a HashMap for those dimensional conformance lookups." Programmer: "Not everyone understands those fancy collection classes, so we use primitive arrays." Architect: "That's amazingly slow. It's less code to build and lookup a HashMap, and it runs faster."

Programmer: "That would be non-standard". Architect: "There's no applicable ISO standard. Perhaps you mean 'unconventional'." Programmer: "Right, unconventional. And we can't change now because it would disrupt the established code base."

Architect: "It will be less code and run faster." Programmer: "I'll have to see it."

Doesn't That Contradict Something? This is best nit-picky form of denial ever. Step one is to analyze each word of the suggested change; in some cases, using the level of care appropriate to studying the Talmud. Step two is to locate something that could be construed as contradictory. The third step is to deprecate the new thing because it can be linked to something that can be seen as contradictory.

Architect: "Can we add some formal assert statements in the tricky actuarial scoring algorithm. It involves non-obvious assumptions about NULL's and ranges of values." Programmer: "No. That contradicts your earlier advice to unit test all those corner and edge cases."

Architect: "Contradict? Perhaps you mean it's redundant." Programmer: "No, it's clearly contradictory; one never needs both assertions and unit tests. You demanded unit tests, that means that assertions are a contradiction."

Summary

Other than patience, it's hard to provide any other advice on how to work through these things. Mostly, these are fact-free positions. In some cases, even facts don't help the learning process.

I think the only way to cope with a fundamental refusal to learn is to ask what it takes to convince them. In many cases, the answer amounts to "Do the entire implementation two ways and then micro-examine each nuance of performance, maintainability, adaptability and cost to the organization over a period of a decade before I'll consider your worthless opinion."

I remember once being asked -- seriously -- how I can possibly claim one implementation is higher performance than another. The question was asked as if "measurement" didn't apply to software performance. At the time, I couldn't figure out why "measurement" wasn't the obvious fact gathering technique. Now I realize that they were simply refusing to learn and didn't care about evidence; they simply didn't want to change to a more efficient implementation.

Friday, April 9, 2010

iPad Thoughts -- Fashion Accessory?

From a Blog that's inside a company's firewall, so this had to be heavily edited.
"The instant ON is a relief. The full page touch screen works just like on the iPhone - only better. Web pages look great.. Photographs and Movies are fabulous. The screen resolution is fantastic. Sharing pictures makes it clear that the photo album is history. Tough times for Kindle. Email - much better than on the Blackberry. The things we like on the PDAs are all more attractive - and more usable! Almost like on a laptop."
Also.
"I did not have an easy way to view Excel & Powerpoint. 3G is not available for another month. ... No Adobe Flash. For some, the one big 'flaw' will be the lack of a 'file system'."
Finally, emphasis mine.
"The iPad is not a big leap, it is just a step, a big iTouch. But this is the last step that brings a whole new vision home. While not quite ready for Enterprise deployment, it gives us time to get going. And this may be the Tablet that makes it acceptable for men to carry handbags"
Okay. Time to start shopping for a nice Timbuktu messenger bag.

Wednesday, April 7, 2010

Fancy Literate Programming

My bias is toward "printable" documents. I like the idea of an HTML document that is directly printable. I've used tools like Flying Saucer (and appropriate CSS) to guarantee printability.

I like using ReStructured Text to create HTML and LaTeX that match precisely.

When I think of Literate Programming, I'm biased toward print.

However, HTML has a lot of power. Taking off the blinkers for a moment, one can see that rich HTML and Javascript may be a really workable approach.

Take a look at the Literate Molly Module: another tool for Literate Programming. This is an example of using rich HTML as a vehicle for literate programming.

Monday, April 5, 2010

Getting Started Creating Web Pages

Got this question recently.
I’m looking for an HTML editor that fits into my price range (free of course). I don’t need to do anything fancy, just vanilla HTML to run on an Apache server ..., and maybe some PHP down the line. Can you recommend any open source or shareware software that would run on Windows?
What to do?

First, civilized folks don’t edit HTML any more. That’s so 1999.

You have a spectrum of choices if you want to try and edit HTML.
  • General-purpose text editors. Good ones do HTML syntax coloring. This is the hardy, forge-through-the-forest way to go. Raw text editing. Like when we were kids. http://en.wikipedia.org/wiki/List_of_text_editors. In Windows world, I use Notepad++.
  • HTML-specific editors. http://en.wikipedia.org/wiki/List_of_HTML_editors. Note that WYSIWYG HTML Editing is more trouble than you’d believe possible. It’s always fun for the first few months, but then you try to do something that confuses the GUI interface and you wind up with an entire paragraph in italics and can’t figure out why. Or you want to move a punctuation outside a link and discover that the editor just can’t figure out where the tag is supposed to fall and puts everything inside it. Most of us do not try to use WYSIWYG HTML editors because it slowly becomes annoying once you get beyond the trivial basics.
  • IDE’s. To produce HTML sensibly, you have to also write .CSS style sheets, and you often have a number of related pages. Essentially, a “project”. An IDE is usually a better choice than an editor. All the good IDE’s are free: Eclipse, NetBeans and Komodo Edit. I use ActiveState Komodo Edit heavily.
While NetBeans or Komodo Edit seems like overkill, it will (eventually) pay out as you move into developing more than static HTML pages.

Better Than HTML

Instead of creating HTML, many of us use “Lightweight Markup” which is much, much easier to cope with and simple tools to produce HTML from the markup. http://en.wikipedia.org/wiki/Lightweight_markup_language

I use reStructuredText instead of HTML. I use the DocUtils project, which has an rst2html.py tool that converts my RST into HTML for me. I also use rst2s5.py to create power-point-like presentations from my reStructuredText. If you want to see the power of RST, you can look at my personal site and my books: and. 100% RST. No manual HTML anywhere. I use Sphinx to create really complex docments like the books.

For some tasks, I use HTML templates and simple scripts to process data and create static HTML from the data. You’d be surprised how effective this is. Few things require up-to-the-second web applications. Many things can be done as nightly batch programs that emit static HTML and FTP the HTML up to the web page. No PHP.

Application Development

For web development, PHP is fine. It will – before long – create holes in your head because it’s so badly thought out. But for getting started, it’s fun. Real companies (like Google) don’t waste their time with it because of the numerous problems PHP causes.

“Problems?” you say. “What problems?”

PHP’s world view (HTML + code in a single package) is a terrible architecture. It’s horribly slow and leads to very muddled, inflexible designs. Everyone who tries to make a global change to their site's “look and feel” finds that PHP is inflexible and a regrettable platform. Even folks who simply want consistency among several different pages within their site find that the PHP world view is more headache than solution.

But it’s fun when you first build a site that works.

Frameworks

Generally, most folks find that a “framework” is absolutely essential for debugging, consistency and separating Content, Processing and Presentation. Even a simple Blog or Forum or Visitor Registration has separate Content, Processing and Presentation; PHP muddles these. A framework can help unmuddle them.

I use Django as framework and Python as programming language. Your hosting site may not support this, in which case you may be in trouble.

The Web Frameworks list on Wikipedia is good. Zend and CodeIgniter are highly recommended in places like StackOverflow. However, here's a good Django vs. PHP comparison: The Onion Uses Django, And Why It Matters To Us.

"Because
Cleaner. Much cleaner. Proper unit testing. Real reusable components across applications. An ORM rather than a just a series of functional query helpers...."

Summary
  1. Get an IDE to edit your pages. Komodo Edit.
  2. Consider using RST and tools instead of raw HTML. Installing Python + DocUtils and using rst2html.py is easier than learning HTML.
  3. Try to avoid PHP’s numerous pitfalls; ideally by avoiding PHP. Use Django + Python and create a real application that clearly separates the content (data model) from processing (view functions) from presentation (HTML templates)

Thursday, April 1, 2010

The Final Design Review

Today, we're reviewing the final and only code in the application. It's just that simple. We'll start with the data model.

CREATE TABLE STUFF(
COLUMN1 TEXT,
COLUMN2 TEXT,
COLUMN3 TEXT
);

As you can see from the enclosed table design, we have generalized the general triple-store to make it more general by removing all type restrictions on the RDF triple.

This can be used to construct any relations by the convention of having COLUMN1 be the database, schema and table, separated with dots. COLUMN2 is the column or attribute name. COLUMN3 is -- again, by convention only -- the target data type (integer, string, date, etc.) and the quoted value.

Any Questions?

The stunned silence is -- I'm sure -- due to the glittering brilliance of this design. Why this hasn't been more widely used, I have no idea.

The code is equally simple. We don't need to get into the details, but we have
"a function called do_stuff. ... not to worry because the method does more than one thing. ... don't worry because the method is overloaded many different ways."
Great. That ends this design review. With this kind of obvious simplification, we don't need any more design reviews, this one covers all possible bases.

Seriously

I've seen the first discussed seriously in the context of "why don't we simply...". The answer ("it doesn't perform well") is always a surprise to people who pitch the triple-store solution to a problem they've managed to get wrong.

I recently received an email bemoaning a real code review in which someone seriously tried to put a function into production named "do_stuff".

Equally bad is this "question" on Stack Overflow. "Using table-of-contents in code?". (It's not really a question, it's a blog post in the rhetorical form of a question.) The money quote: "I know that alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declaration would be self-explanatory enough.. but some complex tasks require a lot of code".

It appears that there are programmers who have done too little maintenance and adaptation.