
Monday, September 28, 2009

Duct Tape Programmers

See Joel On Software: The Duct Tape Programmer. He lauds the programmer who gets stuff done with "duct tape and WD-40".

Here's why: "Shipping is a feature. A really important feature. Your product must have it."

Dave Drake sent the link along with the following:

This "speaks of coding for the rest of us, who are not into building castles in the air, but getting the job done. Not that there is anything wrong with better design, cleaner APIs, well-defined modularity to ease the delegation of coding as well as post-delivery maintenance. But damn, I wish I had a nickel for every time I sat in a design meeting where we tried to do something the fancy way, and it broke in the middle of the development cycle, or testing, or even the builds, and always in the demos."

However

There is one set of quotes that falls somewhere on the continuum of wrong, misleading and flamebait.

"And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that."

This -- in my experience -- is wrong. For Joel or the author of the quote (Jamie Zawinski) this may be merely misleading because it was taken out of context.

It's absolutely false that customers won't complain about missing unit tests. When things don't work, customers complain. And one of the surest ways to make things actually work is to write unit tests.

I suppose that genius-level programmers don't need to test. The rest of us, however, need to write unit tests.

Unit Testing Dogma

On Stack Overflow there are some questions that illustrate the misinformation surrounding unit testing. On one end, we have Zawinski (and others) who say that unit tests don't create enough value. On the other end, we have questions that suggest slavish adherence to some unit test process is essential.

See How to use TDD correctly to implement a numerical method? The author of the question seems to think that TDD means "decompose the problem into very small cases, write one test for each very small case, and then code for just that one case and no others." I don't know where this process came from, but it sounds like far too much work for the value created.

It's unfair to say that unit testing doesn't add value and claim that customers don't see the unit tests. They emphatically do see unit tests when they see software that works. What customers don't see is the detail of the unit tests, or dogmatic, process-oriented software development.

When there are no tests, the customer sees shoddy quality. When the process (or the schedule) trumps the feature-set being delivered, the customer sees incomplete or low-quality deliverables.

Conclusion

The original blog post said -- clearly -- that gold-plated technology doesn't create any value.

The blog post also pulled out a quote that said -- incorrectly -- that unit tests don't create enough value.

Wednesday, September 23, 2009

Code Kata : Parse USPS ZIP3 table

Situation

The USPS ZIP codes have a multi-part structure. The first three digits are a prefix that defines a sectional center facility.

The USPS table L005 3-Digit ZIP Code Prefix Groups—SCF Sortation maps clusters of ZIP3 prefixes to Facility and State codes. The following URL has this table.


Your Job

Your job is to write a library module that does two things:

1. Reads and parses this table.

2. Supports ZIP-code lookup (ZIP3, ZIP, ZIP+4) to return SCF and State information.

Some Notes

Finding and parsing the table is often done in Python with components like Beautiful Soup. Equivalents aren't available in all languages. You might want to copy and paste this table into a spreadsheet application, and save it as a CSV file, which is much easier to work with than HTML.

There's a regular format to the ZIP3 ranges that makes parsing them relatively simple.

The SCF names, however, have two different formats. Some have names that begin with SCF. Others have names that don't begin with SCF. Be careful to handle each version correctly.
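Here's a minimal sketch of the two pieces, not a finished solution. It assumes the table has already been saved as rows of (prefix group, label), that a prefix group looks like "005, 117-119", and that a label ends with a state code and a ZIP3. The function names and the sample rows are made up for illustration; check the real table before trusting any of these assumptions.

```python
import re

def parse_zip3_group(text):
    """Expand a prefix group such as "005, 117-119" into a list of ZIP3 strings."""
    prefixes = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            prefixes.extend("%03d" % n for n in range(int(low), int(high) + 1))
        else:
            prefixes.append("%03d" % int(part))
    return prefixes

def parse_label(label):
    """Return (facility, state); assumes the label ends with 'ST NNN'."""
    words = label.split()
    if words[0] == "SCF":  # some names begin with SCF, some don't
        words = words[1:]
    return " ".join(words[:-2]), words[-2]

def build_lookup(rows):
    """Map every ZIP3 prefix to its (facility, state) pair."""
    lookup = {}
    for group, label in rows:
        facility, state = parse_label(label)
        for zip3 in parse_zip3_group(group):
            lookup[zip3] = (facility, state)
    return lookup

def find_scf(lookup, zip_code):
    """Accept ZIP3, ZIP or ZIP+4; only the first three digits are used."""
    match = re.match(r"^(\d{3})(\d{2}(-\d{4})?)?$", zip_code.strip())
    if match is None:
        raise ValueError("not a ZIP3, ZIP or ZIP+4: %r" % zip_code)
    return lookup.get(match.group(1))

# Made-up rows, not copied from the USPS table.
rows = [("005, 117-119", "SCF ANYTOWN XX 117"), ("200", "OTHERVILLE YY 200")]
lookup = build_lookup(rows)
print(find_scf(lookup, "11801-1234"))
```

Keeping the range expansion, the label parsing and the lookup in separate functions makes each one easy to unit test on its own.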

Code Kata : Merge Changes

The Situation

A co-worker has mistakenly cloned a directory tree rather than link to it. Then they made some number of changes to files in that directory.

Your Job

Your job is to compute a directory-level difference between an official copy and the changes they made. Sadly, you can't trivially rely on using Subversion for this. You're going to have to write your own differ.

The difference report should show the following kinds of information.

1. Baseline files unchanged in the clone.
2. Cloned files which are new and don't exist in the baseline.
3. Cloned files which are changed and newer than the baseline.

You'll need to skip certain directories. They're either working files or are ignored for other reasons.

You'll need to skip certain file extensions. Things like .pyc or .class files have date-stamps that don't indicate a real difference.

Ultimately, you'll produce two things.

1. A report to show what was changed.
2. A script that will copy the changes from the clone back into the master directory.

Some Notes

Python has several modules that help with doing directory and file comparison.

In non-Python environments, you may have to rely on system utilities like diff or cmp.

This is best built incrementally, creating the report first. Then handle exceptions. Then do the copy.
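A rough sketch of the report step, using os.walk and filecmp from the standard library. The skipped directory name (.svn) and the function names are assumptions for illustration. Note that it compares file contents only; it does not check that the clone is newer than the baseline, and the generated copy commands don't create missing directories.

```python
import filecmp
import os
import shlex

SKIP_DIRS = {".svn"}                    # assumed; use your own list
SKIP_EXTENSIONS = {".pyc", ".class"}

def compare_trees(baseline, clone):
    """Return (unchanged, new, changed) lists of paths relative to the clone."""
    unchanged, new, changed = [], [], []
    for dirpath, dirnames, filenames in os.walk(clone):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SKIP_EXTENSIONS:
                continue
            clone_path = os.path.join(dirpath, name)
            rel = os.path.relpath(clone_path, clone)
            base_path = os.path.join(baseline, rel)
            if not os.path.exists(base_path):
                new.append(rel)
            elif filecmp.cmp(base_path, clone_path, shallow=False):
                unchanged.append(rel)
            else:
                changed.append(rel)
    return unchanged, new, changed

def copy_script(baseline, clone, paths):
    """Build shell commands that copy the given clone files over the baseline."""
    return [
        "cp %s %s" % (shlex.quote(os.path.join(clone, p)),
                      shlex.quote(os.path.join(baseline, p)))
        for p in paths
    ]
```

The report comes from compare_trees; feeding its new and changed lists to copy_script gives the second deliverable.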

Sunday, September 20, 2009

Innovation and Outsourcing

Good stuff in ComputerWorld: Partnerships can Go Too Far.

"Consider vendor innovation. As companies become large and entrenched, they typically become more risk-averse and less creative, often rejecting ideas that challenge conventional wisdom."

This is really only half the story.

First Things First

Programming is hard -- really hard. Read EWD 316, chapter 2. By extension, most of IT is saddled with really, really complex and difficult problems.

"As a result of its extreme power, both the amount of information playing a role in the computations as well as the number of operations performed in the course of a computation, escape our unaided imagination by several orders of magnitude. Due to the limited size of our skull we are absolutely unable to visualize to any appreciable degree of detail what we are going to set in motion, and programming thereby comes an activity facing us with conceptual problems that have risen far, far above the original level of triviality."

Given that IT is hard, it therefore entails either some risk of failure or considerable cost to avoid failure. It also involves an -- often unknown -- amount of learning.

Before writing software, we really do need to learn the language, tools, architecture and components we're going to use. Not a 1-week introduction, but a real project with real quality reviews. Sometimes two projects are required to ferret out mistakes.

Also, before writing software, we really do need to understand the problem. Sadly, many business problems are workarounds to bad software, leaving us with many alternative solutions that are all equally bad and don't address the root cause problem.

No-Value Features

Programmers will often pursue no-value features that are part of the language, tools, components or architecture. This drives up cost and risk for no value.

Business Short-Sightedness

The compounding problem is a short-sighted business impetus toward delivering something that mostly works as quickly as possible. Often, business folks buy into the no-value features, and overlook the real problem that we're supposed to be solving.

Sigh.

The Result: Stifling

The result of (a) inherent complexity, (b) no-value features and (c) short-sighted buyers is that IT management finds ways to stifle all IT innovation.

In effect, most companies outsource innovation. They hope that their vendors will provide something new, different and helpful. The IT organization isn't allowed to invest in the learning or take the risks necessary to innovate.

The ComputerWorld article points out that some companies then put Preferred Supplier Plans in place which further stifle innovation by narrowing the field of vendors to only the largest and least innovative.

Wednesday, September 16, 2009

Fedora 11 and Python 2.6

Upgraded a VM to Fedora 11 recently.

This -- it turns out -- comes with Python 2.6 installed.

It is, however, an incomplete build. To do anything, I had to install some additional Python packages. Specifically, the "libraries and header files needed for Python development". Also, IIRC, the tkinter package isn't present by default.

Once I had the development package installed, I could add setuptools. After that, it was a sequence of easy_install steps and we were up and running.

I've started running our unit test suite with python -3 to capture all of the DeprecationWarnings. So far, there aren't many and they aren't show-stoppers. In one project we have some has_key methods and a use of urllib that needs to be replaced.
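The has_key fix is usually mechanical. Here's a small sketch of the replacement; the settings dictionary is hypothetical and stands in for whatever mapping your code uses.

```python
settings = {"host": "localhost"}

# Python 2 only; running with "python -3" flags this form:
#     if settings.has_key("host"): ...

# The "in" operator works in both Python 2.6 and Python 3.
if "host" in settings:
    host = settings["host"]

# When a default is acceptable, get() avoids the membership test entirely.
port = settings.get("port", 8080)
```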

It's very, very nice to have a short, specific list of Python 3 compatibility issues to look out for. We're not going to use Python 3 any time soon, but it's nice to be able to solve the problems in advance.

Tuesday, September 15, 2009

The world is multidimensional? Really?

I cannot believe that people still consider top-down, uni-dimensional, taxonomic hierarchies useful.

This Stack Overflow question (REST: How to Create a Resource That Depends on Three or More Resources of Different Types?) repeats an assumption. Essentially the confusion comes from assuming that "URI's map directly to a hierarchy".

I think it's over-exposure to the Windows file system where hard links are a rarity.

Perhaps it's also from over-exposure to hierarchical site-maps that simply repeat the menu structure without adding information.

Someone who is reading Everything is Miscellaneous suggested I read up on "faceted classification" as if that was something new or different.

What's interesting in Weinberger’s book is (1) recognizing this and (2) taking some concrete action.

What To Do?

Perhaps the most important thing is this:

Stop Forcing Things Into Hierarchies

I sat in a multi-hour meeting where we debated the file-system structure for artifacts created during a development project. Each artifact has several dimensions.
  • Phase of the project (Inception, Elaboration, Construction, Deployment)
  • Deliverable type (DB Design, Application Programming, Web Site, etc.)
  • Status (Work in Progress, Waiting UAT, Completed, Rework, etc.)
  • Calendar (Year, Quarter, Month the work started, as well as ended)
  • Team (DBA's, Batch/Backend, Web/Frontend, ETL, etc.)
Sigh.

Since the data is multidimensional, no single taxonomic hierarchy can ever "work". Each alternative (and there are 5! = 120 ways to permute five dimensions) appears equally useful.

If you want, you can enumerate all 5! permutations to see which is more "logical" or "works better for the team". What you'll find is that they all make sense. They all make sense because the dimensions are all peers -- equally meaningful.
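If you really do want to enumerate them, itertools will do it for you. This is just a sketch; the dimension names are shortened from the list above.

```python
from itertools import permutations

dimensions = ["Phase", "Deliverable type", "Status", "Calendar", "Team"]

# Every ordering of the five dimensions is a candidate directory hierarchy.
orderings = list(permutations(dimensions))
print(len(orderings))

# Show the first few as directory-style paths.
for ordering in orderings[:3]:
    print("/".join(ordering))
```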

Alternatives

One alternative is to do this.

1. Create a relatively flat structure. Define all your things in this flat structure. In a Relational Database context, this means assigning surrogate keys to everything; "natural" keys are more problem than solution. In a content management context, just throw documents anywhere.

2. Create "alternative" indices via hard links to the flat structure. Do not limit yourself to a few alternative orderings of the dimensions. There are n! permutations of your dimensions. Expect to create many of these for different user constituencies.
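The two steps above can be sketched with os.link. This is an illustration under assumptions, not a tool: the build_index function, the tag dictionary and the file name are all hypothetical, and hard links only work when the flat structure and the indices live on the same file system.

```python
import os
import tempfile

def build_index(flat_dir, index_dir, tags, order):
    """Hard-link every document into index_dir, nested by the dimensions in order."""
    for name, doc_tags in sorted(tags.items()):
        target_dir = os.path.join(index_dir, *[doc_tags[dim] for dim in order])
        os.makedirs(target_dir, exist_ok=True)
        link = os.path.join(target_dir, name)
        if not os.path.exists(link):
            # A hard link is a second name for the same file, not a copy.
            os.link(os.path.join(flat_dir, name), link)

# Hypothetical document and tags, for illustration only.
tags = {"schema.sql": {"team": "DBA", "status": "Completed"}}

with tempfile.TemporaryDirectory() as root:
    flat = os.path.join(root, "flat")
    os.makedirs(flat)
    with open(os.path.join(flat, "schema.sql"), "w") as f:
        f.write("-- placeholder\n")
    # Two of the n! possible orderings, built from the same flat structure.
    build_index(flat, os.path.join(root, "by_team"), tags, ["team", "status"])
    build_index(flat, os.path.join(root, "by_status"), tags, ["status", "team"])
    by_team = os.path.join(root, "by_team", "DBA", "Completed", "schema.sql")
    same = os.path.samefile(by_team, os.path.join(flat, "schema.sql"))
    print(same)
```

Adding another ordering is one more build_index call; the documents themselves never move.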

Remember, Search Exists

Recognize that highly structured metadata fields in a database are usually a waste of time and money. Search exists. Much data is unstructured or semi-structured and search functions exist that handle this nicely.

If you stop force-fitting hierarchies, you find that you now have several dimensions. Each dimension has a set of reasonably well-defined tags. Each document or database fact row is a point in multi-dimensional space.

A single SQL-style query among these multiple dimensions is a pain in the neck. Search, however, where the dimensions are implied instead of stated, is much, much nicer.
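To show what "implied instead of stated" can look like, here's a toy sketch. Each document carries a flat set of tags, and a search term matches no matter which dimension the tag came from. The document names and the search function are hypothetical.

```python
# Hypothetical documents; each one is tagged along several dimensions at once.
documents = {
    "schema.sql": {"Elaboration", "DB Design", "Completed", "DBA"},
    "load_job.py": {"Construction", "Application Programming", "Rework", "ETL"},
    "home_page.html": {"Construction", "Web Site", "Completed", "Web/Frontend"},
}

def search(documents, *terms):
    """Return names of documents that carry every term, whatever its dimension."""
    wanted = set(terms)
    return sorted(name for name, tags in documents.items() if wanted <= tags)

print(search(documents, "Completed"))
print(search(documents, "Construction", "Completed"))
```

The caller never says "Status = Completed" or "Phase = Construction"; the terms alone are enough to pick out the point in the multi-dimensional space.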

Friday, September 11, 2009

Python in the News

See this in Boing Boing: http://www.boingboing.net/2009/09/11/hairy-type.html

They're talking about NodeBox, something I'd never heard of before.

"NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie."

Using Python to direct the creation of graphics. How cool.

Tuesday, September 8, 2009

API Quality Check

A recent request for an API quality check sent me into a paroxysm.

The request seemed simple enough. They had two varieties of API design: varietal M had a lot of methods, each with relatively few parameters. Varietal P had a few methods, but each had a boat-load of parameters.

There had been some "reading" on API design and questions were raised. They wanted me to weigh in, telling them that style M was "better" than style P. [It is, but that's not the point.]

I was shocked speechless.

I find it incredible that anyone could even need coaching in API design. Much less find Tulach's book and still be unable to apply the principles.

Here's what bugged me.

If they were coding in any sensible language, they should have mountains of API examples all around them. Java has a huge standard library. C# has the entire .Net framework. Python has a vast library. All of these are tremendous, well-designed, carefully crafted examples of API's.

API's Everywhere

As far as I can see, the world is fat with API examples. Everywhere you look, every vendor, every product, everything has an API.

It just shouldn't be rocket science to compare your API against the established standard for the language in which you're working.

Somehow it was possible for several programmers to be completely unable to find any examples of API design.

I can only assume that they are living in a time-warp; none of them have ever connected to the Internet or seen any code but their own. Perhaps the only API they could think of was JDBC. Or perhaps they were all Visual Basic or PL/SQL programmers and didn't see much open-source code. Or perhaps they had some really obscure language where no one posts any open source API's.

What To Do?

My direct advice was to read Tulach and create a big spreadsheet ranking their code against each principle that Tulach provides.

After thinking about it, I realize I should have asked what API's they were currently using and how their proposed new code stacked up against the existing language and framework they already had.

Friday, September 4, 2009

RDBMS Issues and Concerns

Check out this blog post: http://cacm.acm.org/browse-by-subject/data-storage-and-retrieval/32212-the-end-of-a-dbms-era-might-be-upon-us/fulltext

The first issue is that the RDBMS code base is ancient. The second issue is that we keep pushing the envelope on the RDBMS model; examples include OLAP and RDF triple-stores.

Some folks want to say "reports of the death of the RDBMS are premature."

Some technologies, like COBOL, the relational model, and words like "DASD", will be with us for decades after their useful life has ended.

The decline of COBOL and the Relational Database will be protracted, painful, inevitable and asymptotic with actual death. The old one-size-fits-all COBOL is being replaced by many other languages. Similarly, the one-size-fits-all RDBMS will be fragmented into more specialized data stores. Further, legacy technology never completely goes away.

Macintosh Support

A handy CNET review site: http://reviews.cnet.com/macfixit/