
Monday, June 29, 2009

It used to work... Now they've fixed it.

Apple's Time Capsule rocks.

For a while the Airport Extreme Base Station (AEBS) -- with a disk added -- would do the same thing. You got Time Capsule backups seamlessly and continuously.

No more. Version 7.4.2 fixed the "bug". Now the AEBS no longer backs up to a disk.

They've been saying that since Nov '08. See article HT2038. In spite of that, a lot of folks saw it working. Try this Google Search.

But it worked for me up until this past weekend. The "upgrade" broke it. Damn.

Thursday, June 25, 2009

Architecture? We don't need no stinking architecture! (Update)

Context: We're reverse engineering some bad VB application code.

What I saw. "This problem report pushed this module over the 64k limit for modules. Therefore all code used for XYZ has been removed from this module and placed in the new bas module XYZ."

What I learned. Good design -- irrelevant. Abstraction -- irrelevant. Layered architecture -- irrelevant. Conceptual Integrity of the design -- irrelevant. What actually mattered was VB's 64k module limit.

Consequence. Reverse engineering will be hard because I don't know what code is scattered around the rest of the code base. The number of global variables is truly awe-inspiring.



Edit: Two additional gems: "Needed to split out another portion of this procedure due to "not enough memory" error on compile" and "This new sub added because of "Procedure too Large" compile error".

Some people should find jobs in a different industry. The module is 6000 lines of code, and apparently, it had no structure at all until it stopped compiling.


Wednesday, June 24, 2009

Semantic Markup -- RST vs. XML

I have very mixed feelings about XML's usability.

An avowed goal of the inventors of XML was "XML documents should be human-legible and reasonably clear." While I like to think that "legible" means usable, I'm feeling that legibility is really a minimal standard; I think it's a polite way of saying "viewable with any text editor."

I've got some content (my Building Skills books) that I've edited with a number of tools. As I've changed tools, I've come to really understand what semantic markup means.

Once Upon A Time

When I started -- back in '00 or '01 -- I was taking notes on Python using BBEdit and other text-editor tools. That doesn't really count.

The first drafts of the Python book were written using AppleWorks, the predecessor to Apple's iWork Pages product. Any Mac text editor is a joy to use. Except, of course, that AppleWorks's semantic markup wasn't the easiest thing to use. It was little more than visual styles with meaningful names.

Then I converted the whole thing to XML.

DocBook Semantic Markup

The DocBook XML-based markup seemed to be the best choice for what I was doing. It was reasonably technically focused, and provided a degree of structure and formality.

To convert from AppleWorks, I exported the entire thing as text and then used the LEO Outlining Editor to painstakingly -- manually -- rework it into XML.

At this point, the XML tags were a visible part of the document, and editing the document meant touching the tags. Not the easiest thing to do.

I switched to XMLmind's XXE. This was nice -- in a way. I didn't have to see the XML tags, but I was heavily constrained by the clunky way they handle the XML document structure. Double-clicking a word can lead to ambiguity on which level of tag you wanted to talk about.

The XML was "invisible" but the many-layered hierarchical structure was very much in my face.

RST Semantic Markup

After becoming a heavy user of Sphinx, I realized that I might be able to simplify my life by switching from XML to RST.

There are a number of gains when moving to RST.
  1. The document is simpler. It's approximately plain text, with a number of simple constraints.
  2. Editing is easier because the markup is both explicit and simple.
  3. The tooling is simpler. Sphinx pretty much does what I want with respect to publication.

There is just one big loss: semantic markup. DocBook documents are full of <acronym>TLA</acronym> to provide some meaningful classification behind the various words. It's relatively easy to replace these with RST's Interpreted Text Roles. The revised markup is :acronym:`TLA`.

The smaller, less relevant loss is the inability to nest inline markup. I used nested markup to provide detailed <function><parameter>a</parameter></function> kind of descriptions. I think :code:`function(x)` is just as meaningful when it comes to analyzing and manipulating the XML with automated tools.
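Here's a rough sketch of how the simple, un-nested replacements might be automated. The tag list and the function name are my own assumptions for illustration, not part of the actual conversion. Nested markup is deliberately left alone, since it needs a human decision.

```python
import re

# Hypothetical subset of DocBook inline tags to convert; extend as needed.
INLINE_TAGS = ("acronym", "classname", "filename", "varname")

# Matches <tag>text</tag> only when the content holds no nested markup.
_inline = re.compile(r"<(%s)>([^<>]+)</\1>" % "|".join(INLINE_TAGS))

def docbook_to_rst(text):
    """Replace simple DocBook inline elements with RST interpreted text roles."""
    return _inline.sub(lambda m: ":%s:`%s`" % (m.group(1), m.group(2)), text)

converted = docbook_to_rst(
    "Avoid every <acronym>TLA</acronym> in <filename>conf.py</filename>."
)
print(converted)
```

Each converted role name still has to be declared with a role directive (or a Sphinx extension), otherwise the RST processor will complain about an unknown role.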

The Complete Set of Roles

I haven't finished the XML -> Sphinx transformation. However, I do have a list of roles that I'm working with.

Here's the list of literal conversions. Some of these have obvious Sphinx/RST replacements. Some don't. I haven't defined CSS markup styles for all of these -- but I could. Instead, I used the existing roles for presentation.

.. role:: parameter(literal)
.. role:: replaceable(literal)
.. role:: function(literal)
.. role:: exceptionname(literal)
.. role:: classname(literal)
.. role:: methodname(literal)
.. role:: varname(literal)
.. role:: envar(literal)
.. role:: filename(literal)
.. role:: code(literal)

.. role:: prompt(literal)
.. role:: userinput(literal)
.. role:: computeroutput(literal)

.. role:: guimenu(strong)
.. role:: guisubmenu(strong)
.. role:: guimenuitem(strong)
.. role:: guibutton(strong)
.. role:: guilabel(strong)
.. role:: keycap(strong)

.. role:: application(strong)
.. role:: command(strong)
.. role:: productname(strong)

.. role:: firstterm(emphasis)
.. role:: foreignphrase(emphasis)
.. role:: attribution
.. role:: abbrev

The next big step is to handle roles that are more than a simple style difference. My benchmark is the :trademark: role.

Adding A Role

Here's what you do to add a semantic markup role to your document processing tool stack.

First, write a small module to define the role.

Second, update Sphinx's conf.py to name your module. It goes in the extensions list.

Here's my module to define the trademark role.

import docutils.nodes
from docutils.parsers.rst import roles

def trademark_role(role, rawtext, text, lineno, inliner,
                   options={}, content=[]):
    """Build text followed by inline substitution '|trade|'
    """
    roles.set_classes(options)
    word = docutils.nodes.Text(text, rawtext)
    symbol = docutils.nodes.substitution_reference('|trade|', 'trade', refname='trade')
    return [word, symbol], []

def setup(app):
    app.add_role("trademark", trademark_role)

Here's the tweak I made to my conf.py

import sys, os
project=os.path.join( "")
sys.path.append("/Users/slott/Documents/Writing/NonProg2.5/source")
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.ifconfig', 'docbook_roles' ]

That's it. Now I have semantic markup that produces additional text (in this case the TM symbol). I don't think there are too many more examples like this. I'm still weeks away from finishing the conversion (and validating all the code samples again).

But I think I've preserved the semantic content of my document in a simpler, easier to use set of tools.

Saturday, June 20, 2009

Failure To Grasp Polymorphism

I've cataloged a third specific case of fundamental failures to understand polymorphism. The first two I've seen a fair number of times. The third seems to be more rare.

1. "How do I determine which subclass an object has?" The Identification problem.

2. "How do I morph an object to a different subclass?" The Transmutation problem.

3. "I can do that with delegation, I don't need subclasses." The Denial problem.

Identification

The Identification problem is the most common. There are two variants: people who ask about class comparisons, and people who use some other value as a surrogate for a class comparison. Either way, they have if statements scattered around the code.

Bad.

if someObject.__class__ == ThisClass:
    someObject.this_foo_method()
elif someObject.__class__ == ThatClass:
    someObject.that_foo_method()

Worse.

if someOtherIndicator == "this":
    someObject.this_foo_method()
elif someOtherIndicator == "that":
    someObject.that_foo_method()

Better. Use inheritance. Override one method, don't provide two.

someObject.foo_method()
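To make the "better" version concrete, here's a minimal sketch. The Base class, the return strings and the process function are made up for illustration; only the ThisClass, ThatClass and foo_method names come from the snippets above.

```python
class Base:
    def foo_method(self):
        return "default foo"

class ThisClass(Base):
    # Override the one method instead of adding this_foo_method().
    def foo_method(self):
        return "this foo"

class ThatClass(Base):
    def foo_method(self):
        return "that foo"

def process(some_object):
    # No class comparison: the object's own class picks the implementation.
    return some_object.foo_method()

results = [process(obj) for obj in (ThisClass(), ThatClass(), Base())]
print(results)
```

The caller never asks which subclass it has, so adding a fourth subclass doesn't touch process at all.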

Transmutation

This is more subtle because there's no easy "wrong" implementation. Instead of bad code, you have goofy questions.

There are two variants: people ask about having the superclass morph into a subclass, or people want to make a class change so that the object's behavior changes. Both are attempts to "dynamically" transmute an object from one class into another.

In the morph case, they've overlooked the essential truth of inheritance. Every subclass object is an instance of the superclass, too. If you think you want to transmute from superclass down to subclass, that's silly because the subclass object already is an instance of the superclass. By definition. If you think you want to morph, you really want some kind of Factory that spits out proper subclass instances.
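A Factory along these lines might look like the following sketch. The Shape classes, the SHAPES table and make_shape are hypothetical names chosen for the example.

```python
import math

class Shape:
    def __init__(self, size):
        self.size = size

    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def area(self):
        return math.pi * self.size ** 2

class Square(Shape):
    def area(self):
        return self.size ** 2

# Maps a name to the subclass that should be built for it.
SHAPES = {"circle": Circle, "square": Square}

def make_shape(kind, size):
    """Factory: choose the proper subclass once, at creation time."""
    return SHAPES[kind](size)

shape = make_shape("square", 3)
print(type(shape).__name__, shape.area())
```

The object is born as the right subclass, so there's never a generic Shape instance that needs to be morphed later.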

In the state-change case, they've overlooked the power of delegation and the Strategy pattern. If you think you want to use a class change, you really want to plug in a different strategy object.
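Here's a small sketch of plugging in a different strategy object. Character, Walking and Running are invented for the example.

```python
class Walking:
    def move(self, distance):
        return "walked %d" % distance

class Running:
    def move(self, distance):
        return "ran %d" % distance

class Character:
    def __init__(self, movement):
        self.movement = movement  # the pluggable strategy object

    def move(self, distance):
        # Delegate the behavior that changes to the current strategy.
        return self.movement.move(distance)

hero = Character(Walking())
first = hero.move(5)
hero.movement = Running()  # behavior changes; the class of hero does not
second = hero.move(5)
print(first, second)
```

The state change is an ordinary attribute assignment, which is a lot easier to reason about than an object that switches classes.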

Denial


The example is great. It proves that you don't need inheritance. Sadly, the proof only works if you're overriding every method. If you don't want to override every method, then inheritance suddenly becomes useful.

The denial problem (all delegation, no inheritance) is a kind of opposite to the transmutation problem (all inheritance, no delegation).
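A quick sketch shows the cost of pure delegation when only one method differs. The Logger class and both "loud" variants are hypothetical.

```python
class Logger:
    def info(self, msg):
        return "INFO " + msg

    def warn(self, msg):
        return "WARN " + msg

    def error(self, msg):
        return "ERROR " + msg

class LoudByDelegation:
    """Wraps a Logger; every method has to be forwarded by hand."""
    def __init__(self, logger):
        self._logger = logger

    def info(self, msg):
        return self._logger.info(msg)

    def warn(self, msg):
        return self._logger.warn(msg)

    def error(self, msg):
        return self._logger.error(msg).upper()

class LoudByInheritance(Logger):
    """Overrides only the one method that differs."""
    def error(self, msg):
        return super().error(msg).upper()

print(LoudByDelegation(Logger()).error("disk full"))
print(LoudByInheritance().error("disk full"))
```

Both versions behave the same. The delegation version just carries two forwarding methods that add nothing, and it grows with every method Logger gains.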

Friday, June 19, 2009

The First Number Sticks Forever

Two months ago, we looked at some Data Warehouse design information.

It looked like 8 months of work. It might be finished by year-end. Fatal mistake: we gave a "number". Year-end.

We did due diligence, investigating source applications, data marts, subject areas, etc. And, the client delayed their decision-making process.

After the investigation, we created a detailed estimating model. We didn't create a waterfall schedule. Instead, we defined a typical release and sprint structure and a backlog.

The Unacceptable Revision

We wound up with 9 months of work, beginning next month.

Our sales person was appalled -- shocked! -- that we could no longer make year-end.

"Duh," we said. "It's a month longer, starting three months later. What do you want?"

"We can't tell the customer that," the sales person said.

Sigh.

Saturday, June 13, 2009

How to Derail Use Case Analysis: Focus on the Processes

It's easy to prevent successful use case analysis: make it into an exercise of defining lots of "processes" in excruciating detail.

First, ignore all "object" definition.  All business domain entities -- and actors -- must be treated as second class artifacts.

Second, define everything as a process.  A domain entity is just some stuff that must be mapped between processes.  Act like the entity doesn't really have independent existence.

Symptoms

You may be trying to do use case analysis, but if you have these symptoms, it might be time to step away from the process flows and ask what you're really doing.

There Are No Actors.  Well, actually, there's one actor: "user".  When all of your use cases have one actor, you've forgotten the users and their goals.  Stop writing the processes and take a step back.  Who are the users?  What are they trying to accomplish?  Where is their data?  When is it available?  What interactions with a system would make them happier and more productive?

Every Action Defines A New Class of Actors.  You have actors like content creator, content updater, content quality assurance, content refinement, content link checking, do this and do that.  Too many actors is easy to spot because the attributes and behaviors of all those actors are essentially identical.  In this example, they all edit content.

Each Use Case is a Wizard.  If each use case is a strictly sequential input of a data element followed by "click next to continue", you've taken away the actor's obligation to make decisions and take action on those decisions.  If you're lucky, you've got a use case for each individual goal the actor has.  More typically, you've overlooked a fair number of the actor's goals in your zeal to automate every step of one goal.

You Need an "Overall Flow" or Sequence for the Use Cases.  If your use cases have to be exercised in one -- and only one -- order, you've taken away the actor's goals.

Collaboration

Use Case analysis describes the collaboration between actors and a system to create something of value.  If the system is described by wizards or modal dialogs that completely constrain the conversation to one where the system asks the actor for information, something's terribly wrong.

The point is to describe the system as a series of "interfaces", each of which has a use case.  The actors interact with the system through those interfaces.    The actor is free to gather information from the system, make decisions, and take action via the system.

War Story

The users had a legacy "application" that was a pile of SAS code that did some processing on the source data before reporting.

The use cases were -- essentially -- "1.  Actor runs this program  2. System does all this stuff."  The "all this stuff" was usually a lengthy, complex reverse engineering exercise trying to discern what the SAS code did.  

No mention of the business value.  No reason why.  And no room to implement a better process.

War Story

Analyst is pretty sure the user wants collaborative editing.  The analyst has a pretty good "epic" (not a proper user story, but a summary of a number of user stories) that describes creating, modifying and extracting from a collaboratively edited document.

The initial discussion led to every single verb somehow defining a separate actor.  In the original epic, there were exactly two actors, one who added or elided certain details for the benefit of another.

Later discussions led to a single "User" actor and the craziest patchwork of use cases.  Random "might be nice to have"s crept into the analysis, and the original "epic" was dropped.  No trace of it remained, making it very difficult to determine priorities.

War Story

Users had developed a complex work-around because they didn't have all the equipment they needed in their local office.  It involved mailing CDs from one office to another to prevent network bandwidth problems.  The business analysts wanted to capture this process, even though parts of it created no value.

It took a fair amount of work to get the analysts to stop documenting implementation details (mailing addresses, Fedex account numbers) and start documenting interactions and the business value that was created.  

Many process steps are physical moves and don't involve making information available for decision-making.  Those no-decision physical move steps should not be described in a use case.  Perhaps in an appendix, but they're incidental because they're just the current implementation.  A use case should have the essence of the business value and how the actor uses the system to create that value.

Wednesday, June 10, 2009

Agile Methods, Inversion of Control, Emergent Behavior

I've run into some Agility questions recently.  Questions that indicate that some people just don't like the Inversion of Control aspect of Agile methods.

We used to call IoC "Emergent Behavior".  The system isn't designed from top-down to fill specific use cases.  Instead, the system is designed so that the interaction of various objects will fill the use cases.  Overall control does not reside in one place.

An Agile project is the same phenomenon.  We're not going to plan the entire effort.  Instead, we're going to do some things that -- in the long run -- will lead to more useful software.

Agile Question 1

"Why focus on a few use cases up front?  If we do that, then new requirements will arrive as we develop, leading to endless rework.  Why can't we enumerate all use cases now?"

Right and Wrong.  Right: we will do endless rework.  Wrong: we will deliver something that works before starting the rework cycle.  

For some reason, focus on a use case is really hard.  Some people feel that they can't build "just enough" software, but must completely understand every nuance before they can do anything.

I think this is a paralyzing fear of failure, coupled with bad experiences from management that equated all rework with failure.

The Agile approach of "build something now" is trumped by their personal failure/rework issues, leading to bizarre designs that include lots of things that aren't in the use case under construction.  It leads to lots of "why are you doing this?" conversations with lots of "it might be needed in the future."

It isn't needed now.  Let it go.  Merely having thought of it, and leaving a stub in the design, is enough for now.  When faced with "attribute vs. property vs. method" questions, those future considerations can help steer you to one or the other.  But don't give in to designing and building the future.  Just leave space for it.
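As one way to picture "leaving space" in Python, here's a sketch. Account, AccountLater and report are hypothetical names. The point is only that a plain attribute today can become a property later without changing the code that reads it.

```python
class Account:
    """Today's use case only needs a stored balance."""
    def __init__(self, balance):
        self.balance = balance

class AccountLater:
    """A possible future version: same attribute name, now computed."""
    def __init__(self, deposits):
        self.deposits = list(deposits)

    @property
    def balance(self):
        return sum(self.deposits)

def report(account):
    # Callers read .balance either way, so nothing here has to change.
    return "balance: %d" % account.balance

print(report(Account(30)))
print(report(AccountLater([10, 20])))
```

Nothing about the future version has to be built now. Knowing that the door is open is enough to pick the simple attribute today.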

An Agile approach is about an emergent behavior.  It's built from the edges in.  There's an inversion of control here.

Agile Question 2

"Can't you just add a button that says X?  You're supposed to be Agile, why can't you just add this button to the page?"  

First, we're not done with what you asked for two weeks ago.  Until that's done and approved, we're not on speaking terms.

But, more importantly, "adding a button" isn't part of any existing use case.  You're not changing priorities with this request, you're making stuff up.  

Making stuff up isn't bad, per se.  Making up a random piece of behavior, with no actor, no goal, and no business value is bad.  Who will click that button?  What will the business purpose be?  What result will help that person make a decision and take action?

"It's just to show a customer."  Good start.  What's the customer's role?  What do they do?  Are we showing the customer's sales folks how they use this application?  Are we showing the customer's finance folks how they use this application?  Are we showing the operational folks?  Are we showing the underwriting folks?  In short, "who's the actor?"

An Agile approach is about building software someone can use.  Without a use case, we're just building software haphazardly.  A use case isn't an elaborate document, it's just an actor with a goal who interacts with the system to create something of value.  Four simple clauses.

From the use case, we can work out an implementation.  There is no "inversion of control" when moving from requirements to design.   The requirements do not emerge from the design.

Monday, June 8, 2009

A "Don't Break the Build" Tip for Solo Python Developers

One of the Agile practices is Continuous Integration.  Fowler suggests that everyone commits every day.  Elssamadisy's book includes specific advice on why a daily check-in helps.

Some folks call this the "Don't Break the Build" practice.

But what does that mean for Python where there is no build?  And what does it mean for a solo developer where there aren't any consequences?

The No-Build Build

The C++, Java, C# folks all have a really important, multi-step daily build.  The code has to compile; it has to be packaged into JAR's (or DLL's or whatever).  Perhaps higher-level packages like WAR's or EAR's need to be built.  Then you can run unit tests.

We Python folks don't have anything between code and unit test -- there's no real packaging.  This makes the daily build practice seem a little silly.

However, the daily "commit and run all the tests" is perhaps more important in Python than it is in Java (or C++ or C#).  Even without any actual build activity, the daily build is still an essential practice.

Things Go Wrong

In Python, you've got two fundamental things which a daily check-in will spot.
  1. Bugs.  All of the logic errors that a daily unit test will spot.
  2. Bad Refactoring.  This is more subtle.  Not all refactoring errors lead directly to a bug that you can detect.  Indeed, there is a significant refactoring problem that I fight with weekly.

No Sense of Commitment

Refactoring is central to Agile development.  It is inevitable that you realize that you've misnamed, misplaced or overused some module or package and need to either rename it or delete it.

In Python, you've got to use `grep` (or something similar) to check your application for a clean change in names.  And you've got to double-check by using SVN to delete or rename the module.

Adding a new module, however, is more subtle.  Adding a new module is easy and quick.  You write it, you use it, you unit test and you're good to go.

Except, of course, if you forget to check it into SVN.  If it's not in SVN, it will still pass all your local unit tests.  It's those "daily build" unit tests that will break on a missing module.
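One cheap guard is to scan the output of svn status for Python files that aren't under version control yet. This sketch assumes you've already captured that output as text (running the command is left out). The function name and the sample text are made up. It relies only on svn status marking unversioned items with a leading "?".

```python
def unversioned_modules(status_text):
    """Return unversioned .py paths found in `svn status` output text."""
    missing = []
    for line in status_text.splitlines():
        # `svn status` flags items not under version control with '?'.
        if line.startswith("?"):
            path = line[1:].strip()
            if path.endswith(".py"):
                missing.append(path)
    return missing

# Made-up sample of what the status output might look like.
sample = "M       app/main.py\n?       app/new_module.py\n?       notes.txt\n"
print(unversioned_modules(sample))
```

An empty result doesn't prove the commit is complete, but a non-empty one is a strong hint that a new module never got added.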

VM To The Rescue

Solo developers, of course, have trouble with the nightly build.   First, they can skip it.  Second, and more important for folks saddled with Windows, you don't often have a clean QA user separate from you, the developer.

A VM is a very, very nice thing to have.  You fire up VMWare (or similar player) and run your daily build in a separate machine.  For a solo developer, you can do the following:
  1. Make changes, unit test.
  2. Commit the changes.
  3. Fire up the VM.  Do an SVN UP.  Run the unit tests again.
When a Python app crashes and burns on the VM, 80% of the time, it's a missing commit.  The rest of the time it's a failed configuration change for any differences between development and QA.
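Before running the full test suite on the VM, a quick smoke check can report which modules aren't importable from the fresh checkout. This is only a sketch. missing_modules is a hypothetical helper, and the list of names to check would come from your own application.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found on this machine."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package of a dotted name is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing

# 'json' ships with Python; the second name is deliberately bogus.
print(missing_modules(["json", "no_such_module_for_this_sketch"]))
```

A missing name here usually points straight at the forgotten commit, which saves reading a long unit test traceback.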

Now you can -- confidently -- turn code over to a sysadmin, knowing that it actually will work.

Thursday, June 4, 2009

Devastating Design Changes -- An Agile Methods Story

We have a design, we have code and we have tests that all pass.

Tuesday, we got some new input data that just wouldn't work.

What -- if anything -- went wrong?

Agile is as Agile Does

We're following an Agile approach for several reasons.
  1. I'm too lazy to draw up an elaborate project plan full of lies ("assumptions").
  2. Our requirements were two versions of a PowerPoint slide that showed one use case at the tail-end of a long information life-cycle.
  3. Outside the one slide, we had no concrete actors or use cases.  We had some clue what we were doing, but it involved inventing new business models for customers -- a challenging thing to "automate".

The Agile approach is that we pick a use case, build some stuff, and put it into production.

One consequence of this is rapid response to requirements changes.  Another consequence is fundamental changes to the design.  A small change to a use case could lead to devastating design changes.

Learning is Fundamental

Since we didn't have all the requirements (indeed, we barely had any), we knew we'd be learning as we went.  Tuesday's data drop was one example.

We have a nice library to handle many of the vagaries of the Spreadsheet-As-User-Interface (SAUI™) problem.  We use xlrd and csv modules to handle basic spreadsheet file formats.  (We have the ElementTree parser standing by to handle XML, if necessary.)  We use the rest of the Python archiving packages to handle ZIP files of spreadsheets.
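For the ZIP-of-spreadsheets case, a standard-library-only sketch might look like this. It covers CSV members only (xlrd is left out), and rows_from_zip plus the sample archive are invented for illustration. This is not our actual library.

```python
import csv
import io
import zipfile

def rows_from_zip(archive):
    """Yield (member_name, row) for each CSV row inside a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that isn't a CSV member
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.reader(text):
                    yield name, row

# Build a small in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sheet1.csv", "name,qty\nwidget,3\n")
    zf.writestr("readme.txt", "not a spreadsheet")
buffer.seek(0)

rows = list(rows_from_zip(buffer))
print(rows)
```

The UTF-8 encoding is an assumption. Real spreadsheet exports often need a different encoding per source.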

We've broken spreadsheet processing down into layers.
  • Data Source.  All of our various sources offer methods to step through the sheets and rows.  This minimizes the various file format differences.  Note that CSV provides cells that are text, where xlrd provides cells in a variety of data types.  We have a Cell class hierarchy to implement all the conversions required.
  • Operation.  Each operation (validate, load, delete, etc.) is a subclass of a common Operation.  This operation is given a sheet and processes the rows of that sheet.  It doesn't know anything about the Data Source.
  • Builders.  Each row, generally, builds some model object which is either validated or validated and persisted in the database.  The builder handles the mapping from spreadsheet column to DB column, along with data type conversions.
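The three layers above can be sketched roughly as follows. Every class and column name here is hypothetical. The real code has more sources, a Cell class hierarchy, and database persistence.

```python
import csv
import io

class CSVSource:
    """Data Source: steps through rows, hiding the file format."""
    def __init__(self, text):
        self.text = text

    def rows(self):
        return csv.DictReader(io.StringIO(self.text))

class Operation:
    """Operation: processes rows; knows nothing about the source format."""
    def __init__(self, builder):
        self.builder = builder

    def process(self, source):
        return [self.handle(self.builder(row)) for row in source.rows()]

    def handle(self, obj):
        raise NotImplementedError

class Validate(Operation):
    def handle(self, obj):
        # A stand-in validation rule for the sketch.
        return obj["qty"] >= 0

def build_item(row):
    """Builder: maps spreadsheet columns to model fields, converting types."""
    return {"name": row["Name"].strip(), "qty": int(row["Qty"])}

source = CSVSource("Name,Qty\nwidget,3\ngadget,-1\n")
results = Validate(build_item).process(source)
print(results)
```

Swapping in a different source or a different operation touches only one layer, which is the whole reason for the split.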
Sadly, we left something out.

The Devastating Change

We had no use cases, so we were making things up as we went along.  We'd made an implicit assumption in our sheet operations.  All the data we'd been loading was polluted with rows we had to ignore.  So we tossed a quick-and-dirty little if-statement down inside one of the sheet operations.

The new data had slightly different rules for rows we were supposed to ignore.  The quick-and-dirty little if-statement broke the loads.

We have to refactor our sheet operations to hoist out this if-statement.  We have to use the Strategy pattern to replace the statement with a formal appeal to a Filter object that implements the decision.
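Here's a minimal sketch of that refactoring. The filter classes, their rules and the load function are invented for illustration. The actual ignore rules in our data are different.

```python
class RowFilter:
    """Strategy interface: decide whether a row should be processed."""
    def accept(self, row):
        return True

class SkipBlankRows(RowFilter):
    def accept(self, row):
        return any(cell.strip() for cell in row)

class SkipCommentRows(RowFilter):
    def accept(self, row):
        return not (row and row[0].startswith("#"))

def load(rows, row_filter):
    # The decision lives in the filter object, not in an inline if-test.
    return [row for row in rows if row_filter.accept(row)]

rows = [["a", "1"], ["", ""], ["# note", ""]]
print(load(rows, SkipBlankRows()))
print(load(rows, SkipCommentRows()))
```

New data with new ignore rules then means writing one more small filter class, not editing the sheet operation.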

What If Analysis

The Cost Of Learning (COL™) was two days.  Half of one day to find the problem.  Half of another to reason out the root cause and determine a solution.  Finally, a full day to code and test the revisions.

Yes, it took two full days of effort (spread over three calendar days) to figure out what was wrong and fix it.

What if we had tried a waterfall design?  Would we have found, designed and resolved this problem in two days?  No earthly way.  It would have taken two days of brainstorming to think of the use case.  It would have taken a week of hand-wringing to work out a be-all-and-do-all processing pipeline for spreadsheet data -- one that included dynamic filtering.

Instead, we built a processing pipeline that worked.  Now we're expanding that processing pipeline to add a feature.

Tuesday, June 2, 2009

Think Once -- Code Twice

Some thoughts for the day
  • "Quick And Dirty == Guaranteed Rework"
  • "He Who Codes First Loses"
  • "Think Once -- Code Twice"
  • "Admin's Law: It's Always Permissions"
  • "Programmer's Law: If it's not permissions, it's the path"
  • "If it seems hard, you're doing it wrong"
  • "One-Off == The First of Many"
  • "Requirements Translation: Never == Rarely, Always == Mostly"
  • "Things Change: Generalize and Parameterize"