Thursday, May 26, 2011

Code Kata : "Simple" Database Design

Here's a pretty simple set of use cases for a code-kata database application.

This is largely transactional, not analytical.

It's a simple inventory of ingredients, recipes and locations.

Context
  • 42' sailboat.
  • Lots of places to keep stuff. Lots.
Stuff gets lost or misplaced. It's helpful to marry recipes with ingredients to use up the last of something before it goes bad and stinks up the boat.

The actor is essentially the cook.

Use Cases
  • Perishables to be eaten soon?
  • Shopping list for specific recipes.
  • Where did I put that?
Model



  • Ingredient. A generic description: "lime", "coconut". Not too much more is needed. A "food safety" notation (refrigeration required, etc.) is a helpful attribute. Maybe a "food group" or other nutrition information.
  • Location. A text description of where things can be stored. This shouldn't have too many attributes, because boats aren't big grids. Phrases like "port saloon upper cabinet", or "galley outer cooler" make sense to folks who live on the boat.
  • On Hand. This is simply ingredient, location and a measurement of some kind. Example: 3 limes in the starboard galley center cooler. There's a lot of magic around units and unit conversion that can be fun. But that strays outside the database domain.
  • Recipe. Example: "One of sour, two of sweet, three of strong, and four of weak.", lime, simple syrup, rum, water. Plain text using a lightweight markup is what's required here, along with a many-to-many relationship with ingredients. This is not carefully defined above because it should be done as a "more advanced" exercise.
I think this has the right amount of complexity and isn't very abstract. Since the use cases are pretty obvious to anyone who's cooked or been to a grocery store, use case details aren't essential.
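To make the model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the free-text unit column, and the "rum punch" recipe name are assumptions made for illustration; the kata itself deliberately leaves Recipe loosely defined. Each query function corresponds to one of the use cases listed above.

```python
import sqlite3

# Illustrative schema; names and the free-text unit column are assumptions.
SCHEMA = """
CREATE TABLE ingredient (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,        -- lime, coconut
    food_safety TEXT                  -- e.g. refrigerate, NULL if none
);
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE  -- galley outer cooler
);
CREATE TABLE on_hand (
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    location_id INTEGER NOT NULL REFERENCES location(id),
    quantity REAL NOT NULL,
    unit TEXT NOT NULL                -- free text, no unit conversion
);
CREATE TABLE recipe (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    body TEXT NOT NULL                -- lightweight markup
);
CREATE TABLE recipe_ingredient (      -- the many-to-many relationship
    recipe_id INTEGER NOT NULL REFERENCES recipe(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    PRIMARY KEY (recipe_id, ingredient_id)
);
"""

# Shared join from on_hand rows to readable ingredient and location names.
ON_HAND_JOIN = """FROM on_hand o
    JOIN ingredient i ON i.id = o.ingredient_id
    JOIN location l ON l.id = o.location_id"""

def where_is(conn, ingredient):
    """Use case: Where did I put that?"""
    return conn.execute(
        f"SELECT l.description, o.quantity, o.unit {ON_HAND_JOIN} WHERE i.name = ?",
        (ingredient,)).fetchall()

def perishables(conn):
    """Use case: perishables to be eaten soon (anything with a food-safety note)."""
    return conn.execute(
        f"SELECT i.name, l.description {ON_HAND_JOIN} "
        "WHERE i.food_safety IS NOT NULL ORDER BY i.name").fetchall()

def shopping_list(conn, recipe):
    """Use case: ingredients a recipe needs that are not on hand anywhere."""
    rows = conn.execute(
        """SELECT i.name FROM recipe r
           JOIN recipe_ingredient ri ON ri.recipe_id = r.id
           JOIN ingredient i ON i.id = ri.ingredient_id
           WHERE r.name = ? AND i.id NOT IN (SELECT ingredient_id FROM on_hand)
           ORDER BY i.name""", (recipe,)).fetchall()
    return [name for (name,) in rows]

def demo_db():
    """An in-memory database holding the examples from the post."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ingredient (name, food_safety) VALUES (?, ?)",
        [("lime", "refrigerate"), ("simple syrup", None), ("rum", None), ("water", None)])
    conn.execute("INSERT INTO location (description) VALUES (?)",
                 ("starboard galley center cooler",))
    # 3 limes in the (only) location.
    conn.execute("INSERT INTO on_hand SELECT i.id, l.id, 3, 'each' "
                 "FROM ingredient i, location l WHERE i.name = 'lime'")
    conn.execute("INSERT INTO recipe (name, body) VALUES (?, ?)",
                 ("rum punch", "One of sour, two of sweet, three of strong, and four of weak."))
    # Link the one recipe to all four ingredients.
    conn.execute("INSERT INTO recipe_ingredient SELECT r.id, i.id FROM recipe r, ingredient i")
    return conn
```

Treating "has a food-safety note" as "perishable" is a simplification; a real answer to "eat soon?" would probably need a purchase or expiry date, which this model doesn't include.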

Wednesday, May 25, 2011

Meetup Tonight

Tonight (May 25th). Red Dog. Colley Ave. Ghent. I'll be wearing my Stack Overflow shirt. I'll be there about 7. I know that at least one other person won't be there until 8.

The Meetup link.

I like this meetup idea a lot. Probably because the WFH life-style is a little isolating.

There's the small "Hampton Stack Overflow Community". We have a common interest in Stack Overflow.

Also, there's the 757 Python Users Group. We have a common interest in Python. I've decided to become the "official" organizer for this. I'm going to join the 757 Labs Hackerspace, also.

Tuesday, May 24, 2011

Agility and following a "Strictly Agile" approach

I've seen some discussion on Stack Overflow that is best characterized by the question: "What is Strictly Agile?", or "What's the Official Agile Approach?".

Someone shared this with me recently: "Process kills developer passion".

I have also heard some great complaints about organizations that claim "Agile" and actually do nothing of the kind. In some cases it's not a "crunchy agile shell" around a waterfall process; it's a simple lie. Nothing about the process is Agile except a manager insisting that all the status reporting, planning and unprioritized lists of random requirements are Agile.

Finally, I got this weird suggestion: "consider writing a blog about how to test if you are agile or not". It's weird because testing for Agile is like testing for breathing; it's like testing for flammability.

The Agile Test

Testing if your project is Agile can be done two ways.
  1. Practical. Make a change to the project. Any change. Requirements, architecture, due dates, staff, anything. Does it derail? If so, it wasn't very Agile, was it?
  2. Theoretical. Reread the Agile Manifesto. Make a score card that evaluates the project on each of the eight basic criteria in the Agile manifesto. Convene all the project stakeholders. Conduct careful surveys and have structured walkthroughs to determine the degree of Agility surrounding each person, deliverable, collaborative relationship and issue.
An important point is that Agile is not absolute. Some practices are more Agile than others. There's no "strictly" Agile. There are ways to make a project more Agile; that is, it can effectively cope with change. There are ways to make a project less Agile; that is, change causes problems and can derail the project completely.

The canonical example is a missing, misstated or contradictory requirement that gets uncovered after coding and during user acceptance test. Clearly, that feature has been built and is absolutely wrong. What happens next?

Agile? The product can be released with the broken feature relegated to the next release. A hack is put in to remove the buttons or menu items or links until they work.

Not Agile? Everyone works around the clock to make that feature work no matter what. Paraphrasing Admiral Farragut: "Technical debt be damned. Development must proceed full speed ahead." All of this irrespective of the relative value of what's being developed. Schedule comes first; features second.

How Much Process?

The "Process Kills..." blog entry repeats the observation that a lot of carefully-defined process isn't really all that helpful. It identifies a cause ("process kills passion") that can be true, but it's largely irrelevant. Process is, essentially, work that's not focused on delivering anything of real value. Complex processes are "meta" work: work focused on IT internals, work that creates no value for the users of the software, work that replaces the more valuable elements of the Agile Manifesto.

One can argue that processes, documentation, contracts and plans "assure" success or demonstrate some level of quality. To an extent all the process and meta-work creates trust that—eventually—the resulting software product will solve the original problem.

The mistake is that non-Agile methods use a series of surrogates (processes, documentation, contracts and plans) instead of actual software. The point of Agile methods is to release software early and often and avoid using surrogates.

Key Points of Agile

Here are the key points of the Agile Manifesto.
  • Individuals and interactions over processes and tools. A more Agile project will use the best people and encourage them to talk amongst themselves. A less Agile project will write a lot of things (which folks have neither the time nor the reward for reading). There will be misunderstandings, leading to large, boring meetings where someone reads PowerPoint slides to other folks to try and clear up misunderstandings.
  • Working software over comprehensive documentation. A more Agile project uses frequent release cycles of incremental software. A less Agile project attempts to gather all requirements, do all design and then try to do all the coding even though the requirements have already been found to be less than crystal clear.
  • Customer collaboration over contract negotiation. A more Agile project uses constant contact with customer and product owner to refine and prioritize the requirements. A less Agile project uses a complex change control process to notify everyone of a requirements change, which leads to design and code changes, and has cost and schedule impact that must be carefully planned and documented.
  • Responding to change over following a plan. A more Agile project uses incremental releases, conversation and a modicum of discipline to build things of value. Just because someone thought it should be included in the requirements doesn't mean the feature is really required.
The "Process Kills Passion?" Question

The "Process Kills Passion" blog lists a bunch of things that, it appears, some folks find burdensome:
  • Doing full TDD, writing your tests before you wrote any implementing code.
  • Requiring some arbitrary percentage of code coverage before check-in.
  • Having full code reviews on all check-ins.
  • Using tools like Coverity to generate code complexity numbers and requiring developers to refactor code that has too high a complexity rating.
  • Generating headlines, stories and tasks.
  • Grooming stories before each sprint.
  • Sitting through planning sessions.
  • Tracking your time to generate burn-down charts for management.
This list has three different collections of practices.
  • Good. TDD, code reviews, generating headlines, stories and tasks, grooming stories before each sprint and doing some planning for each sprint are all simply good ideas. They must be done. "Pure Coding" is not a good way to invest time. Planning and then coding is much smarter, no matter how boring planning appears.
  • Difficult. Test code coverage can be helpful, but can also devolve into empty numerosity. 20% more coverage doesn't mean 20% fewer bugs. Nor does it mean 20% less chance of uncovering a bug at run time. Code complexity ratings are also fussy because they don't have a direct correlation with much. They must be done and used to prioritize work that will reduce technical debt. But mindless thresholds are for cowards who don't want to mediate deep technical discussions.
  • Silly. Creating burn-down charts for management shouldn't be necessary. Everyone must read and understand the backlog. Everyone should build the summary charts they want from the backlog. The product owner or even the eventual customer should do this on their own. They must be given a profound level of ownership of the features and the process for creating software.
I don't agree that process kills passion. I think there's a fine line between playing with software development and building software of value. I think that valuable software requires some discipline and requires executing a few burdensome tasks (like TDD) that create real value. Assuring 80% or 100% code coverage doesn't always create real value. Spending time keeping the backlog precise and complete is good; spending time making pictures is less good.

Thursday, May 19, 2011

Creating UML

I'm a big fan of plain-text tools. Source Code. ReStructuredText. LaTeX.

I'm not a big fan of proprietary file formats and document formats that are difficult or impossible to decode. JSON and XML rock. .XLS files are painful and difficult to work with.

UML Diagrams are a particularly odious problem. To see a diagram it has to be PNG or PDF or some other graphic format that's optimized for storage and display, but not really optimized for editing. SVG has a text vector markup language, but it's painful because it's so generalized.

Recently, I found two text to UML tools that are exciting prospects.

First, there's YUML.me. This draws pretty nice, if simple, diagrams that you can create with relatively little pain. It's slow and limited. But it works for simple diagrams.


The best part is that the image is rendered from plain text carried in the URL.

http://yuml.me/diagram/scruffy/usecase/[Author]-(write text), (render image)-[YUML], [Author]-(share link).

YUML supports simple use case diagrams, simple class diagrams and really simple activity diagrams. It covers a few bases with a pleasant level of flexibility.
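Because the diagram source lives in the URL path, generating a yUML link is just string formatting. A minimal sketch, assuming the base URL and path layout from the example link above still apply; the function name yuml_url is hypothetical, and percent-encoding the spaces is an assumption about what the server accepts.

```python
from urllib.parse import quote

# Base URL and path layout copied from the example link in this post.
YUML_BASE = "http://yuml.me/diagram/scruffy"

def yuml_url(diagram_type, text):
    """Return a yUML image URL whose path is the diagram's plain-text source."""
    # Assumption: percent-encoded spaces are acceptable to the server.
    # Brackets, parentheses and commas stay unescaped so the URL remains readable.
    return f"{YUML_BASE}/{diagram_type}/{quote(text, safe='[](),')}"

print(yuml_url("usecase",
               "[Author]-(write text), (render image)-[YUML], [Author]-(share link)"))
```

The diagram text is the only thing that changes between images, so it can sit under configuration control like any other source file.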

The other tool is Plant UML. "PlantUML is used to draw UML diagram, using a simple and human readable text description."

The online Plant UML Server allows a flexible no-software-on-the-desktop way to play with their markup language. The text of the image is not in the URL here, since the text is so much more complex.


The best part of this is that the pictures come from plain text.
  • The plain text is trivial to put under configuration control.
  • Plain text system descriptions are easy to write with simple markup.
  • Plain text documentation of existing software can be derived from simple source analysis.
  • Plain text design documents can generate some elements of the source code.
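As a sketch of the "derived from simple source analysis" point, Python's standard ast module can read source text and emit PlantUML class-diagram text. Only a small subset of PlantUML syntax is produced (class blocks, method names, and <|-- for inheritance), and the function name and example classes are hypothetical.

```python
import ast

def plantuml_classes(source):
    """Return PlantUML class-diagram text for the top-level classes in source."""
    tree = ast.parse(source)
    lines = ["@startuml"]
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        lines.append(f"class {node.name} {{")
        for item in node.body:
            # List each method defined directly in the class body.
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"  {item.name}()")
        lines.append("}")
        for base in node.bases:
            # Only simple base-class names are handled; dotted names are skipped.
            if isinstance(base, ast.Name):
                lines.append(f"{base.id} <|-- {node.name}")
    lines.append("@enduml")
    return "\n".join(lines)

EXAMPLE = '''
class Location:
    def describe(self):
        return "galley outer cooler"

class Cooler(Location):
    def temperature(self):
        return 4
'''

print(plantuml_classes(EXAMPLE))
```

The output is itself plain text, so it can be pasted into the online PlantUML Server or committed alongside the code it describes.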

Wednesday, May 18, 2011

The 757 Python User's Group

http://www.meetup.com/757-Python-Users-Group/

I'm looking forward to meeting other Python developers in Hampton Roads.

Tonight. 7:00 PM. See you there.

Tuesday, May 17, 2011

Decisions and Consequences

A single poorly-made decision can have profound ripple-effects. Once you're stuck with it, you make accommodations, hacks and work-arounds. Eventually, things work, but the result is less than ideal.

Changing tack sometimes requires pervasive rework to the application. How can we reduce the risks and improve the value created?

A Recent Example

When dealing with bulk econometric data (Bloomberg, D&B, Moody's, etc.) you get BIG files with lots of fields. Depending on what you're paying for, the file layouts are frequently different even though the content is similar. I'm a big fan of plain-old CSV data. Even the tab-delimited variant of CSV is not bad to work with.

Further, most vendors will slap some heading rows on the file so that the column names are--more or less--identified. Surprisingly, this doesn't work out well in practice because there are often multiple columns with the same name. Sigh.

Using Python's csv library module lets us cope with CSV (and tab-delim) quite gracefully.
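A minimal sketch of reading a tab-delimited vendor file with the csv module, using a made-up two-line sample. It also shows why duplicate headings hurt: DictReader keys each row by heading name, so when a name repeats, the later column silently replaces the earlier one. Falling back to csv.reader keeps every column, at the cost of positional access.

```python
import csv
import io

# A hypothetical vendor extract: tab-delimited, with "price" appearing twice.
SAMPLE = "ticker\tprice\tprice\nXYZ\t10.5\t10.7\n"

def read_dicts(text):
    """One dict per data row, keyed by the heading row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def read_header_and_rows(text):
    """The heading row and the data rows as plain lists (nothing is lost)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return header, list(reader)

print(read_dicts(SAMPLE))            # only one "price" survives
print(read_header_and_rows(SAMPLE))  # both prices are kept
```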

What's wrong with that decision? Nothing.

Variant Column Names

The question arises when you've purchased several files of econometric data and the column names are slightly different. This happens with a single vendor and across vendors. It's part of the game that can't easily be avoided. Column names vary.

What to do?

Here's the less-than-ideal decision. Make the column names a parameter.

In Python, this is not terribly difficult. The csv module's DictReader provides us a dictionary for each row. Each column name becomes a key. We can access the fields with some_row['this_field'] and some_row['that_field']. How bad can it be?

The extra punctuation is fairly hideous.

More important, however, is the nature of the metadata.

Consequence One -- Dynamic Metadata

Dynamic metadata, in this case, means that any indexing of the data is done based on character string column names.
index[index_name][row[column_name]].append( row )
That's rather more complex than the alternative where the metadata has a fixed definition.
some_index[row.column].append( row )
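The two indexing styles above, expanded into a runnable sketch. The column names (Ticker, Exch), the index name, and the Security named tuple are hypothetical; the point is that the dynamic version needs two string lookups per row, driven by configuration, while the fixed version reads a plain attribute.

```python
from collections import defaultdict, namedtuple

# Dynamic metadata: the index name and the column it uses are both strings
# discovered at run time.
def build_dynamic_indexes(rows, index_specs):
    """index_specs maps an index name to a column name, e.g. {"by_exch": "Exch"}."""
    index = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for index_name, column_name in index_specs.items():
            index[index_name][row[column_name]].append(row)
    return index

# Fixed metadata: the structure is defined once, in code.
Security = namedtuple("Security", ["ticker", "exchange"])

def build_exchange_index(rows):
    """Index Security rows by their exchange attribute."""
    some_index = defaultdict(list)
    for row in rows:
        some_index[row.exchange].append(row)
    return some_index
```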
Consequence Two -- Murky ORM

Once we have dynamic metadata, we're largely frozen out of ordinary SQL database implementations. We don't know the column names, we don't know the indices. We can't do simple CREATE TABLE statements because we don't really have the column names until we open the working files.

We have to grub through all the code to find out where the dynamic mapping is reasoned out. Once we find that, we can then consider how to make the metadata fixed enough to tackle a SQL database.

We could, of course, generate the SQL CREATE INDEX statements on-the-fly. There's nothing wrong with it. But it slows down analysis and decision-making when we're not sure what indexes there are or what leads to a choice of index.
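A sketch of what generating the DDL on-the-fly looks like with sqlite3. The function name and the index-naming scheme are hypothetical. Identifiers are quoted by doubling embedded double quotes, since heading text can contain spaces. The duplicate headings mentioned earlier become a hard failure here: SQLite refuses to create a table with two columns of the same name.

```python
import sqlite3

def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def create_table_from_header(conn, table, header, index_columns):
    """Generate CREATE TABLE and CREATE INDEX statements from a heading row.

    The schema only exists once the working file has been opened, which is
    what makes the resulting database hard to reason about.
    """
    columns = ", ".join(f"{quote_ident(col)} TEXT" for col in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    for col in index_columns:
        index_name = quote_ident(f"ix_{table}_{col}")
        conn.execute(f"CREATE INDEX {index_name} ON {quote_ident(table)} ({quote_ident(col)})")
```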

What's important here is that we want to use SQLite because it ships with Python. We want our application to use an ORM (like SQLAlchemy or SQLObject). We don't want our application to become a kind of ORM because of the dynamic SQL and dynamic column names.

Cleanup

The cleanup road is clear.
  1. Map all variant inputs to one common structure. Rather than work with raw dictionaries from csv, map each row to a standard set of names. For now, we can replace the dictionaries with named tuples to prepare for a migration to an ORM when that's possible.
  2. Replace the row['some field'] syntax with row.some_field syntax. Of course, there's a lot of this. This is a pervasive change.
  3. Find all the dynamic index creation and refactor that into a more static "database-like" place for now.
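Step 1 might look something like this: a single mapping function that turns any variant heading set into one named tuple. The Quote fields and the VARIANT_NAMES table are hypothetical stand-ins for real vendor layouts. Because the mapping is one small, pure function, it is straightforward to unit test.

```python
import csv
import io
from collections import namedtuple

# The one common structure (field names are hypothetical).
Quote = namedtuple("Quote", ["ticker", "close"])

# Hypothetical vendor-specific headings, each mapped to a common name.
VARIANT_NAMES = {
    "Ticker": "ticker",
    "TICKER_SYM": "ticker",
    "Close": "close",
    "Closing Price": "close",
}

def to_quote(raw_row):
    """Map one csv.DictReader row, whatever its headings, to a Quote.

    Unknown headings are ignored; a missing required field raises TypeError.
    """
    fields = {VARIANT_NAMES[k]: v for k, v in raw_row.items() if k in VARIANT_NAMES}
    return Quote(**fields)

def read_quotes(text, delimiter=","):
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [to_quote(row) for row in reader]
```

After this, downstream code uses quote.close instead of row['Closing Price'], which is exactly the Step 2 change.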
Item 1 is pretty easy to unit test. We're adding a function to map from dynamic names to fixed names. Nothing much to this testing-wise.

Item 2 requires unit tests with really good code coverage or there's no earthly way we can be sure that each mapping-syntax name has been transformed into an attribute-syntax name.

Item 3 barely requires testing. Indexes and other features are performance enhancements that can be removed and added without altering functionality.

Thursday, May 12, 2011

A Taxonomy of Use Case Errors

First, the definition. A use case describes an actor's interaction with a system to create business value. There are three parts: Actor, Interaction and Business Value.

1. Not Interactive.
1.1. The use case is just features and technical attributes with no actor interaction expressed.
1.2. The use case is just algorithms and processing with no connection to an actor or a goal.

2. No Business Value.
2.1. Incomplete
2.1.1. The use case focuses on sequential operations with no value or goal.
2.1.2. The use case simply follows existing precedent without supporting actual business goals. It "paves the cow path".
2.2. Non-Specific
2.2.1. The use case is a result of free-running imagination; it conflates "possible" with "required". It contains descriptions of interactions which could happen or would be nice to happen.
2.3. Covers the Technology Only
2.3.1. The solution technology is conflated with the business problem. Words like "database" or "foreign key" or "error log" or other solution technology are central.
2.4. Contradictory
2.4.1. The use case goal contradicts other goals.
2.4.2. The use case sequence is inconsistent with the stated goal.

3. No Actor.

Tuesday, May 10, 2011

The Ubiquitous Object

Objects are everywhere.

Weirdly, some people can't see them. I guess they live in a rarified, HP Lovecraftian world of pure action inhabited by amorphous things that can't be properly called "beings" but rather "doings" because they're pure activity with no existence.

Read "Hypnos". "They were sensations, yet within them lay unbelievable elements of time and space—things which at bottom possess no distinct and definite existence."

Got this comment the other day.
... doing procedural code correctly when you don't want to be bothered w/ OO is a separate and big enough topic that warrants its own book or monograph.
I guess that means that objects, and the reality that they model, are a "bother"—a pitfall to be avoided—a cost with no benefit. This is not the first time I've heard this, and—like Lovecraft—it leads me to wonder how such a rich and weird phantasy world gets constructed.

I had a project manager exclaim "You don't need more than seven or eight objects to write any application." I didn't press the person on that point. I assumed that they were talking about classes (not objects) and, further, had conflated class with "elaborate module-like library packed with amazing features". Or maybe they conflated class with package. Or something. It's hard for me to dig into misapprehensions and false assumptions without being rude.

There are a surprising number of misapprehensions. I'm occasionally tempted to turn NLTK loose on all questions tagged "Python" on Stack Overflow. With some patient reading, I think I could develop a taxonomy of OO confusion. However, let's just focus on this comment.

The Bother Factor

Why is OO a "bother"?
  1. I've been told that OO programming is different. Different from what? From procedural programming without objects, I guess.
  2. I've been told that some problems are a better fit for OO, and some problems aren't a good fit for OO. This is hard to parse because it makes the more profound claim that some problems weirdly don't involve any "objects", just pure actions.
  3. The Object-Relational Impedance Mismatch problem somehow indicts object-oriented programming as unsuitable when there's a relational database involved.
Let's look at some of these in a little depth to see the underlying fallacies.

Procedural Is More Fundamental

This is subtle and pernicious. An OO language contains within it a procedural language. Because of this, we can use Java, C++ or Python to write Fortran-like (or VB-like) crapola code. It's possible to write everything in a single, massive, static class with piles of random global variables, long lists of disorganized methods, and "adaptation via block comment" buffoonery.

Some folks object to characterizing procedural programming as random, disorganized or buffoonery. They tell me that a purely procedural program can be neat and well organized with tidy, focused modules that have narrowly-defined responsibilities, no global variables and clever techniques like pointer-to-function to support adaptation.

Wait. The idea of tidy, focused modules with narrowly-defined responsibilities is exactly what a class is.

This is important. All good procedural programming is isomorphic to object-oriented programming minus the class definitions.
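A small illustration of that isomorphism, using a hypothetical on-hand inventory. The procedural version is a dict (a built-in class) plus functions that all take it as their first argument; the class version declares the same grouping explicitly.

```python
# Procedural: a built-in dict plus a tidy family of functions that share it.
def inventory_new():
    return {}

def inventory_add(inv, item, qty):
    inv[item] = inv.get(item, 0) + qty

def inventory_count(inv, item):
    return inv.get(item, 0)

# The same design, with the grouping declared as a class.
class Inventory:
    def __init__(self):
        self._on_hand = {}

    def add(self, item, qty):
        self._on_hand[item] = self._on_hand.get(item, 0) + qty

    def count(self, item):
        return self._on_hand.get(item, 0)
```

The only real difference is where the "first argument" is written: inventory_add(inv, ...) versus inv.add(...).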

Procedural isn't "fundamental". It's just fragmentary. Procedural programming is a subset of object-oriented programming, not a foundation. We can, for example, do functional-style object-oriented programming by using immutable objects.
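A minimal sketch of functional-style OO in Python, using a frozen dataclass (the Tally class is hypothetical). "Changing" the object produces a new instance, and attempts to mutate it raise dataclasses.FrozenInstanceError.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Tally:
    """An immutable object: 'changing' it means creating a new one."""
    count: int = 0

    def add(self, n):
        # dataclasses.replace() builds a new Tally; self is untouched.
        return replace(self, count=self.count + n)

before = Tally()
after = before.add(3)
print(before, after)
```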

Some Problems Aren't A Good Fit

Claiming that there are problems which don't fit the object-oriented paradigm is false. Or such a claim hearkens to a more elaborate ontology in which existence somehow doesn't matter.

This question is typical: "What should be OO and what shouldn't?"

When a program "runs" or "executes" there is state change. In a lazy functional world, state change is characterized by the creation and destruction of immutable objects: the new "4" that's created by "2+2".

In order for there to be state, there must be an object that has a state of being. Objects are inherent in doing any computing of any kind.

Some folks like to lift up stored procedures or shell scripts as "important" examples of non-OO programming. Mostly, these just show that a non-OO language can persist for a long time because clever programmers can work around a lot of limitations. (Turing Completeness is a necessary pre-condition; not a desirable feature set.)

[And yes, I've written multiple-thousand line shell scripts so customers can avoid paying a license fee for a proper compiler. Just because it can be done doesn't mean it should be done.]

This is important. All Programming Involves Objects.

There are really just two "paradigm" decisions. Does the problem involve new class definitions or can it be done using built-in classes? Does the problem involve mutable objects or immutable objects?

Software that uses only built-in classes is termed "procedural". Software that uses only immutable objects is termed "functional". Software that uses mutable objects is mistakenly termed "object-oriented".

Object-Relational Mismatch

This isn't really very interesting, no matter how many times people like to flog it. Use an ORM. Move on.

Further, it's important to recognize that normalization, foreign keys, cascading deletes and other malarky are hacks imposed on us by several relational database limitations. These are not essential parts of any problem.

I don't know how many times I've had to answer the "how do I do foreign keys in Java/C++/Python?" question. The answer is always the same: foreign keys are a hack-around because there are no proper object references in a relational database.

What's Left?

In spite of the obvious logic that OO is central, there is always a residual "It's a bother" sense from folks whose first language was not an OO language.

As far as I can tell, the "bother" stems from simple ignorance of what's really going on. Many programmers can't articulate any design principles. Yet, they tend to follow some principles rather closely. Ask them what they're doing. Read their code. Almost everyone who codes has some set of fundamental principles. (The few exceptions are people who seem to write code more-or-less randomly and still manage to arrive at something that appears to "work"; these people do exist and are very scary.)

Many programmers don't follow all of the SOLID Principles.

Many programmers follow the SOLID principles using different nomenclature. The SOLID initials and acronyms are just one goofy terminology. There are more principles than these, and the principles can have other names.

What's important is that (except for rare exceptions) all programmers follow some of the SOLID principles. Some follow all of them. Some follow numerous additional principles beyond these. Some give their principles other names.

The folks who claim OO programming is a "bother" just don't happen to recognize that they're already following some of the SOLID principles and actually doing OO programming with built-in classes.

Doing Procedural Programming Correctly

Bottom Line: "doing procedural code correctly" is simply OO programming using only built-in classes.

It's not a "big" topic. It's entirely an exercise in learning how to apply someone else's nomenclature to one's existing principles.

Tuesday, May 3, 2011

The curse of procedural design

After reverse engineering procedural code in C, VB or even Python, I'm finding that procedural programming inevitably leads to bad, bad code-rot.

Consider some of the common design patterns.

Strategy. Confronted with alternative strategy choices, a purely procedural code solution is either
  • If-statements everywhere the strategy is involved.
  • Block comments. (Pre-processor #if statements are the logical equivalent of block comments plus a tool to move them around just prior to compilation.)
These lack flexibility and seem to devolve into a quagmire of mystery. The if-statements often become tangled and complex. More importantly, some strategy choices — which are unused — may not be maintained at all. Of course, the block comments are never maintained.
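For contrast, here is a hypothetical pricing rule written both ways. The procedural version tests a code in an if-statement, and every other function that depends on the rule needs a similar if-statement. The Strategy version passes in an object, so adding a rule means adding a class rather than hunting down every if-statement.

```python
# Procedural: the strategy choice is an if-statement at each point of use.
def total_procedural(prices, rule, percent=0):
    total = 0
    for p in prices:
        if rule == "regular":
            total += p
        elif rule == "discounted":
            total += p * (100 - percent) / 100
        else:
            raise ValueError(f"unknown rule: {rule}")
    return total

# Strategy: each alternative is a class with the same method.
class Regular:
    def price(self, base):
        return base

class Discounted:
    def __init__(self, percent):
        self.percent = percent

    def price(self, base):
        return base * (100 - self.percent) / 100

def total(prices, rule):
    # No test of which strategy is in use; the object knows what to do.
    return sum(rule.price(p) for p in prices)
```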

Command. Often a command design requires a "code" or "label" and a big-old sequential switch (BOSS™) statement to select among the procedures which implement the various commands. Once "composite" commands are introduced, this devolves into nonsense. Ideally, it's a simple recursion, where a composite command simply invokes the sub-commands. However, folks get nervous about recursion and try to write weird loops.
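A sketch of a composite command as a simple recursion over its sub-commands. The Say and Composite classes are hypothetical; any object with an execute() method would do, so there is no command code and no switch statement.

```python
class Say:
    """A primitive command: append its text to a log."""
    def __init__(self, text):
        self.text = text

    def execute(self, log):
        log.append(self.text)

class Composite:
    """A command made of other commands, which may themselves be composites."""
    def __init__(self, *commands):
        self.commands = commands

    def execute(self, log):
        # Simple recursion: each sub-command runs itself, however deep it nests.
        for command in self.commands:
            command.execute(log)
```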

State. A state design always seems to involve labels or codes for the state names and a slightly different big-old-state-switch (BOSS™, no accident that this is the same acronym) to sort out the variant behaviors in the distinct states. This shouldn't become too confusing. After all, Turing machines and other mathematical abstractions give us a strong hint on how we should proceed.

The problem with stateful procedural programming is that the state changes can be hidden everywhere. In the Really Bad Languages, variables can change values without an assignment statement! In the Not Bad Languages, we can track down the various assignment statements and try to reason out the state changes. Procedural code, without a lot of adult supervision, never seems to encapsulate state change with the same in-your-face clarity that OO programs do.
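A sketch of the State pattern using a hypothetical turnstile. Each state is a class whose methods return the successor state, so the context object has exactly one assignment per event and there is no state-code switch statement to keep in sync.

```python
class Locked:
    def coin(self):
        return Unlocked()

    def push(self):
        return self  # pushing a locked turnstile changes nothing

class Unlocked:
    def coin(self):
        return self  # extra coins change nothing

    def push(self):
        return Locked()

class Turnstile:
    """The context: state changes happen only in coin() and push()."""

    def __init__(self):
        self.state = Locked()

    def coin(self):
        self.state = self.state.coin()

    def push(self):
        self.state = self.state.push()

    @property
    def state_name(self):
        return type(self.state).__name__
```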

I Could Go On

The point is this. While procedural programming could be done well, there appear to be a lot of obstacles inherent in the paradigm.

The best procedural programming I've seen has always been very object-oriented. Each procedure or function had a distinct data structure it worked with; they were all closely related by virtue of naming or file structure; much like a class definition.

I'm starting to wonder if my Building Skills books are taking the right approach. I start with the procedural aspects of Python. I'm beginning to feel that this may be a disservice to the n00bz.

Perhaps it's better to swap the order of the sections and start with the various Pythonic data structures and introduce the various statements sort of "casually" as part of demonstrating how a data structure is supposed to be used.