Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post in Jan 2023.

Thursday, October 1, 2009

Agile Project Management

Got this question recently.
"Any suggestions on PM tools that meet the following considerations

1) Planning

2) Estimating

3) Tracking (allowing both PM input and developer input)

4) Reporting

5) Support both Agile and Waterfall projects

6) Releases

7) Bug fixes (probably just another type of backlog)"



Agile PM requires far less planning than you're used to.

1. A "backlog" which is best done on a spreadsheet.

2. Daily standup meetings which last no more than 15 minutes at the absolute longest.

And that's about it.

Let's look at these expectations in some detail. This is important because Agile PM is a wrenching change from waterfall PM.

Planning

There are two levels of detail in planning. The top level is the overall backlog. This is based on the "complete requirements" (hahaha, as if such a thing exists). You have an initial planning effort to decompose the "requirements" into a workable sequence of deliverables and sprints to build those deliverables. Don't over-plan -- things will change. Don't invest 120 man-hours of effort into a plan that the customer will invalidate with their first change request. Just decompose into something workable. Spend only a few days on this.

The most important thing is to prioritize. The backlog must always be kept in priority order. The most important things to do next are at the top of the backlog. At the end of every sprint, you review the priorities and change them so that the next thing you do is the absolutely most valuable thing you can do. At any time, you can stop work, and you have done something of significant value. At any time, you can review the next few sprints and describe precisely how valuable those sprints will be.
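The backlog really can be that simple: rows in a spreadsheet kept in priority order. A minimal sketch in Python (the story names and priority numbers here are invented for illustration):

```python
# A backlog is just rows kept in priority order; lower number = more important.
# The stories and priorities below are made up for illustration.
backlog = [
    (3, "Export report to PDF"),
    (1, "Customer login"),
    (2, "Order history page"),
]

backlog.sort()  # the top of the backlog is always the most valuable work

next_story = backlog[0][1]
print(next_story)
```

Re-sorting after every sprint review is the whole "tool": whatever is at the top is the next most valuable thing to do.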

The micro level of detail is the next few deliverables. No more than four to six. Don't over-plan. Review the deliverables in the backlog, correcting, expanding, combining and refining as necessary to create something that will be of value. List the sprints to build those deliverables. Try to keep each sprint in the four week range. This is really hard to do at first, but after a while you develop a rhythm based on features to be built and skills of the team. You don't know enough going in, so don't over-plan. After the first few sprints you'll learn a lot about the business problem, the technology and the team.

Estimating

Rule 1: don't. Rule 2: the estimate is merely the burn rate (cost per sprint) times the number of sprints. Each sprint involves the whole team building something that *could* be put into production. A team of 5 with 4 week sprints is a cost of 5*40*4 (800 man-hours).

Each sprint, therefore, has a cost of 800 man-hours. Period. The overall project has S sprints. If the project runs more than a year, stop. Stop. The first year is all you can rationally estimate this way. Future years are just random numbers. 5*40*50 = 10,000 man-hours.

Details don't matter because each customer change will invalidate all of your carefully planned schedules. Just use sprints and simple multiplies. It's *more* accurate since it reflects the actual level of unknowns.
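The whole estimating model above fits in two trivial functions. A sketch using the numbers from the text (5 people, 40-hour weeks, 4-week sprints):

```python
def sprint_cost(team_size, hours_per_week=40, weeks_per_sprint=4):
    """Burn rate for one sprint, in man-hours."""
    return team_size * hours_per_week * weeks_per_sprint

def project_estimate(team_size, sprints):
    """Total estimate: burn rate times number of sprints. Nothing more."""
    return sprint_cost(team_size) * sprints

print(sprint_cost(5))            # 800 man-hours per sprint
print(project_estimate(5, 12))   # 9600 man-hours for a 12-sprint project
```

That the "estimating tool" is a one-line multiplication is the point: the unknowns dominate, so anything more detailed is false precision.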

What about "total cost"? First, define "total". When the project starts, the "total" is the "complete requirements" (hahahaha, as if such a thing actually exists). Then, with each customer change, this changes. Further, half the requirements are merely "nice-to-haves". Since they're merely nice, they're low priority -- at the bottom of the backlog.

Since each sprint creates something deliverable, you can draw a line under any sprint, call it "done" and call that the "total cost". Any sprint. Any. There are as many different total costs as there are sprints, and all of them are right.

Tracking

I don't know what this is. I assume it's "tracking progress of tasks against a plan". Since the tasks are not planned at a low level of detail, there's nothing to "track".

You have a daily stand-up. People commit to do something that day. The next day you find out if they finished or didn't finish. This isn't a "tool thing". It's a conversation. Done in under 15 minutes.

Two things can happen during this brief conversation.

- Things are progressing as hoped. The sprint will include all hoped-for features.

- Things are not progressing as hoped. The sprint may not include some feature, or will include an incomplete implementation. The sprint will never have bugs -- quality is not sacrificial. Features are sacrificial.

There's no management intervention possible. The sprint will have what it will have. Nothing can change that. More people won't help. Technology changes won't help. Design changes won't help. You're mid-sprint. You can only finish the sprint.

AFTER the sprint is over, and you've updated the backlog and fixed the priorities, you might want to consider design changes or technology changes.

Reporting

What? To Whom? Each sprint is a deliverable. The report is "Done".

The backlog is a shared document that the users "own" and you use to assure that the next sprint is the next most important thing to do.

Support both Agile and Waterfall projects

Not possible. Incompatible at a fundamental level. You can't do both with one tool because you don't use tools for Agile projects. You just use spreadsheets.

Releases

Some sprints are release sprints. They're no different (from a management perspective) than development sprints. This is just CM.

Bug fixes

Probably just another type of backlog. Correct.

Monday, September 28, 2009

Duct Tape Programmers

See Joel On Software: The Duct Tape Programmer, in which he lauds the programmer who gets stuff done with "duct tape and WD-40".

Here's why: "Shipping is a feature. A really important feature. Your product must have it."

Dave Drake sent the link along with the following:

This "speaks of coding for the rest of us, who are not into building castles in the air, but getting the job done. Not that there is anything wrong with better design, cleaner APIs, well-defined modularity to ease the delegation of coding as well as post-delivery maintenance. But damn, I wish I had a nickel for every time I sat in a design meeting where we tried to do something the fancy way, and it broke in the middle of the development cycle, or testing, or even the builds, and always in the demos."

However

There is one set of quotes that falls somewhere on the continuum of wrong, misleading and flamebait.

"And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that."

This -- in my experience -- is wrong. For Joel or the author of the quote (Jamie Zawinski) this may be merely misleading because it was taken out of context.

It's absolutely false that customers won't complain about missing unit tests. When things don't work, customers complain. And one of the surest ways to make things actually work is to write unit tests.

I suppose that genius-level programmers don't need to test. The rest of us, however, need to write unit tests.

Unit Testing Dogma

On Stack Overflow there are some questions that illustrate the misinformation surrounding unit testing. On one end, we have Zawinski (and others) who say that unit tests don't create enough value. On the other end we have questions that indicate slavish adherence to some unit test process is essential.

See How to use TDD correctly to implement a numerical method? The author of the question seems to think that TDD means "decompose the problem into very small cases, write one test for each very small case, and then code for just that one case and no others." I don't know where this process came from, but it sounds like far too much work for the value created.

It's unfair to say that unit testing doesn't add value and claim that customers don't see the unit tests. They emphatically do see unit tests when they see software that works. Customers don't see unit tests in detail. They don't see dogmatic process-oriented software development.

When there are no tests, the customer sees shoddy quality. When the process (or the schedule) trumps the feature-set being delivered, the customer sees incomplete or low-quality deliverables.

Conclusion

The original blog post said -- clearly -- that gold-plated technology doesn't create any value.

The blog post also pulled out a quote that said -- incorrectly -- that unit tests don't create enough value.

Wednesday, September 23, 2009

Code Kata : Parse USPS ZIP3 table

Situation

The USPS ZIP codes have a multi-part structure. The first three digits are a prefix that defines a sectional center facility.

The USPS table L005 3-Digit ZIP Code Prefix Groups—SCF Sortation maps clusters of ZIP3 prefixes to Facility and State codes. The following URL has this table.


Your Job

Your job is to write a library module that does two things:

1. Read and parse this table.

2. Support ZIP-code lookup (ZIP3, ZIP, ZIP+4) to return SCF and State information.

Some Notes

Finding and parsing the table is often done in Python with components like Beautiful Soup. Equivalents aren't available in all languages. You might want to copy and paste this table into a spreadsheet application, and save it as a CSV file, which is much easier to work with than HTML.

There's a regular format to the ZIP3 ranges that makes parsing them relatively simple.

The SCF names, however, have two different formats. Some have names that begin with SCF. Others have names that don't begin with SCF. Be careful to handle each version correctly.
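One possible shape for the lookup half of the kata, assuming you've saved the table as a CSV with columns named low, high, facility, and state (those column names, and the SCF names in the test data, are my assumptions, not part of the USPS table):

```python
import csv
from typing import NamedTuple, Optional

class SCF(NamedTuple):
    facility: str
    state: str

def load_table(path):
    """Read a CSV of (low, high, facility, state) rows into a sorted list
    of ZIP3 ranges. Column names are assumptions; match your spreadsheet."""
    ranges = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            ranges.append((row["low"], row["high"], SCF(row["facility"], row["state"])))
    return sorted(ranges)

def lookup(ranges, zip_code):
    """Return SCF info for a ZIP3, ZIP5, or ZIP+4 string, or None."""
    zip3 = zip_code.replace("-", "")[:3]  # keep as a string: leading zeros matter
    for low, high, scf in ranges:
        if low <= zip3 <= high:
            return scf
    return None
```

Keeping the ZIP3 prefixes as strings (not ints) preserves leading zeros, which matters for New England prefixes like "005".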

Code Kata : Merge Changes

The Situation

A co-worker has mistakenly cloned a directory tree rather than link to it. Then they made some number of changes to files in that directory.

Your Job

Your job is to compute a directory-level difference between an official copy and the changes they made. Sadly, you can't trivially rely on using Subversion for this. You're going to have to write your own differ.

The difference report should show the following kinds of information.

1. Baseline files unchanged in the clone.
2. Cloned files which are new and don't exist in the baseline.
3. Cloned files which are changed and newer than the baseline.

You'll need to skip certain directories. They're either working files or are ignored for other reasons.

You'll need to skip certain file extensions. Things like .pyc or .class files have date-stamps that don't indicate a real difference.

Ultimately, you'll produce two things.

1. A report to show what was changed.
2. A script that will copy the changes from the clone back into the master directory.

Some Notes

Python has several modules that help with doing directory and file comparison.

In non-Python environments, you may have to rely on system utilities like diff or cmp.

This is best built incrementally, creating the report first. Then handle exceptions. Then do the copy.
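The report-first increment above might look like this sketch, built on os.walk and filecmp from the standard library (the particular skip lists are placeholders you'd adjust for your project):

```python
import os
import filecmp

SKIP_DIRS = {".svn", "build"}      # placeholder: your working directories
SKIP_EXTS = {".pyc", ".class"}     # date-stamps only, not real differences

def walk_files(root):
    """Yield file paths relative to root, skipping ignored dirs and extensions."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in SKIP_EXTS:
                continue
            yield os.path.relpath(os.path.join(dirpath, name), root)

def compare(baseline, clone):
    """Classify each clone file as unchanged, new, or changed vs. the baseline."""
    base_files = set(walk_files(baseline))
    report = {"unchanged": [], "new": [], "changed": []}
    for rel in walk_files(clone):
        if rel not in base_files:
            report["new"].append(rel)
        elif filecmp.cmp(os.path.join(baseline, rel),
                         os.path.join(clone, rel), shallow=False):
            report["unchanged"].append(rel)
        else:
            report["changed"].append(rel)
    return report
```

With the report in hand, the copy-back script is just a loop over report["new"] and report["changed"].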

Sunday, September 20, 2009

Innovation and Outsourcing

Good stuff in ComputerWorld: Partnerships can Go Too Far.

"Consider vendor innovation. As companies become large and entrenched, they typically become more risk-averse and less creative, often rejecting ideas that challenge conventional wisdom."

This is really only half the story.

First Things First

Programming is hard -- really hard. Read EWD 316, chapter 2. By extension, most of IT is saddled with really, really complex and difficult problems.

"As a result of its extreme power, both the amount of information playing a role in the computations as well as the number of operations performed in the course of a computation, escape our unaided imagination by several orders of magnitude. Due to the limited size of our skull we are absolutely unable to visualize to any appreciable degree of detail what we are going to set in motion, and programming thereby becomes an activity facing us with conceptual problems that have risen far, far above the original level of triviality."

Given that IT is hard, it therefore entails either some risk of failure or considerable cost to avoid failure. It also involves an -- often unknown -- amount of learning.

Before writing software, we really do need to learn the language, tools, architecture and components we're going to use. Not a 1-week introduction, but a real project with real quality reviews. Sometimes two projects are required to ferret out mistakes.

Also, before writing software, we really do need to understand the problem. Sadly, many business problems are workarounds to bad software, leaving us with many alternative solutions that are all equally bad and don't address the root-cause problem.

No-Value Features

Programmers will often pursue no-value features that are part of the language, tools, components or architecture. This drives up cost and risk for no value.

Business Short-Sightedness

The compounding problem is a short-sighted business impetus toward delivering something that mostly works as quickly as possible. Often, business folks buy into the no-value features, and overlook the real problem that we're supposed to be solving.

Sigh.

The Result: Stifling

The result of (a) inherent complexity, (b) no-value features and (c) short-sighted buyers is that IT management finds ways to stifle all IT innovation.

In effect, most companies outsource innovation. They hope that their vendors will provide something new, different and helpful. The IT organization isn't allowed to invest in the learning or take the risks necessary to innovate.

The ComputerWorld article points out that some companies then put Preferred Supplier Plans in place which further stifle innovation by narrowing the field of vendors to only the largest and least innovative.

Wednesday, September 16, 2009

Fedora 11 and Python 2.6

Upgraded a VM to Fedora 11 recently.

This -- it turns out -- comes with Python 2.6 installed.

It is, however, an incomplete build. To do anything, I had to install some additional Python packages. Specifically, the "libraries and header files needed for Python development". Also, IIRC, the tkinter package isn't present by default.

Once I had the development package installed, I could add setuptools. After that, it's a sequence of easy_install steps and we were up and running.

I've started running our unit test suite with python -3 to capture all of the DeprecationWarnings. So far, there aren't many and they aren't show-stoppers. In one project we have a few has_key calls and a use of urllib that need to be replaced.

It's very, very nice to have a short, specific list of Python 3 compatibility issues to look out for. We're not going to use Python 3 any time soon, but it's nice to be able to solve the problems in advance.

Tuesday, September 15, 2009

The world is multidimensional? Really?

I cannot believe that people still consider top-down, uni-dimensional, taxonomic hierarchies useful.

This Stack Overflow question (REST: How to Create a Resource That Depends on Three or More Resources of Different Types?) repeats an assumption. Essentially the confusion comes from assuming that "URI's map directly to a hierarchy".

I think it's over-exposure to the Windows file system where hard links are a rarity.

Perhaps it's also from over-exposure to hierarchical site-maps that simply repeat the menu structure without adding information.

Someone who is reading Everything is Miscellaneous suggested I read up on "faceted classification" as if that was something new or different.

What's interesting in Weinberger’s book is (1) recognizing this and (2) taking some concrete action.

What To Do?

Perhaps the most important thing is this:

Stop Forcing Things Into Hierarchies

I sat in a multiple hour meeting where we debated the file-system structure for artifacts created during a development project. Each artifact has several dimensions.
  • Phase of the project (Inception, Elaboration, Construction, Deployment)
  • Deliverable type (DB Design, Application Programming, Web Site, etc.)
  • Status (Work in Progress, Waiting UAT, Completed, Rework, etc.)
  • Calendar (Year, Quarter, Month the work started, as well as ended)
  • Team (DBA's, Batch/Backend, Web/Frontend, ETL, etc.)
Sigh.

Since the data is multidimensional, no single taxonomic hierarchy can ever "work". Each alternative (and there are 5!=120 ways to permute five dimensions) appears equally useful.

If you want, you can enumerate all 5! permutations to see which is more "logical" or "works better for the team". What you'll find is that they all make sense. They all make sense because the dimensions are all peers -- equally meaningful.
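Enumerating those permutations is a one-liner, which is itself a hint that no ordering carries any real information:

```python
from itertools import permutations

# The five dimensions from the meeting described above.
dimensions = ["Phase", "Deliverable", "Status", "Calendar", "Team"]

# Every possible top-down hierarchy is just one ordering of the dimensions.
orderings = list(permutations(dimensions))

print(len(orderings))            # 120 candidate hierarchies
print("/".join(orderings[0]))    # Phase/Deliverable/Status/Calendar/Team
```

Pick any of the 120 and someone on the team can make a sensible case for it; that's exactly why the debate can't converge.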

Alternatives

One alternative is to do this.

1. Create a relatively flat structure. Define all your things in this flat structure. In a relational database context, this means assigning surrogate keys to everything; "natural" keys are more of a problem than a solution. In a content management context, just throw documents anywhere.

2. Create "alternative" indices via hard links to the flat structure. Do not limit yourself to a few alternative orderings of the dimensions. There are n! permutations of your dimensions. Expect to create many of these for different user constituencies.
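On a POSIX file system, building one such alternative index is a few lines of os.link. A sketch, assuming hard-link support and with an invented directory layout:

```python
import os

def index_document(flat_path, facet_values, index_root):
    """Hard-link a document from the flat store into one alternative
    index tree. `facet_values` is an ordered list of tag values, e.g.
    ["Construction", "DBA"]; the layout is illustrative, not prescriptive."""
    target_dir = os.path.join(index_root, *facet_values)
    os.makedirs(target_dir, exist_ok=True)
    link_path = os.path.join(target_dir, os.path.basename(flat_path))
    if not os.path.exists(link_path):
        os.link(flat_path, link_path)  # same inode: no copies to drift apart
    return link_path
```

Because hard links share one inode, each index entry is the document, not a copy that can go stale; adding a new user constituency is just running the indexer again with a different facet ordering.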

Remember, Search Exists

Recognize that highly structured metadata fields in a database are usually a waste of time and money. Search exists. Much data is unstructured or semi-structured and search functions exist that handle this nicely.

If you stop force-fitting hierarchies, you find that you now have several dimensions. Each dimension has a set of reasonably well-defined tags. Each document or database fact row is a point in multi-dimensional space.

A single SQL-style query among these multiple dimensions is a pain in the neck. Search, however, where the dimensions are implied instead of stated, is much, much nicer.