Moved.  All new content goes to the new site.  This is a legacy blog, and it will likely be dropped five years after the last post, in Jan 2023.

Thursday, June 28, 2012

How to Write Crummy Requirements

Here's an object lesson in bad requirements writing.

"Good" is defined as a nice simple and intuitive GUI interface. I would be able to just pick symbol from a pallette and put it somewhere and the software would automatically adjust the spacing.
Some problems.
  1. Noise words.  Phrases like "'Good' is defined as" don't provide any meaning.  The words "just" and "automatically" are approximately useless.  Here is the two-step test for noise words.  1.  Remove the word and see if the meaning changed.  2.  Substitute the opposite and see if the meaning changed.  If you can't find a simple opposite, it's noise of some kind.  Often it's an empty tautology, but sometimes it's platitudinous buzzwords.
  2. Untestable requirements.  "Nice," "simple" and "intuitive" are unqualified and possibly untestable.  If it's untestable, then everything meets the criteria (or nothing does).  "Elegant."  "State of the art."  Again, apply the reverse test: try "horrid, complex and counter-intuitive" and see if you can find that component.  No?  Then it's untestable and has no place.
  3. Silliness.  "GUI".  It's 2012.  What non-GUI interfaces are left?  Oh right.  The GNU/Linux command line.  Apply the reverse test: try "non-GUI" and see if you can even locate a product.  Can't find the opposite?  Don't waste time writing it down.
What's left?  
pick symbol from a palette ... the software would ... adjust the spacing.
That's it.  That's the requirement.  35 words that mean "Drag-n-Drop equation editing".

I have other issues with requirements this poorly done.  One of my standard complaints is that no one has actually talked to actual users about their actual use cases.  In this case, I happen to know that the user did provide input.

Which brings up another important suggestion.
  • Don't listen to the users.
By that I mean "Don't passively listen to the users and blindly write down all the words they use.  They're often uninformed."  It's important to understand what they're talking about.  The best way to do this is to actually do their job briefly.  It's also important to provide demos, samples, mock-ups, prototypes or concrete examples.  It's 2012.  These things are inexpensive nowadays. 

In the olden days we used to carefully write down all the users words because it would take months to locate a module, negotiate a contract, take delivery, install, customize, integrate, configure and debug.  With that kind of overhead, all we could do was write down the words and hope we had a mutual understanding of the use case.  [That's a big reason for Agile methods, BTW:  writing down all the user's words and hoping just doesn't work.]

In 2012, you should be able to download, install and play with candidate modules in less time than it takes to write down all the user's words.  Often much less time.  In some cases, you can install something that works before you can get the users to schedule a meeting.

And that leads to another important suggestion.
  • Don't fantasize.
Some "Drag-n-Drop" requirements are simple fantasies that ignore the underlying (and complex) semantic issues.  In this specific example, equations aren't random piles of mathematical symbols.  They're fairly complex and have an important semantic structure.  Dragging a ∑ or a √ from a palette will be unsatisfying because the symbol's semantics are essential to how it's placed in the final typeset equation.

I've worked recently with some folks that are starting to look at Hypervideo.  This is often unpleasantly difficult to write requirements around because it seems like simple graphic tools would be all that's required.  A lesson learned from Hypertext editors (even good ones like XXE) is that "WYSIWYG" doesn't apply to semantically rich markup.  There are nesting and association relationships that are no fun to attempt to show visually.  At some point, you just want to edit the XML and be done with it.

Math typesetting has deep semantics.  LaTeX captures that semantic richness.

It's often best to use something like LaTeXiT rather than waste time struggling with finding a Drag-n-Drop tool that has appropriate visual cues for the semantics.  The textual rules for LaTeX are simple and—most importantly—fit the mathematical meanings nicely.  It was invented by mathematicians for mathematicians.
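As a concrete illustration of that fit, the markup below (a generic textbook formula, not taken from any particular requirement) shows how LaTeX structure mirrors mathematical structure:

```latex
% The limits belong to the \sum, the radical spans its whole argument,
% and \frac nests numerator over denominator.  The spacing follows from
% this structure; nothing is positioned by hand.
\[
    \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
    \qquad
    s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}
\]
```

A palette of draggable ∑ and √ symbols has no way to express "these limits belong to that sum"; the markup does.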

Tuesday, June 26, 2012

Thursday, June 21, 2012

QR Code

I suddenly realized that QR Codes are everywhere.

Except my business cards.

That should allow me to solve that problem and move on.

Tuesday, June 19, 2012

Dereliction of Duty

Recently started looking into Metadata Registries and UDEF and related semantic technology.

The Wikipedia page lists a bunch of relevant Metadata Registry projects and commercial products.  Very nice.  Easy to follow the links and determine features and benefits.


A client has IBM Cognos.  Is there any easy way to see what kind of Metadata features are part of Cognos?

No.  Not really.

I wondered about this marketing gap.  Why doesn't IBM (or Oracle, they're pretty bad at this also) provide a tidy list of features?

  • They're so big (and arrogant) that they don't feel the need to do any marketing?
  • They're so paranoid that they don't want to have their products reduce to a simple bullet list?
  • Their sales people are so good that they don't need a web presence to sell their products?
  • They already have such tight wired-in relationships that they don't need to do any marketing?
Or is it the "growth by acquisition" problem?  Since IBM acquired Cognos, they hesitate to commit to a list of features?

Whatever the reason, it's frankly difficult to include IBM products in an easy-to-understand info-graphic comparing alternative products.

Thursday, June 14, 2012

IBM RAMAC Device: 5 MB

Check out this picture.

Random Reminiscing Follows

When I was in college (1974-1978) 64K of RAM was the size of a refrigerator.

By 1982, 64K of RAM was an Apple ][+ fully tricked out with the 16K expansion card.

I vaguely remember working with a "tower" device that was a 5MB disk drive.  Think Mac Pro case to hold a disk drive.

By 1985 or so, 128K of RAM was a Macintosh, and a 5MB disk drive was a big desktop console box.  Smaller than a tower.  Irritating because it took up so much real-estate when compared with the Mac itself, which was designed to take up an 8 x 11 space (not including keyboard or mouse).

Now 5MB is round-off error.

What's Important?

Has computing gotten that much better?

Old folks (like me) reminisce about running large companies on small computers.  Now, we can't even get our coffee in the morning without the staggering capabilities of a 32GB iPhone.

Old folks, however, are sometimes mired in bad waterfall development methods and don't understand the value of test-driven development.  While the hardware is amazing, the development tools and techniques have improved, also.

Tuesday, June 12, 2012

The Universal Data Element Framework (UDEF)

Okay.  This is seriously cool.

The Universal Data Element Framework (UDEF)  provides a controlled vocabulary that should be used to seed a project's data model.



We're looking at applying UDEF retroactively to an existing schema.  What a pain in the neck!

Step 1.  Parse the table names.  In our case, they're simply CONTIGUOUSSTRINGSOFCHARS, so we have to work out a quick lexicon and use that to break the names into words.  Then we can find the obvious aliases, spelling mistakes and noise words.  'spec', 'quanitity' and 'for' are examples of each.
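A minimal sketch of that Step 1 parse, assuming a hand-built lexicon; the words here (including the deliberate misspelling) are invented for illustration.  A greedy longest-match scan breaks the run-together names apart:

```python
# Hypothetical lexicon -- in practice this is built by eyeballing the
# actual table names.  Aliases and misspellings go in as-is.
LEXICON = {"customer", "order", "item", "spec", "quanitity", "for", "qty"}

def split_name(name, lexicon=LEXICON):
    """Break a CONTIGUOUSSTRINGOFCHARS into lexicon words, longest match first."""
    name = name.lower()
    words = []
    pos = 0
    while pos < len(name):
        # Try the longest possible lexicon word at this position.
        for end in range(len(name), pos, -1):
            if name[pos:end] in lexicon:
                words.append(name[pos:end])
                pos = end
                break
        else:
            # No lexicon word fits: emit one character as-is,
            # which flags a gap in the lexicon.
            words.append(name[pos])
            pos += 1
    return words
```

Anything that falls out as single characters marks a lexicon gap for the analyst to fill in.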

Step 2.  Look up the various words in the UDEF vocabulary to create candidate matches.   Since each individual word is matched, each table will have multiple candidate matches to seed the analyst's thinking.
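A sketch of the Step 2 lookup; the vocabulary entries and concept names below are invented stand-ins, not actual UDEF identifiers:

```python
# Invented stand-in for the UDEF vocabulary; real entries map a word
# to one or more standard concept identifiers.
UDEF_VOCAB = {
    "customer": ["Person.Customer"],
    "order": ["Document.Order"],
    "quantity": ["Quantity"],
}

# Legacy aliases and misspellings mapped back to vocabulary words.
ALIASES = {"quanitity": "quantity", "qty": "quantity"}

def candidate_matches(words):
    """For each parsed word, gather candidates to seed the analyst's review."""
    candidates = {}
    for w in words:
        w = ALIASES.get(w, w)
        candidates[w] = UDEF_VOCAB.get(w, [])
    return candidates
```

An empty candidate list is the signal to create an internal extension in Step 3.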

Step 3.  Manually pick UDEF standard names or create internal extensions to the standard for the problem domain or enterprise unique features.

Do a similar thing for the column names.  In that case, they're CamelCaseWithSomeACRONYMS.  This is slightly easier to parse, but not much.

Eventually, we have to apply real human business analyst grey matter to locating standard names which might fit with the host of legacy names.

Here's the column name parser.

def property_word_iter( prop_name ):
    """Find words via case changes.
    -   A digit ends a word and stands alone.
    -   Lower to upper ends a word.
    -   Upper to lower ends a word.  However,
        sometimes the upper is part of an acronym that was all caps.
        A lookahead is required to disambiguate.

    >>> list( property_word_iter( "CamelCaseWithSomeACRONYMS" ) )
    ['camel', 'case', 'with', 'some', 'ACRONYMS']
    """
    cc_iter= iter(prop_name)
    word= [ next(cc_iter) ]
    for c in cc_iter:
        if c.isdigit():
            yield ''.join(word)
            yield c
            word= [ next(cc_iter) ]
        elif word[-1].islower() and c.islower():
            word.append( c )  # Continue the current word.
        elif word[-1].isupper() and c.isupper():
            word.append( c )  # Continue the current acronym.
        elif word[-1].islower() and c.isupper():
            yield ''.join(word)
            c2= next(cc_iter)  # Lookahead to disambiguate.
            if c2.isupper():
                word= [c, c2]  # Start of an acronym.
            else:
                word= [c.lower(), c2]  # Start of an ordinary word.
        elif word[-1].isupper() and c.islower():
            if len(word) > 1:
                # End of an acronym; its last upper starts the new word.
                yield ''.join(word[:-1])
            word= [word[-1].lower(), c]
        else:
            raise Exception( "What? {0!r} {1!r}".format( word[-1], c ) )
    if word:
        yield ''.join(word)

Thursday, June 7, 2012

Stingray Schema-Based File Reader

Just updated the Stingray Reader.  There was an egregious error (and a missing test case).  I fixed the error, but didn't add a test case to cover the problem.

It's simple laziness.  TDD is quite clear on how to tackle this kind of thing.  Write the missing test case (which will now fail).  Then make the code change.

But the code change was so simple.

Tuesday, June 5, 2012

COBOL Rework

See this article: "The COBOL Brain Drain" in ComputerWorld.  This article is very, very well written and repeats a number of common fallacies.

The fallacies lead to expensive, labor-intensive in-house software development and maintenance.  Either there's a lot of effort poking at the COBOL code.  Or there's a lot of extra code to "wrap" the COBOL code so that it's never touched.

"Migrating large-scale systems built in Cobol is costly and risky."  A popular position.  But the risks are actually quite small; possibly non-existent.  The risks of not converting are usually higher than the risk of conversion.

The perception of the COBOL code is that it's filled with decades of tricky, nuanced legacy details that are hard to fully understand.  This is only partially true.

A great deal of the tricky code is simply redundant.  COBOL is often written with copy-and-paste programming and blocks of code are simply repeated.  It's also important to note that some of the code is no longer exercised in the first place.

Mythical Risk

The "risk" comes from the perceived opacity of  the tricky, nuanced legacy details.  It doesn't appear to be clear what they mean.  How can a project be started when the requirements aren't fully understood?

What appears to be the case in reality is that this tricky code isn't very interesting.  Most COBOL programs don't do much.  They can often be summarized in a few sentences and bullet points.

Detailed analysis (641,000 lines of code, 933 programs) reveals that COBOL programs often contain several commingled feature sets.

  • The actual business rules.  These are often easy to find in the code and can also be articulated by key users.  The real work is usually quite simple.  
  • A bunch of hackarounds.  These are hacks to work around bugs that occur elsewhere in the processing.  Sometimes a hackaround introduces additional problems which require yet more hackarounds.  All of this can be ignored.  
  • Solutions to COBOL data representation issues.  Most of these seem to be "subtype" issues: a flag or indicator is introduced to distinguish between subtypes.  Often, these are extensions.  A field that has a defined range of values ("A", "C" or "D") has a few instances of "*" to indicate another subclass that was introduced with a non-standard code for peculiar historical reasons.
Once we separate the real code from the hackarounds and the representation issues, we find that most COBOL programs are trivial.  About 46% of the lines of code (74% of the distinct programs) belongs to programs that read one to four files and write a report.  In effect, these programs do a simple "relational join" or query.  These programs have single-sentence summaries.

The hackaround problem is profound.  When looking at COBOL code, there may be endless copy-and-paste IF-statements to handle some issue.  There may be whole suites of programs designed to work around some issue with a third-party package.  There may be suites of programs to deal with odd product offerings or special customer relationships.  

The remaining 26% of the programs (the non-trivial ones) split roughly into 1/4 simple, 1/2 moderately complex, and 1/4 regrettably and stupidly complex.  That final slice is about 5% of the programs but a whopping 20% of the lines of code.  They are the few big programs that really matter.

Risk Mitigation

The risk mitigation strategy involves several parts.  
  1. Data Profiling.  The COBOL data may have considerable value.  The processing is what we're trying to rewrite into a modern language.  A profile of every field of every file will reveal that (a) most of the data is usable and (b) the unusable portion of the data isn't very valuable.   
  2. Triage.  We can summarize 80% of the code in simple sentences.  46% of the code has single-sentence summaries.  34% of the code has multiple-sentence summaries.  The remaining 20% requires in-depth analysis because the programs are large; they average 2,400 lines of code each.
  3. Test-Driven Reverse Engineering.  Since about 5% of the programs do the real updates, it's important to collect the inputs and outputs of these few programs.  This forms a core test suite.
  4. Agile Methods.  Find the user stories which really matter.  Find the COBOL programs that implement those user stories.
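The data-profiling step above can be sketched as nothing more than a per-field frequency count; the field name and sample values below are invented for illustration:

```python
from collections import Counter, defaultdict

def profile(rows, fieldnames):
    """Tally the distinct values of every field across every row.

    The counts show which values dominate, which are rare, and which
    are outright invalid -- the raw material for deciding whether the
    unusable portion of the data actually matters.
    """
    freq = defaultdict(Counter)
    for row in rows:
        for name in fieldnames:
            freq[name][row[name]] += 1
    return freq

# Hypothetical sample: a status flag with a non-standard "*" code.
rows = [{"STATUS": "A"}, {"STATUS": "A"}, {"STATUS": "C"}, {"STATUS": "*"}]
counts = profile(rows, ["STATUS"])
```

In practice the rows come from the legacy flat files, read via their COBOL record layouts; the profiling logic itself stays this simple.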
The most important risk reduction strategy is to take an Agile approach.  We need to prioritize based on value creation.  All COBOL programs are not created equal.  Some fraction can simply be ignored.  Another fraction produces relatively low value, and can be converted last.

The second most important risk mitigation is to build conversions from legacy COBOL files to the new, preferred databases.  The data is preserved.  

There's almost no risk in rewriting the 46% of low-complexity COBOL lines of code.  This code is trivial.  Note that some of this code actually has no business value.  Simply ignoring the no-value code also reduces risk.  Since we're using live files to drive testing, we can easily prove that our new programs work.

It's not risky to focus on the 20% of high-value COBOL lines of code.  This contains most (or all) of the processing that the business needs to have preserved.  They can articulate the user stories; it's easy to confirm that the COBOL does what the business needs.  It's easy to use live data to drive the reverse engineering.

The remaining 34% of the code base may actually involve a small amount of overlooked complexity.  There may be a nuance here that really matters.    

This overlooked "nuance" is something that the users didn't articulate, but it showed up in our unit testing.  We couldn't reproduce an expected output because we didn't correctly locate all the processing steps.  It wasn't in our summary of the 80% of moderate-to-low complexity programs.  It wasn't in our detailed analysis of the 20% subset of hyper-complex, large programs.  

We've reduced the risk by reducing the volume of code that needs to be examined under the microscope. We've reduced the risk by using live files for parallel testing.  We've reduced the risk by collecting user stories.

The remaining risks are ordinary project risks, unrelated to COBOL or legacy data.

The Other Great Lie

Another popular fallacy is this:

"The business wants us to make investments in programming that buys them new revenue. Rewriting an application doesn't buy them any value-add".

The value-add is to create a nimble business.  Purging the COBOL has a lot of advantages.

  • It reduces the total number of lines of code.  Reducing costs.  Improving time-to-market for an IT solution to a business problem.
  • It reduces the number of technologies competing for mind-share.  Less thinking about the legacy applications is less time wasted solving problems.
  • It reduces the architectural complexity.  If the architecture is a spaghetti-bowl of interconnections between Web and Legacy COBOL and Desktop, then following the spaghetti-like connections is simply a kind of intellectual friction.
The COBOL does not need to be purged all at once through a magical "big-bang" replacement.  It needs to be phased out.  

Agile techniques need to be applied.  A simple backlog of high-value COBOL-based user stories is the place to start.  The prioritization of these stories needs to then be clustered around the data files. 

Ideally all of the programs which create or update a given file (or related group of files) can be rewritten in a tidy package.  The old files can be used for Test-Driven Reverse Engineering.  Once the programs have been rewritten, the remaining COBOL legacy applications can continue to operate, using a file created by a non-COBOL application.

Each file (and related cluster of programs) is replaced from high-value down to low-value.  Each step creates a more nimble organization.