Wednesday, March 31, 2010

Programming in the Large -- Multicore Goodness

The lowly shell (bash, zsh, csh, the whole bunch) is usually a dreadful programming environment. Perfectly awful. With some care, you can easily architect applications so that you don't really need the shell for very much.

However, there is a precious nugget of goodness within the shell's programming language. The Linux shells have a cool Programming in the Large (PITL) language. It combines executable programs using a number of operators. These operators are an excellent set of design patterns that can help us create complex multi-processing pipelines.

The best part about the shell's PITL language is that a simple shell pipeline can put every core in our processor to work, maximizing throughput and minimizing the amount of programming we have to do.

PITL Objects

This PITL language has a simple set of operators. If your programs are well-behaved, the language is, in a formal mathematical sense, closed. You can apply PITL operators to combinations of programs to get new composite programs.

To be well-behaved, a program must read from standard in and write to standard out. The inputs and outputs must be in some regular syntax. Regular, here, means parseable by regular expressions or regular grammars.

As a special case, we need a program that can read from someplace other than standard in, but write its content to standard out. A program like cat.

Note that any map-reduce step will be well-behaved. To seed the map-reduce pipeline, we use cat as the "head-of-the-pipeline".

PITL Operators

We'll look at the composition operators using three short-hand commands: p1, p2 and p3. Each of these is "well-behaved": they read from stdin and write to stdout.

Typically, running a program from the shell involves a much longer and more involved command-line, but we'll use these three aliases to strip away the details and look at the design patterns. You can imagine them as being p1.py or even python p1.py.

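Sequence, ;. A sequence of steps is shown in the shell on multiple lines, or with the ; operator. In effect, a sequence makes the completion of one program the precondition for starting the following program. We can summarize this as "p1 ; p2". Note that ; only waits for p1 to finish; it doesn't require p1 to succeed. For that, the shell offers "p1 && p2".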

Parallel, &. A parallel operation is shown by using the & operator. The two programs are declared as independent operations. We can summarize this as "p1 & p2". As an extension to this, a trailing "&" allows the programs to run in parallel with the shell itself; this gives you the next prompt right away.

This allows the OS to schedule your two processes on two or more cores. However, there's no real relationship between the processes.

Pipeline, |. A pipeline operation is shown using the | operator. We can summarize this as "p1 | p2". In addition to the logical connection of p1's output becoming p2's input, both programs also run in parallel.

This allows the OS to schedule your two processes on two or more cores. Indeed, the more stages in the pipeline, the more cores you can put to work. Best of all, the data flows through an in-memory kernel buffer; nothing is written to or read back from disk between the processes.

This is a very powerful way to use multiple cores with minimal programming.
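For readers who want to see the mechanics, here's a minimal Python sketch of what the shell does for "p1 | p2": start every stage at once and connect each stage's stdout to the next stage's stdin. P1 and P2 are hypothetical stand-ins (tiny inline Python programs), and the pipeline() helper is my own illustration, not part of any library.

```python
import subprocess
import sys

# Hypothetical stand-ins for p1 and p2: tiny "well-behaved" programs
# that read stdin (p2) and write stdout (both).
P1 = [sys.executable, "-c", "for i in range(5): print(i)"]
P2 = [sys.executable, "-c",
      "import sys\nfor line in sys.stdin: print(int(line) ** 2)"]


def pipeline(*commands):
    """Run commands like the shell's `c1 | c2 | ...`; return the final stdout."""
    procs = []
    prev_stdout = None
    for cmd in commands:
        # All stages are started before any is waited on, so they run concurrently.
        proc = subprocess.Popen(cmd, stdin=prev_stdout,
                                stdout=subprocess.PIPE, text=True)
        if prev_stdout is not None:
            # The child holds its own copy; closing ours lets the upstream
            # stage get SIGPIPE if a downstream stage exits early.
            prev_stdout.close()
        procs.append(proc)
        prev_stdout = proc.stdout
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output


print(pipeline(P1, P2), end="")
```

The OS, not Python, decides which core runs each stage; the script only wires up the pipes, exactly as the shell does.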

If one part of a pipeline is a sort, however, the parallel processing is limited. The sort must read all of its input before providing any output. A process like "p1 | sort | p3" is effectively serial: "p1 >temp1; sort temp1 >temp2; p3 <temp2".
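The effect of a sort in the middle of a pipeline is easy to see even without extra processes. This sketch uses Python generators as hypothetical stand-ins for p1 and p3 and logs each time an item moves; sorted() plays the part of sort. It runs in a single process, so it illustrates the ordering of the work, not the use of multiple cores.

```python
def p1(log):
    """Stand-in producer: emits lines one at a time, logging each one."""
    for word in ["pear", "apple", "fig"]:
        log.append(f"p1 emits {word}")
        yield word


def p3(lines, log):
    """Stand-in consumer: handles each line as soon as it arrives."""
    for line in lines:
        log.append(f"p3 gets {line}")
        yield line.upper()


streaming, blocked = [], []
list(p3(p1(streaming), streaming))       # like "p1 | p3"
list(p3(sorted(p1(blocked)), blocked))   # like "p1 | sort | p3"
print(streaming)  # p1 and p3 alternate
print(blocked)    # every p1 event comes first, then every p3 event
```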

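Grouping. Programs are grouped by ()'s of various kinds ({} and ``). Also, the conditional and repetitive statements effectively group series of programs. We use syntax like "( p1 & p2 ; wait ); p3" to show the situation where p1 and p2 must both complete before p3 can begin processing. The wait matters: without it, the subshell finishes as soon as p2 does, and p3 may start while the backgrounded p1 is still running.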
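Here's a minimal Python sketch of that grouping: start p1 and p2 together, explicitly wait for both, and only then start p3. (In bash terms that's "( p1 & p2 ; wait ); p3"; the explicit wait is what guarantees a backgrounded p1 has finished too.) The three commands are hypothetical stand-ins: p1 and p2 each write a file, and p3 prints both files, so its output is only complete if both finished first. The run_group_then() helper is illustrative, not a library function.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "a.txt")
b_path = os.path.join(workdir, "b.txt")

# Hypothetical stand-ins: p1 and p2 each write one file; p3 reads both.
P1 = [sys.executable, "-c", f"open({a_path!r}, 'w').write('from p1\\n')"]
P2 = [sys.executable, "-c", f"open({b_path!r}, 'w').write('from p2\\n')"]
P3 = [sys.executable, "-c",
      f"print(open({a_path!r}).read() + open({b_path!r}).read(), end='')"]


def run_group_then(group, after):
    """Sketch of "( c1 & c2 & ... ; wait ); after": start the whole group,
    wait for every member, then run `after` and return its stdout."""
    procs = [subprocess.Popen(cmd) for cmd in group]  # c1 & c2: start all
    for proc in procs:
        proc.wait()                                   # wait: block on each one
    return subprocess.run(after, capture_output=True, text=True).stdout


print(run_group_then([P1, P2], P3), end="")
```

Like the shell's ";", this sketch doesn't check exit codes; a stricter version would stop if any member of the group failed.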

Using All the Cores

Most importantly, something like "( p1 ; p2 ) | p3" directs the output of two programs into a third for further processing. The two-program sequence runs concurrently with that third program. This will keep at least two cores busy.

What we'd also like is "( p1 & p2 ) | p3", but this doesn't work as well as we might hope. The outputs from p1 and p2 are not a stream of carefully interleaved atomic writes. Each program's output is flushed in buffer-sized chunks that usually split lines in the middle, and POSIX only promises that a single write of at most PIPE_BUF bytes to a pipe won't be interleaved with other writers. The result can be lines from p1 and p2 spliced together in ways that are impossible to disentangle. Sadly, this can't easily be implemented.
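A Python sketch of "( p1 ; p2 ) | p3", again with hypothetical one-line stand-in programs and an illustrative helper: p3 starts first and keeps running while p1 and then p2 take turns writing into its stdin.

```python
import subprocess
import sys

# Hypothetical stand-ins: p1 and p2 each print a word; p3 upper-cases its input.
P1 = [sys.executable, "-c", "print('alpha')"]
P2 = [sys.executable, "-c", "print('beta')"]
P3 = [sys.executable, "-c",
      "import sys\nfor line in sys.stdin: print(line.strip().upper())"]


def sequence_into(first_group, consumer):
    """Sketch of "( c1 ; c2 ; ... ) | consumer"; returns the consumer's stdout."""
    downstream = subprocess.Popen(consumer, stdin=subprocess.PIPE,
                                  stdout=subprocess.PIPE, text=True)
    for cmd in first_group:
        # ';' semantics: each producer runs to completion before the next starts,
        # all of them writing into the same pipe.
        subprocess.run(cmd, stdout=downstream.stdin)
    downstream.stdin.close()  # EOF tells the consumer the sequence is over
    # Fine for small outputs; a very chatty consumer could fill its stdout pipe
    # before we read it, so a real version would read concurrently.
    output = downstream.stdout.read()
    downstream.wait()
    return output


print(sequence_into([P1, P2], P3), end="")
```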

Other Features

The shell offers a few other composition operations, but as we start using these, we find that the shell isn't a very effective programming environment. While the shell pipeline notation is outstandingly cool, other parts of the notation are weak.

Conditional. The if, case and select shell statements define conditional processing and groupings for programs. Trying to evaluate expressions is where this gets dicey and needlessly complex.

Repetitive. The for, while and until shell statements define repetitive processing for a program. Again, expression evaluation is crummy. The for statement is usable without needless complication.

Four of these PITL operators (sequence, parallel, pipeline, grouping) give us a hint as to how we can proceed to design large-scale applications that will use every core we own.

Implementation Hints

You can -- trivially -- use all your cores simply by using the shell appropriately. Use the shell's pipeline features and nothing else, and you'll use every core you own.

For everything outside the pipelining features, use Python or something more civilized.

And there's a nice hybrid solution: iterpipes. You can construct pleasant, simple, "use-all-the-cores" pipelines directly in Python.

Monday, March 29, 2010

Dumb Info Security

A truly great question came up the other day.
"Why change passwords every 90 days? What is the threat scenario countered by that policy?"
Of course strong password policy means constantly changing passwords. Right?

Then I started to think about it. What -- actually -- does a password change protect you against?

The answer, it appears, is nothing. Changing passwords is largely a waste of time and money. I suppose that a password change prevents further abuse of the account. But generally, the abuse is not ongoing. Once inside a system, an attacker's trick is to create an additional privileged account that does not belong to any real user; all the password changes in the world have no effect on that account.

This post is spot-on: "Password rules: Change them every 25 years"

In short, there's no threat that's actually countered by changing passwords. However, it's on everyone's checklist.

[Look at http://passcracking.com/hybrid.html for information on rainbow table attacks. The time required is on the order of 10 minutes.]

Since a weak password is broken in well under 90 days, there's no "moving target" to this. A weak password is -- effectively -- broken instantly compared to the 90-day password change. Once broken, the machine's freely available for -- on average -- 45 days.

The comments on this post are helpful also. Most people agree that password changes have no real impact on security, except that a forced change gives security managers a chance to improve the rules and make everyone choose new passwords that meet them.

Missing the Point

One comment that's interesting is this:
You've made two assumptions: 1) all password thieves will give up after a few tries in the case of brute-force attack, and 2) all thieves will give up after a few tries in the case of dictionary attacks.
This misses the point entirely. These two assumptions are not overlooked by this posting. They're not part of it at all. None of this is based on password thieves giving up.

Changing a password does not materially impact the thieves' ability to crack a password. Phishing and key logging always work, no matter how often the password is changed.

A dictionary attack is trivially defeated by disabling the account after a few failures. Changing the password is of no relevance at all.
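To make the point concrete, here's a minimal Python sketch of locking an account after a few failures. The Account class, the login() function and the three-strike MAX_FAILURES threshold are all hypothetical; a real system would store a salted hash rather than the password itself, and would need a way to unlock accounts.

```python
import hmac
from dataclasses import dataclass

MAX_FAILURES = 3  # assumed policy: lock after three consecutive failures


@dataclass
class Account:
    password: str      # plaintext only to keep the sketch short
    failures: int = 0
    locked: bool = False


def login(account: Account, attempt: str) -> bool:
    """Return True on success; lock the account after MAX_FAILURES misses."""
    if account.locked:
        return False   # once locked, even the right password is refused
    # Constant-time comparison, so timing doesn't leak how much of a guess matched.
    if hmac.compare_digest(attempt.encode(), account.password.encode()):
        account.failures = 0
        return True
    account.failures += 1
    if account.failures >= MAX_FAILURES:
        account.locked = True
    return False


acct = Account(password="correct horse")
for guess in ["password", "123456", "letmein", "correct horse"]:
    print(guess, login(acct, guess))
```

A dictionary of millions of words gets exactly three tries, and rotating the password every 90 days adds nothing to that defense.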

A rainbow table to undo a hashed password is defeated by using long salt strings with the hash. Changing passwords every 90 days has nothing to do with this, either. There's no "moving target" concept, since a rainbow table attack takes much less than 90 days.
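Here's a minimal sketch of salting with Python's standard library. Each password gets its own random 16-byte salt, so a precomputed table built for unsalted hashes (or for any other salt) is useless against it. Note that this uses PBKDF2 from hashlib, a deliberately slow key-derivation function, which goes a step beyond a plain salted hash; the iteration count is an assumption to tune for your hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both alongside the account."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("hunter2")
print(salt.hex(), digest.hex())
print(verify("hunter2", salt, digest))
```

Because two users with the same password get different salts, they also get different digests, and an attacker must attack each one separately.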

Literate Programming Life Cycle

The question is a deep one. What is the Literate Programming Life Cycle? Why is it so difficult? What are the three barriers and how do we cross them?

Here's most of the original question.
"Last week I threw together an F# script to parse markdown-style text into one or more F# files.

"The thing is, nearly all the references I can find online talk about the finished article, but not the design process. Obviously for my first attempt, I necessarily had to start out by writing the F#, then writing the document with embedded code afterwards. But now I’ve got that working, I have difficulty working out how the ongoing development process actually works. Currently only having a text editor with no colour coding, then having to ‘compile’ my markdown to code, then compile my code to test it, all seems like too much hard work, and the temptation is just hack the code directly.

"Given that I imagine the python development process is similar to F#, I wondered what your experience is with the hack/test/finalise development cycle."
Some Background


Also, consider this quote from the discussion on Lambda the Ultimate.
"The issue of literate programming is an issue of writing a program
that LIVES rather than writing a program that WORKS. In a commercial
setting you pay to train new people on programs but in an open source
setting there is no training. ..."

"... But if your program needs to live forever then you
really need literate code."
Recently, I did some major overhauls of two literate programming exercises. I revised the pyWeb tool to better handle LaTeX output, as well as add unit tests and -- consequently -- fix some long-standing problems. Also, I revised the COBOL DDE parser to better handle numeric data, replace the old FixedPoint module with Decimal, add unit tests and -- of course -- fix other bugs that showed up.

Based on my recent experience, I have some advice on "Full Life-Cycle Literate Programming".

A Life Cycle

In order to identify the barriers, we need to look at the deliverables and the software development life cycle that produces those deliverables. Let's break the software development life-cycle down as follows.
  • New Development
  • Maintenance
  • Adaptation
We'll presume that each of these efforts includes some elaboration of requirements, some design, and some transition to operational use. We only care about the coding part of the job, so we're not going to dwell on all of the other activities that are part of Application Life Cycle Management.

The question is about that transition from New Development to Maintenance or Adaptation. Doing new development seems somehow easier than maintenance or adaptation. How do we work with an established Literate Program?

New Development

New Development of a program is always a delicate subject. We have an explicit goal of creating some deliverable. We'll look at the deliverables next. First, we'll look at the conflicting forces that must be balanced.
  1. It must satisfy the need. There are requirements for the program's behavior, interfaces and implementation. Above all it must work.
  2. It must use appropriate resources. The data structures and algorithms must reflect sensible engineering choices. There's no call for "micro-optimization" of each silly piece of syntax. However, the resources consumed by the algorithms and data structures should be minimized.
  3. It must be adaptable.
  4. It must be maintainable.
  5. It must meet other organizational needs like cost, time-to-develop, language and toolset, infrastructure requirements, etc.
One can maximize one at the expense of others. For instance, one can reduce development costs to the minimum by creating a mess that's neither adaptable nor maintainable. Indeed, one can create software very cheaply if one starts relaxing functional requirements. Software that doesn't work well can be very cheap to create.

Forward vs. Reverse Literate Programming

As a digression, we'll note that some folks recognize two broad approaches to literate programming (LP). This isn't the whole story, however. Ordinary LP encourages the author to create a document that contains and explains working software. A simple tool extracts a nice final publication-ready document and working code from the author's original source document.

Reverse LP is the technique used by tools like Javadoc, Sphinx, Epydoc and Doxygen. This usually takes the form of detailed API documentation, but it can be richer than simply the APIs. In this case, comments in the source code are extracted to create the final publication-ready document. In Sphinx, the author uses a mixture of source code plus external text to create final documentation. This isn't as interesting, since the resulting document can't easily contain the entire source.

We can assign the retronym "Forward Literate Programming" to ordinary LP to distinguish it from Reverse LP.
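The "simple tool" behind Forward LP traditionally has two halves: weave produces the document and tangle extracts the code. Here's a minimal Python sketch of the tangle half. It assumes a noweb-style markup (a line like <<name>>= opens a code chunk, a line that is just @ or starts with "@ " returns to documentation, and a <<name>> line inside code is replaced by that chunk). It's an illustration only, not pyWeb's or CWEB's actual syntax or implementation.

```python
import re

CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")    # "<<name>>=" opens a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # "<<name>>" inside a code chunk


def parse_chunks(source: str) -> dict[str, list[str]]:
    """Collect code chunks by name; repeated names are appended in order."""
    chunks: dict[str, list[str]] = {}
    current = None  # None means we're in documentation text
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = chunks.setdefault(start.group(1), [])
        elif line == "@" or line.startswith("@ "):
            current = None  # back to documentation
        elif current is not None:
            current.append(line)
    return chunks


def tangle(chunks: dict[str, list[str]], name: str, indent: str = "") -> list[str]:
    """Expand chunk `name`, replacing reference lines with the chunk they name."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            # Indent the expanded chunk to match the reference line.
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line if line else line)
    return out


doc = """\
This program greets the world.

<<hello.py>>=
def main():
    <<body>>
@ The body is a single call to print.

<<body>>=
print("Hello, literate world")
@
"""

print("\n".join(tangle(parse_chunks(doc), "hello.py")))
```

The documentation and the code live in one source; the code is presented in whatever order best explains it, and tangle reassembles it in the order the compiler needs.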

Code-First Literate Programming

There's an apparent distinction between two variations on the Forward LP theme: Document-First and Code-First LP. In Document-First, we aspire to a noble ideal of writing the document and the code from first principles, from scratch, "de novo", starting with a blank page. The code-first approach, on the other hand, refactors working code into a literate programming document.

One can argue that code-first refactoring is A Bad Thing™ and subverts the intent of literate programming. The argument is that one should think the program through carefully, and the resulting document should be a tidy explanation of the development of the ideas leading to the working software.

However, Knuth's analysis of "The original Crowther/Woods Adventure game, Version 1.0, translated into CWEB form" (at ADVENT) shows that even ancient Fortran code can be carefully analyzed and retroactively transformed into a piece of literature.

Working forward -- starting with a blank sheet of paper -- isn't always the best approach. The bad ideas and dead-ends don't belong in that explanation. All of the erasing and rewriting should be left out of the LP document. This means that the document should really focus on the final, working, completed code. Not the process of arriving at the code. Why start with a blank page? Why not start with the code?

In short, code-first LP isn't wrong. Indeed, it isn't even a useful distinction. If the resulting document (a) contains the entire source and (b) stands as a piece of well-written description, then the literate programming mandate has been satisfied.

Center of Balance

Literate Programming strikes a balance among the various development forces. It emphasizes working software with abundant documentation. It does not emphasize the short-term cost to develop. It does, however, emphasize the long-term value that's created.

Interestingly, the idea is to minimize the labor involved in creating and maintaining this documentation. To some folks, it seems odd that all that writing would somehow be "minimal". Consider the alternative, however.

We can try to create software and documentation separately, claiming it's somehow easier. First, we write the software, since that's the only deliverable that matters. Second, we slap on some extra documentation, since only the software really matters. While satisfying in some respects, most folks find -- in the long run -- that this is unworkable. The code and the documentation diverge.

When the code and the comments disagree, probably both are wrong.

The goal of LP is to prevent this.

Literate Programming seems like a lot of work. But it's work we have to do anyway. And a non-literate approach is simply more work. Almost any approach that seems to create software "quickly" doesn't create any enduring value. Why not?

The Deliverables

The point of all software development is to create a two-part deliverable.
  • The working software
  • Some supporting justification or reason for trusting the software
The justification can take several forms: test results, formal proof, API Documentation ("Reverse Literate Programming"), an explanation (separate from the code) or a Literate Programming document.

In many cases, our customers want most of the above. Folks don't expect a formal proof, but they often demand everything else.

Claiming that the software can exist without the supporting justification is to reduce software development to a hobby. The worst-run of amateur software development organizations do tolerate a piece of software without a single test or scrap of documentation. That only proves the point: if your organization tolerates junk software without supporting documentation, it's one of the worst-run of organizations; feel free to quit.

The point of LP is to create the software (and supporting documents) from a single LP source document. LP seeks to minimize the effort required to create software with supporting documentation that actually matches the software.

I'll emphasize that.

Literate Programming seeks to minimize the effort required to create software with supporting documentation

If we have to produce software, tests and explanations, clearly it is simpler to have a single source file which emits all of that stuff in a coherent, easy-to-follow format. While it's clearly simpler, there are some barriers to be overcome.

If It's So Much Easier... ?

Jon Bentley's objection to LP is that it doesn't feel easier to write a coherent document because, essentially, we aren't all good writers. Bentley notes that there are good writers and good programmers and that some folks are not members of both sets. I think this misses the point. We're going to produce documentation, no matter how good a writer we are.

Most people do not see LP as simpler. They see it as a lot of work. Weirdly, it's work they already do, but they choose to keep the program and the explanation separate from each other, making it more work to keep them in sync. I can see why they claim it's more work.

If it's easier to do this in one document, why doesn't everyone simply create a literate program?

Generally, we've got three kinds of barriers that make Literate Programming hard. First, the tools at our disposal don't really support an LP kind of development effort. We get very used to intelligent syntax coloring and code folding. We find tools which lack these features to be harder to use. Second, we're working in multiple languages in a single document. Finally, it takes some experience to get settled into an LP mode.

The Tool Barrier

The first of the barriers to effective literate programming is the tool pipeline. The complaint is that "having to ‘compile’ my markdown to code, then compile my code to test it, all seems like too much hard work".

This is interesting, but specious. The multi-step process is what SCons, make, Ant and Maven are for. A simple SConstruct file will handle tangle, weave, publication, compilation and unit test in a single smooth motion.
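If SCons or make isn't handy, even a tiny driver script removes the "too much hard work" complaint. This Python sketch runs the steps in order and stops at the first failure, as make does by default. The step commands are hypothetical placeholders; substitute your own tangle, weave and test commands. Unlike SCons or make, it reruns every step each time instead of checking which outputs are out of date.

```python
import subprocess
import sys

# Hypothetical placeholder commands; replace each argv with your real
# tangle / weave / test invocations.
STEPS = [
    ("tangle", [sys.executable, "-c", "print('tangle: extract code')"]),
    ("weave", [sys.executable, "-c", "print('weave: build document')"]),
    ("test", [sys.executable, "-c", "print('test: run unit tests')"]),
]


def build(steps):
    """Run each (name, argv) step in order; stop at the first failure."""
    for name, argv in steps:
        print(f"== {name}", flush=True)  # flush so our output stays in order with the child's
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"!! {name} failed with exit code {result.returncode}", flush=True)
            return False
    return True


if __name__ == "__main__" and not build(STEPS):
    sys.exit(1)  # non-zero exit so a CI job or an IDE "tool" notices the failure
```

Bound to a key or a double-clickable icon, this is the one-step edit, build and test loop the complaint asks for.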

There are a lot of tools involved in literate programming. We've introduced an additional markup language into the mix, creating additional steps. This isn't any more complex than working with any compiled language. We often forget that the C compiler is really a multi-stage pipeline. Our LP tools -- similarly -- are multi-stage pipelines.

Also, for Python and F# programmers, there's something else that Seems Very Important™. It isn't. F# and Python have console interfaces (sometimes called the Read Evaluate Print Loop, REPL); this clutters up the problem with an irrelevant detail. Console hacking is helpful, but it isn't literate and it's barely programming.

The Language Barrier

In addition to the tool barrier, we also have a language barrier. When we're doing literate programming we're working in at least three different languages concurrently. This makes our life seem difficult.
  • Literate Programming Markup. This might be CWEB, pyWeb or any of a number of LP markup systems.
  • Target Document Markup. This might be LaTeX, RST, Markdown, DocBook XML or some other markup.
  • Target Programming Languages. For classic, Knuth-style projects, there's only a single language. However, for many projects this will not be a single language. For example, in a web environment, we'll have program source, SQL, HTML, CSS, and possibly other languages thrown in.
It's difficult to sort this out from an IDE's perspective. How to handle syntax highlighting and code coloring? How to handle code folding and indexing the document as presented?

The old-school technique of decomposing a big document into small sections still applies to literate programming. The document sections need not correspond in any way to the final program source files, which makes the LP document tree far, far easier to work with.

The Mental Barrier

The final barrier is entirely mental. This is really one of experience and expectation.

It's hard -- really hard -- to step back from the code and ask "What's this mean?" and "How would I explain it?"

Too often, we see a problem, we know the code, and we understand the fix -- as code. This is a skill as well as a habit we build up. It's not the best habit because the meaning and explanatory power can be ignored or misplaced.

Stepping back from the code seems slow. "It's a one-line change with a 10-paragraph explanation!" developers gripe. "I could make the change now or spend hours explaining the change to you. The value is in making the change and putting it into production."

And that's potentially wrong.

Only a very small part of a developer's value is the code change itself. If code will be in production for decades (my personal best is 17 years in production), then the 10-paragraph explanation will -- over the life of the software -- be worth its weight in gold. A one-line fix may actually be a liability, not an asset.

Solid Approach

I think the approach has to be the following.
  1. Create a Spike Solution. Something that works, is incomplete, but shows the core approach, algorithms and data structures.
  2. Outline the next more complete solution using LP tools. The component structure, the logical model, the basics of the first sprint.
  3. Create a publication pipeline to process the LP source into document, code and tests, and run the test suite. A kind of Continuous Integration daily build. This is easily a double-clickable script, or "tool" in an IDE.
  4. Fill in the code, the unit tests, and the necessary packaging and release stuff. Follow TDD practices, writing unit tests and code in that order. What's cool is being able to write about them side-by-side, even though the unit tests are kept separate from the deliverable code in the build area.
  5. Review the final document for its explanatory power.
Consider a number of things we do in comments that are better done outside the comments.
  • TODO lists. We often write special TODO comments. These can go in the proper Literate Programming text, not in the code.
  • Code samples. In JavaDocs, particularly, sample code isn't fun because of the volume of markup required. LP code samples are just more code; you can make them part of small "demo" or "test" structures that actually compile and are actually tested. Why not?
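In Python, one way to get code samples that are actually tested is doctest: the sample lives in the docstring, and the standard library runs it and checks the printed result. The word_count() function is just an illustration.

```python
import doctest


def word_count(text: str) -> int:
    """Count whitespace-separated words.

    >>> word_count("literate programming lives")
    3
    >>> word_count("")
    0
    """
    return len(text.split())


if __name__ == "__main__":
    # Runs every example found in this module's docstrings; silent when all pass.
    doctest.testmod()
```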
Consider a number of things we don't often do well.
  • Background on an algorithm or data structure. Footnotes, links, etc., are often slightly easier to write in word-processing markup than comments in the code.
  • Performance information on the choice of a data structure. Merely claiming that a HashMap is faster isn't quite as compelling as running timeit and including the results.
  • Binding unit tests and code side-by-side. Current practice keeps the unit tests well separated from code. (Django framework models are a pleasant exception.) What could be nicer than a method followed by unit tests that show how it works? You may write the tests first, but the code-first explanation is sometimes nicer than the test-first development.
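For instance, a short timeit run in Python makes the "hash lookups are faster" claim measurable. This sketch compares membership tests on a list (a linear scan) and a set (a hash lookup). The sizes and repeat counts are arbitrary assumptions, and the numbers depend on your machine, so run it and paste your own results into the document.

```python
import timeit

N = 10_000
as_list = list(range(N))
as_set = set(as_list)
target = N - 1  # worst case for the list: the scan reaches the last element

list_time = timeit.timeit(lambda: target in as_list, number=1_000)
set_time = timeit.timeit(lambda: target in as_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s  "
      f"ratio: {list_time / set_time:.0f}x")
```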
I think that LP isn't all that hard, but we have three barriers to overcome. We don't have exceptional tools. We have a complex welter of languages. And we have bad habits to break and transform into new habits.

Thursday, March 25, 2010

Building Skills Books -- Google Group

The readers of the Building Skills Books have a number of needs:
  1. A way to post errors and corrections. I get a lot of these. Thanks!
  2. A way to share comments and questions. I get a fair number of these.
A Google Group seems to be the best approach. We have pages, discussions, email notifications, a lot of features.

As long as we're opening the group, I figured I should actually make the dozens of corrections that have been sent in. Also, I'm looking closely at using Lulu to handle hard-copy production for the folks that have asked for that.

Tuesday, March 23, 2010

Architecture Change: Breaking Conway's Law

In Architecture Change: Recognizing Conway's Law we looked at the profound influence Conway's Law has on architecture.

Recently I've looked at two gutsy declarations that an architecture was broken. One recognized that a three-tiered architecture was too complex for their needs. The other recognized that the Ontology tools weren't performing well, and perhaps weren't helping.

My point is that these architectural mistakes are the result of Conway's Law. They aren't inherently flawed.

The Root Cause

What's flawed is not the architecture. What's flawed is the organization that built the architecture.

A three-tiered architecture is workable. In some cases, it's necessary. In other cases it could be overkill. But it isn't the cause of the problems.

An Ontology is often a good thing. However, using the ontology to represent what is -- essentially -- a Star Schema fact table is poor use of the technology.

Declaring the architecture broken is not a technical statement. It's an organizational statement. It says that the organization, the teams, the areas of responsibility are broken.

Rule 1: A Broken Architecture Is A Broken Organization

Complexity

One can try to make a distinction between Essential Complexity and Accidental Complexity. One can claim that essential complexity is part of the solution and accidental complexity is just other stuff that accretes. This doesn't make any sense, since software development is not "accidental". Software doesn't "happen". It's hard to call something "accidental complexity" without saying that software involves random accidents. Blaming "accidental" complexity is a dodge, an attempt to obscure the root cause.

One might call it incidental or tangential complexity. But that still hides the fundamental problem.

To be more honest, one must separate Problem Complexity from Solution Complexity. The Problem Domain may be inherently complex. In which case, simplification is hard and 2 tiers, 3 tiers or N tiers don't matter. The problem itself is hard, no matter what architecture is chosen.

An ontology, for example, is very helpful when the problem itself is inherently hard. The formalization of relationships in an ontology can help beat a path through a tangled problem domain.

In most cases of a broken architecture, the solution has grown out of scale with the problem's inherent complexity. If we're doing actuarial risk analysis, we don't really need an ontological model of "Risk": we need facts that help us measure the risk factors.

Rule 2: A Broken Architecture Means the Solution Doesn't Fit the Problem

Corollary: The Organization Doesn't Fit the Problem

Kinds of Broken

Why would we declare an architecture broken? Generally, we've got a grotesque failure due to the very structure of the solution. These can be decomposed into five areas.
  • Failure to satisfy the need; i.e., the software doesn't have the required functions or features.
  • Failure to use resources effectively; i.e., the software is slow, uses too much disk or too much network traffic.
  • Failure to be maintainable; i.e., bugs cannot be fixed.
  • Failure to be adaptable; i.e., new features cannot be added.
  • Failure to fit other organizational needs (cost, licensing, etc.); i.e., it's too expensive.
The two broken architectures I've heard about recently have different problems. One is unacceptably slow (as well as hard to adapt). The other is described by some as impossible to maintain and adapt.

Rule 3: All Architectural Problems Are Symptoms of Organizational Problems

In short, a broken architecture is not a simple technical problem and it doesn't have a simple technical solution. It's an organizational problem, and it has a multi-part solution.

Making Progress

It's important to acknowledge that Conway's Law, like Mutual Attraction and Thermodynamics, is a feature of the universe. It cannot be "broken" or even "subverted". You cannot win, you cannot break even, you cannot quit the game.

Axiom: Conway's Law Cannot Be Broken.

Given that Conway's Law is like Thermodynamics, you have to work with it.

Conclusion: Architecture Must Drive Organization; Problem Must Drive Architecture

The only way to make progress is to restart the project at a fundamental level. You have to -- effectively -- fire everyone and rehire them to create a brand-new team. The broken architecture came from a broken organization. To fix the architecture, you need to fix the organization.

Example #1, Unmaintainable Stored Procedures

Consider an application with stored procedures (SP) so badly broken as to be unmaintainable. Let's say it's many hundreds of lines of code, with a Cyclomatic Complexity so high as to be laughable. Clearly, the folks responsible for building this need to be reassigned and new folks need to be brought in. If the new folks are simply assigned to the same old separate SP/DBA group, then a new unmaintainable mess will eventually replace the existing unmaintainable mess.

Conway's Law applies: If the SP developers are separate, they will evolve in their own direction. If you want to have a "technical" reason for SP's, then you have to prove that they're more effective than a non-SP implementation. That means spike solutions to compare SP's and your other application programming languages point-by-point.

To prevent stored procedures from getting out of control there are two choices.
  1. Don't use stored procedures. Put that logic in with the rest of the application, where it belongs. Same code base, not a separate language buried in the database. One team, one language.
  2. Don't make stored procedure writing a separate "team". The stored procedure writing must be part of application writing. One team, multiple languages.
Note that choice #2 leaves it to the team to use stored procedures if they have a provable improvement on performance. Things are not handed over to the DBA's because SP's must do all database interface or SP's must maintain "low-level" rules or other blurry lines. Things are not handed to the DBA's -- the team solves the problem.

Example #2, Too Many Tiers

Consider an architecture with too many tiers. The inter-tier communication is blamed as creating "accidental complexity". This is a dodge. The coordination between teams is what creates complexity.

To prevent inter-tier communication from being a problem, one doesn't need to remove tiers. One needs to remove organizational structure. There's really only one choice.

Fail: Team Follows Technology
Win: Team Follows Features

For a given feature set, everyone involved has to become part of one, unified team working on one sprint and attending one daily stand-up meeting.

"But that's unwieldy," you say. "DBA's have to be kept separate."

That's Conway's Law in action.

To work with Conway's Law, you must create a team that owns the feature set -- all tiers -- all technologies -- and can make all the implementation choices required to bring that feature set to the users.

Example #3, Overuse of Ontology

Consider an inappropriate use of an Ontology where a Database would have been a better choice.
  1. Remove the old team. Assign them to hard problems where the ontology pays dividends, get them away from easy problems where the ontology is a solution looking for a problem.
  2. Create a new team around the new solution. Each feature has a team that has a complete skill set -- front-end, bulk processing, persistence, web server, database, network -- everything.
  3. The new team stands alone and builds the solution.
Excuses Excuses

The number one cultural impediment is the "Skill Focus" excuse. These are just Conway's Law in action.
  • "We can't have application programmers doing database design. They might 'mess things up'."
  • "We don't want our DBA's assigned to application development teams. They have operational responsibilities that trump new development."
The number two cultural impediment is the authorization excuse. These are also Conway's Law, wrapped in the mantle of "security".
  • "We can't allow application developers sudo privileges to configure Apache (or MySQL, or Oracle, or -- frankly -- anything.)"
  • "We can't assign a DBA or SysAdmin or anyone to support new development..."
Conclusion

Stop organizing teams by skills.

Start organizing teams by deliverable.

Stop carving out random technology features without proof that the technology solves a problem. Stored Procedures, Middle Tiers, Ontologies are just potential solutions. Don't commit to them until they're proven.

Start creating spike solutions to measure the value of a technology. If a spike solution doesn't work, stop development, change the plans, change the schedule and start again based on the lessons learned.

Stop forcing a deadline-driven death march.

Start learning technology lessons and making project changes based on what was learned.

Monday, March 22, 2010

Architecture Change: Recognizing Conway's Law

I've got lots of examples of places where Conway's Law has turned a good idea into a poor implementation. A classic is a data warehouse where there were three project managers, so they broke things up three ways, leading to a crazy mess of dumb duplication.

Countering that, I've recently looked at two gutsy declarations. It takes real courage to declare an architecture wrong. Our basic human nature prevents us from acknowledging that an existing architecture is a liability, not an asset.

Pitching a fix is easy. Locating the root cause of the original problem is hard. Trying to fix a broken architecture means that you will run afoul of Conway's Law. In addition to having the guts to acknowledge that something is broken, figuring a way to work with Conway's Law is essential to success.

Broken 3-Tier Architecture

The biggest reason for broken architectures is dumb over-engineering. And most of the dumbosity has Conway's Law as its root cause. Yes, organizational structures will impose a solution structure that doesn't match the problem. There are lots of examples.

If you read too much and build too little, you find a ton of articles on .Net 3-Tier Architectures. Google it and you'll get a mountain of hits, each with a distinctive spin on 3-Tier. For reference, start with this: Building an N-Tier Application in .NET. It's the party line on splitting things into tiny buckets consistent with the MS product offerings.

A "3-Tier" presentation is very seductive because it plays by Conway's Law.
  • Manager A. Lobbies for Web-based solution; takes over "presentation" development and builds a team to create flashy front-end stuff with cool tools and technologies: HTML, CSS, JavaScript, Silverlight, etc. The front-end developers are as much graphic designer as programmer; they have distinct skills. Conway's Law says that since they're separate from "other" programmers, presentation must be a separate tier.
  • Manager B. Manages the DBA's. DBA's must be kept separate because a database is "infrastructure", like a network and a web server. Somehow database development is usually lumped in with database administration and development competes with operation for resources. Conway's Law says that DBA's are separate so there must be a separate data tier.
  • Manager C. All of the interfaces and batch loads have to be done by someone. There's no sizzle to this; it isn't fun for DBA's. It's frankly boring stuff. Another manager is assigned to create "back-end" interfaces, and other stuff. Conway's Law says that we'll introduce a "middle tier" to give these people something to contribute to the web application.
At this point, some people call "shenanigans". They say that this Conway's Law analysis is crazy talk: I'm just fitting the evidence to my theory. Here's my question: what's the alternative to the 3-tier architecture? Are they claiming that the three tiers are logically necessary?

Necessary Decomposition

If three tiers were logically necessary, we wouldn't discuss N-tier architectures.

Clearly, folks have decomposed things into more than three tiers. So, three tiers isn't necessary. It's just convenient. QED: There's no necessity to three tiers; it's just a handy team size.

Some scalability works out well with a three-tier separation. In particular, serving a lot of static content (CSS files, PNG's, static HTML) can be delegated to a front-end tier. Serving the dynamic content is better handled by a separate process (perhaps even a separate processor). Database processing, because it's I/O bound, is often well-handled by a separate process.

However, if the "middle-tier" has a lot of work or relies on slow external web services, it might decompose into sub-tiers. No more three-tier solution. Similarly, one can make a case for splitting static content services into two sub-tiers: reverse proxy and proper content server. Again, no more three-tier solution.

Three tiers, five tiers or N tiers: the architecture could have been driven by necessity or it could be driven by Conway's Law. Clearly, Conway's Law has a profound influence. Indeed, most of the time, Conway's Law trumps all other considerations.

Otherwise we wouldn't have broken architectures. If the decisions were technical, we'd have technical spikes and we'd discard broken ideas. Instead we pursue broken ideas in that weird deadline-driven project death-march.

Apostasy

One consequence of Conway's Law is Stored Procedures. That's the tier assigned to the DBA's. The idea that stored procedures might be a bad idea strikes at the very heart of all DBA's (and their managers) and is therefore unthinkable. Try suggesting that stored procedures be replaced by middle-tier application logic. Everyone says that replacing SP's with application code is heresy.

Less than two years ago I sat in a meeting where I was told, very plainly, that the only provably scalable solution was a CICS transaction server and a mainframe DB2 database. The entire room was told that web architectures were a bad idea. Only CICS could be made to work. This is just as dumb as claiming that stored procedures are essential.

This kind of thing leads to a Conway's Law Hybrid solution (CLH™) where the web front-end used SOAP web services to talk to a CICS back-end that merely invoked DB2 stored procedures. No other architecture was discussable. The architecture documentation had to be rewritten to put the simple web site into an appendix as an "alternative". The primary pitch was a hell-on-earth hybrid.

Since there was no DBA bandwidth to write all these stored procedures, the project could only be cancelled. Business rules in Java were unthinkable, heretical. As a former DBA, my suggestion to give up on stored procedures makes me apostate. Stored procedures can be driven by necessity or Conway's Law.

Conway's Law

This concept is known as Conway’s Law, named after Mel Conway, who published a paper called “How Do Committees Invent?”. Fred Brooks cited Conway’s paper in his classic “The Mythical Man Month”, and invented the name “Conway’s Law”. Here’s the definition from Conway’s own website (which also has the original paper in full):

Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.

More Broken Architectures

Another example.
  • Manager A. FLEX front-end development.
  • Manager B. Ontology development.
Wait, what? Ontology? No database?

Not at first. Clearly, a good ontology engine will handle the information processing needs. It will be great. The FLEX front-end can make SPARQL queries, right?

Actually, it doesn't work out well. SPARQL is slow. Hardly appropriate for a rich user interface.

So here's another pass at this.
  • Manager A. FLEX front-end development.
  • Manager B. Ontology development.
  • No One In Particular. Backend Web Services Development between FLEX and the Ontology.
"Aha!" you say. "An example that proves Conway's Law is wrong."

Actually, this is evidence that Conway's Law can't be patched. The initial ontology-based application is entirely Conway's Law in action. Trying to create the necessary architectural features without creating a proper organization around the solution ran aground.

Calling It Quits

A really hard thing to do is call it quits when something isn't working. A fundamental law of human behavior says that we hold onto losers. Partly, this is the Endowment Effect -- once the architecture is in place, we value it more than it's worth and believe it can be salvaged. Partly, this is Loss Aversion -- declaring the old architecture broken means admitting that the investment created a liability, not an asset.

How do you restart the project with a new architecture?

How do you avoid Conway's Law in the next generation of a web application?

Stay Tuned for part 2 -- Architecture Change: Breaking Conway's Law.

Saturday, March 20, 2010

Obsolescence

My old Citizen Pro-Master watch died. It needs batteries. It's a dive watch, so it also needs to be opened by professionals, have the gaskets replaced, and get pressure tested to be sure it works.

I tried sending it to the Citizen Watch Service facility in Dallas. Their web site has the advantage of being search-engine friendly, a real plus. It has some significant problems, also.
  1. Their address is an image, not text. How do I copy and paste to create a shipping label?
  2. They have spelling mistakes.
  3. Overall, it has an amateurish look, leaving one uncomfortable mailing an expensive watch to them.
They do respond promptly, however, and told me not to mail them the watch. They did not have parts. They suggested the "Torrance" facility.

Okay, try and find the "Torrance" facility on line.



Not so easy, is it? You can find a lot of peripheral information about the Torrance facility and its location, but no real contact information directly from a Citizen-branded web site.

The Citizen site is slick, but appears to be totally search-engine unfriendly. No spelling mistakes, but amazingly hard to find the "Torrance facility" via the Citizen site.

Further, without actually seeing the watch, email #3 said "We hate for you to send the watch only to find out that we too cannot fix it."
  1. The watch doesn't work. Why would you hate to have me send it? If you can't fix it, I haven't lost anything by trying, have I? I don't understand that comment.
  2. Dallas never looked at it, so the "we, too" part doesn't make sense, either.
I tried one last time to explain that I just wanted batteries.

After four emails simply trying to figure out if they would look at it, I guess I have to give up trying to get it fixed.

Sad that a solidly built watch isn't even good for 20 years of service. Sad, too, that the web sites are collectively so bad: either they're slick and search-engine proof or amateurish.

Friday, March 19, 2010

Security Vulnerabilities

I lean on the OWASP list heavily. http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project


The point is that most of the vulnerabilities are pretty clear.
  1. Injection flaws: SQL, OS, and LDAP injection. Pretty clear that building SQL, shell scripts or LDAP queries dynamically is simply wrong. Don't do it. Use SQL Binding, and proper escaping/quoting/filtering.
  2. Cross-site scripting. Again, proper escaping/quoting/filtering is essential.
  3. Authentication and session management. This is generally done well by most frameworks.
  4. Insecure object references. Files, directories, etc. A good framework prevents this by making all URL's into indirect references to underlying objects.
  5. Cross-site request forgeries, like session management, are generally handled by frameworks.
  6. Security misconfiguration. This is where actual skills show up. This can be hard, and takes work.
  7. URL-level validation. I thought this went without saying: all URL's are available to users even if the link is not on a page anywhere; anyone can bookmark or forge a request. All requests must be validated even if "there's no way the user could see that link and click on it."
  8. Unvalidated redirects and forwards. This strikes me as weird because we use redirects in one (and only one) situation: redirect-after-post. However, if you synthesize a redirect from user input -- without filtering, validating or quoting properly -- you'd be open to problems.
  9. Insecure crypto. Like security misconfiguration, this is very hard work on the part of architects and administrators. Key escrow systems are part of this, as are encrypted database fields and (possibly) encrypted physical storage. Sigh.
  10. Transport layer protection. SSL is part of any security framework.
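Items 1 and 2 are the ones most easily shown in code. Here is a minimal sketch using Python's standard sqlite3 and html modules; the users table, find_user and render_greeting are hypothetical names for illustration. The SQL uses a bound parameter instead of string building, and user-supplied text is escaped before it goes into HTML. Other database drivers use different placeholder syntax.

```python
import html
import sqlite3


def find_user(conn, name):
    # Bound parameter: the driver passes `name` as data, never as SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


def render_greeting(name):
    # Escape user-supplied text before placing it in HTML (XSS defence).
    return "<p>Hello, {}</p>".format(html.escape(name))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "x' OR '1'='1"))  # [] : the payload is just a name
print(find_user(conn, "alice"))         # [(1, 'alice')]
print(render_greeting("<script>alert(1)</script>"))
```

The classic injection payload matches nothing, because the bound value is never interpreted as SQL.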
Some of these are solved by using commonly-available open-source frameworks.

Too many people reject these open-source solutions for dumb or wrong reasons.
One of the biggest mistakes is to say that a framework is "too heavyweight" for a small web application.

The rules are simple: either reinvent the wheel properly, or use an established open-source framework.

Open Source? Yes, one that can be vetted for security vulnerabilities.

Wednesday, March 17, 2010

COBOL File Processing in Python (really)

Years ago (6? 7?) I did some data profiling in Python.

This required reading COBOL files with Python code.

Superficially, this is not really very hard.
  1. Python slice syntax will pick fields out of the record. For example: data[12:14].
  2. Python codecs will convert from EBCDIC to Unicode without pain: codecs.decode( someField, 'cp037' ).
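A sketch of that superficial approach, assuming Python 3 and a hypothetical 10-byte record. The offsets are worked out by hand, which is exactly where the trouble starts:

```python
import codecs

# Hypothetical 10-byte EBCDIC record: a 5-character name, then a 5-digit number.
record = "HELLO12345".encode("cp037")

name_field = record[0:5]     # offsets 0-4, worked out by hand
number_field = record[5:10]  # offsets 5-9, worked out by hand

name = codecs.decode(name_field, "cp037")
number = int(codecs.decode(number_field, "cp037"))
print(name, number)  # HELLO 12345
```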
With some more finesse, one can handle COMP-3 fields. Right?

Maybe not.

Problems

There are three serious problems.
  • Computing the field offsets (and in some cases sizes) is a large, error-prone pain.
  • The string slice notation makes the COBOL record structure completely opaque.
  • COMP-3 conversion is both ubiquitous and tricky.
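To show why COMP-3 is tricky, here is a minimal packed-decimal decoder. It is a sketch, not the cobol-dde code: it assumes well-formed input, treats only a 0xD sign nibble as negative, and takes the implied decimal scale as a parameter, because the data itself doesn't carry it; that has to come from the copybook.

```python
from decimal import Decimal


def unpack_comp3(data, scale=0):
    """Decode a COBOL COMP-3 (packed decimal) field into a Decimal.

    Each byte holds two 4-bit decimal digits, except the last byte, whose
    low nibble is the sign: 0xD is negative, anything else (usually 0xC
    or 0xF) is treated as positive. `scale` is the number of implied
    decimal places, i.e. the digits after the V in the PIC clause.
    """
    digits = []
    for byte in data[:-1]:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    digits.append(data[-1] >> 4)      # last byte: one digit...
    sign = data[-1] & 0x0F            # ...plus the sign nibble
    value = int("".join(str(d) for d in digits))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


# PIC S9(5)V99 COMP-3 holding -12345.67 packs into 4 bytes: 12 34 56 7D.
print(unpack_comp3(bytes([0x12, 0x34, 0x56, 0x7D]), scale=2))  # -12345.67
```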
Okay, what's the solution?

COBOL DDE Parsing

What I did was write a simple parser that read the COBOL "copybook" -- the COBOL source that defined the file layout. Given this Data Definition Entry (DDE) it's easy to work out offset, size and type conversion requirements.
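The idea of deriving offsets from the copybook can be sketched in a few lines. This is not the parser described here; it is a hypothetical, heavily simplified version that only understands elementary PIC items built from X, 9, A, S and V symbols, plus an optional COMP-3 clause.

```python
import re

# Hypothetical, simplified copybook: elementary items only; no OCCURS,
# REDEFINES, group levels, COMP/BINARY, or SIGN SEPARATE.
COPYBOOK = """
       05 CUST-ID      PIC X(6).
       05 CUST-NAME    PIC X(20).
       05 BALANCE      PIC S9(5)V99 COMP-3.
       05 BRANCH       PIC 9(3).
"""

FIELD = re.compile(r"^\s*\d+\s+([\w-]+)\s+PIC\s+(\S+?)(\s+COMP-3)?\.\s*$")
PIC_SYMBOL = re.compile(r"([X9A])(?:\((\d+)\))?")


def pic_positions(pic):
    """Count digit/character positions in a PIC string; S and V take none."""
    return sum(int(repeat) if repeat else 1
               for _symbol, repeat in PIC_SYMBOL.findall(pic))


def layout(copybook):
    """Return (name, offset, size, is_comp3) for each elementary field."""
    offset = 0
    fields = []
    for line in copybook.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        name, pic, is_comp3 = m.group(1), m.group(2), m.group(3) is not None
        positions = pic_positions(pic)
        # Packed decimal stores two digits per byte plus a sign nibble.
        size = positions // 2 + 1 if is_comp3 else positions
        fields.append((name, offset, size, is_comp3))
        offset += size
    return fields


for field in layout(COPYBOOK):
    print(field)
```

For the sample copybook this puts CUST-ID at offset 0 (6 bytes), CUST-NAME at 6 (20 bytes), BALANCE at 26 (4 bytes, packed) and BRANCH at 30 (3 bytes). A real copybook parser also has to handle group items, OCCURS, REDEFINES and binary fields.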

It was way cool, so I delivered the results -- but not the code -- to the customer. I posted parts of the code on my personal site.

Over the years, a few people have found it and asked pointed questions.

Recently, however, I got a patch kit because of a serious bug.

Unit Tests

The code was written in Python 2.2 style -- very primitive. I cleaned it up, added unit tests, and -- most importantly -- corrected a few serious bugs.

And, I posted the whole thing to SourceForge, so others can -- in principle -- fix the remaining bugs. The project is here: https://sourceforge.net/projects/cobol-dde/.

Monday, March 15, 2010

How do I use all my cores?

News Flash: Multi-core programming is "hard". EVERYBODY PANIC.

ZOMFG: We either need new tools, new languages or both! Right Now!

Here's one example. You can find others. "Taming the Multicore Beast":
The next piece is application software, and most of the code that has been written in the past has been written using a serial approach. There is no easy way to compile that onto multiple cores, although there are tools to help.
What?

That's hooey. Application software is already working in a multicore environment; it has been waiting for multi-core hardware. And it requires little or no modification.

Any Linux-based OS (and even Windows) will take a simple shell pipeline and ensure that the processing elements are spread around among the various cores.

Pipelines and Concurrency

A shell pipeline -- viewed as Programming In The Large -- is not "written using a serial approach". Each stage of a shell pipeline runs concurrently, and folks have been leveraging that since Unix's inception in the late 60's.

When I do python p1.py | python p2.py, both processes run concurrently. Most OS's will farm them out so that each process is on its own core. That wasn't hard, was it?
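What the shell does for python p1.py | python p2.py can be spelled out with Python's subprocess module. In this sketch, two tiny inline scripts (hypothetical stand-ins for p1.py and p2.py) are started as separate processes connected by an OS pipe. Both run at the same time, and the OS scheduler is free to put them on different cores.

```python
import subprocess
import sys

# Two hypothetical stages standing in for p1.py and p2.py.
P1 = "for i in range(5):\n    print(i)\n"
P2 = "import sys\nfor line in sys.stdin:\n    print(int(line) * 10)\n"


def run_pipeline():
    """Roughly what the shell sets up for `python p1.py | python p2.py`."""
    # Stage 1 writes into an OS pipe...
    p1 = subprocess.Popen([sys.executable, "-c", P1], stdout=subprocess.PIPE)
    # ...and stage 2 reads from the other end; both processes run concurrently.
    p2 = subprocess.Popen([sys.executable, "-c", P2],
                          stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # parent drops its copy so p1 sees SIGPIPE if p2 exits early
    output, _ = p2.communicate()
    p1.wait()
    return output.decode().split()


if __name__ == "__main__":
    print(run_pipeline())  # ['0', '10', '20', '30', '40']
```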

I got this email recently:
Then today, I saw the book
By Cory Isaacson

At that point, I figured that there are a lot of yahoos out there that are barking up the wrong tree.
I agree in general. I don't agree with all of Isaacson's approach. A big ESB-based SOA architecture may be too much machinery for something that may turn out to be relatively simple.

Easy Problems

Many problems are easily transformed into map-reduce problems. A "head" will push data down a shell pipeline. Each step on the pipeline is a "map" step that does one incremental transformation on the data. A "reduce" step can combine data for further maps.

This can be expressed simply as: head.py | map1.py | map2.py | reduce1.py | map3.py. You'll use both cores heavily.
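A minimal sketch of well-behaved map and reduce steps, using a word count as a hypothetical workload. Each stage is a generator over lines, so the same file (the name wordcount.py is just an assumption) can run as either stage of a pipeline such as cat *.txt | python wordcount.py map | python wordcount.py reduce.

```python
import sys
from collections import Counter


def map_words(lines):
    """Map step: split each line into lower-case words, one per output line."""
    for line in lines:
        for word in line.split():
            yield word.lower() + "\n"


def reduce_counts(lines):
    """Reduce step: combine all incoming records into word counts."""
    counts = Counter(line.strip() for line in lines)
    for word, n in sorted(counts.items()):
        yield "{}\t{}\n".format(word, n)


STAGES = {"map": map_words, "reduce": reduce_counts}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STAGES:
    # Well-behaved filter: read stdin, write stdout.
    sys.stdout.writelines(STAGES[sys.argv[1]](sys.stdin))
```

The reduce step has to see all of its input before it can emit anything, which is why it sits at a natural break in the pipeline.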

Optimization

Some folks like to really focus on "balancing" the workload so that each core has precisely the same amount of work.

You can do that, but it's not really going to help much. The OS mostly does this by ordinary demand-based scheduling. Further fine-tuning is a nice idea, but hardly worth the effort until all other optimization cards have been played. Even then, you'd simply be moving the functionality around to refactor map1.py | map2.py to be a single process, map12.py.
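Fusing two map steps is just function composition. A sketch with two hypothetical map steps:

```python
def map1(lines):
    # Hypothetical first map step: strip surrounding whitespace.
    for line in lines:
        yield line.strip()


def map2(lines):
    # Hypothetical second map step: drop empty lines, upper-case the rest.
    for line in lines:
        if line:
            yield line.upper()


def map12(lines):
    """map1.py | map2.py fused into one process: same result, one fewer pipe."""
    return map2(map1(lines))
```

The trade-off runs both ways: map12 saves one pipe's worth of copying and parsing, but it also gives the OS one fewer process to spread across cores.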

Easy and well-understood.

Harder Problems

The Hard Problems involve "fan-out" and "fan-in". Sometimes we think we need a thread pool and a queue of processing agents. Sometimes this isn't actually necessary because a simple map-reduce pipeline may be all we need.

But just sometimes, there's a fan-out where we need multiple concurrent map processors to handle some long-running, complex transformation. In this case, we might want an ESB and other machinery to handle the fan-out/fan-in problem. Or, we might just need a JMS message queue that has one writer and multiple readers (1WmR).

A pipeline has one writer and one reader (1W1R). The reason why fan-out is hard is that Linux doesn't offer a trivial (1WmR) abstraction.

Fan-in, by contrast, is easier: we have a many writer one reader (mW1R) abstraction available in the select function.

The simplest way to do fan-out is to have a parent which forks a number of identical children. The parent then simply round-robins the requests among the children. It's not optimal, but it's simple.
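Here is a sketch of that simple approach, combined with select-style fan-in, assuming a POSIX system (Python's selectors module can't wait on pipes on Windows). The worker is a hypothetical inline script that upper-cases lines. The parent deals input lines round-robin to the children from a feeder thread, while the main thread waits on all of the children's output pipes at once. Output comes back grouped by worker, not in input order.

```python
import os
import selectors
import subprocess
import sys
import threading

# Hypothetical worker: a "map" step that upper-cases each line it reads.
WORKER = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"


def fan_out_fan_in(lines, n_workers=3):
    """Round-robin lines to n_workers children; collect output with select."""
    workers = [
        subprocess.Popen([sys.executable, "-c", WORKER],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        for _ in range(n_workers)
    ]

    def feed():
        # Fan-out (1WmR done by hand): deal requests to the children in turn.
        for i, line in enumerate(lines):
            workers[i % n_workers].stdin.write(line.encode() + b"\n")
        for w in workers:
            w.stdin.close()  # EOF tells each child to finish

    feeder = threading.Thread(target=feed)
    feeder.start()

    # Fan-in (mW1R): one reader waits on every child's stdout at once.
    sel = selectors.DefaultSelector()
    buffers = {}
    for w in workers:
        sel.register(w.stdout, selectors.EVENT_READ)
        buffers[w.stdout.fileno()] = b""
    while sel.get_map():
        for key, _ in sel.select():
            data = os.read(key.fd, 4096)
            if data:
                buffers[key.fd] += data
            else:  # EOF: this child has finished
                sel.unregister(key.fileobj)
    sel.close()
    feeder.join()
    for w in workers:
        w.stdout.close()
        w.wait()
    return [line.decode() for buf in buffers.values()
            for line in buf.splitlines()]
```

Feeding from a separate thread matters: if the parent wrote all the input before reading any output, a child could fill its output pipe and block, and the parent would then block writing to that child.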

Bottom Line

Want to make effective use of your fancy, new multi-core processors?

Use Linux pipelines. Right now. Don't wait for new tools or new languages.

Don't try to decide which threading library is optimal.

Simply refactor your programs using a simple Map-Reduce design pattern.

Thursday, March 11, 2010

Great Lies: "Design" vs. "Construction"

In reflecting on Architecture, I realized that there are some profound differences between "real" architecture and software architecture.

One of the biggest differences is design.

In the earliest days, software was built by very small groups of very bright people. Alan Turing, Brian Kernighan, Dennis Ritchie, Steve Bourne, Ken Thompson, Guido van Rossum. (Okay, that last one says that even today, software is sometimes built by small groups of very bright people.) Overall architecture, both design and construction, was done by the same folks.

At some point (before I started in this business in the '70's) software development was being pushed "out" to ever larger groups of developers. The first attempts at this -- it appears -- didn't work out well. Not everyone who can write in a programming language can also design software that actually works reliably and predictably.

By the time I got my first job, the great lie had surfaced.

There are Designers who are distinct from Programmers.

The idea was to insert a few smart people into the vast sea of mediocre people. This is manifestly false. But, it's a handy lie to allow managers to attempt to build large, complex pieces of software using a larger but lower-skilled workforce.

Reasoning By Analogy

The reasoning probably goes like this. In the building trades there are architects, engineers and construction crews. In manufacturing, there are engineers and factory labor.

In these other areas, there's a clear distinction between design and construction.

Software must be the same. Right?

Wrong.

The analogy is fatally flawed because there is no "construction" in the creation of software. Software only has design. Writing code is -- essentially -- design work.

Architecture and Software Architecture

Spend time with architects and you realize that a good architect can (and often does) create a design that includes construction details: what fastenings to use, how to assemble things. The architect will build models with CAD tools, but also using foam board to help visualize the construction process as well as the final product.

In the software realm, you appear to have different degrees of detail: High Level Design, Detailed Design, Coding Specifications, Code.

High Level Design (or "Architecture") is the big picture of components and services; the mixture of purchased plus built; configuration vs. construction; adaptation vs. new development. That kind of thing. Essential for working out a budget and plan for buying stuff and building other stuff.

Usually, this is too high-level for a lot of people to code from. It's planning stuff. Analogous to a foam-board overview of a building.

Detailed Design -- I guess -- is some intermediate level of design where you provide some guidance to someone so they can write programming specifications. Some folks want this done in more formal UML or something to reveal parts of the software design. This is a murky work product because we don't have really formal standards for this. We can claim that UML is the equivalent of blueprints. But we don't know what level of detail we should reveal here.

When I have produced UML-centric designs, they're both "too technical" and "not detailed enough for coders". A critique I've never understood.

Program Specifications -- again, I'm guessing -- are for "coders" to write code from. To write such a thing, I have to visualize some code and describe that code in English.

Let's consider that slowly. To write programming specifications, I have to
  1. Visualize the code they're supposed to write.
  2. Describe that code in English.
Wouldn't it be simpler to just let me code it? It would certainly take less time.

Detailed Design Flaws

First, let me simplify things by mashing "Detailed Design" and "Specification" together, since they seem to be the same thing. A designer (me) has to reason out the classes required. Then the designer has to pick appropriate algorithms and data structures (HashMap vs. TreeMap). Then the designer has to either draw a UML picture or write an English narrative (or both) from which someone else can code the required class, data structure and algorithm. Since you can call this either name, the names don't seem to mean much.

I suppose there could be a pipeline from one design document at a high level to other designs at a low level. But if the low-level design is made difficult by errors in the high-level design, the high-level designer has to rework things. Why separate the work? I don't know.

When handing things to the coders, I've had several problems.
  1. They ignore the design and write stuff using primitive arrays because they didn't understand "Map", much less "HashMap" vs. "TreeMap". In which case, why write detailed design if they only ignore it? Remember, I provided specifications that were essentially, line-of-code narrative. I named the classes and the API's.
  2. They complain about the design because they don't understand it, requiring rework to add explanatory details. I've gone beyond line-of-code narrative into remedial CS-101. I don't mind teaching (I prefer it) but not when there's a silly delivery deadline that can't be met because folks need to improve their skills.
  3. They find flaws in the design because I didn't actually write some experimental code to confirm each individual English sentence. Had I written the code first, then described it in English, the description would be completely correct. Since I didn't write the code first, the English description of the intended code contained some errors (perhaps I failed to fully understand some nuance of an API). These are nuances I would have found had I actually written the code. So, error-free specifications require me to write the code first.
My Point is This.

If the design is detailed enough to code from -- and error free -- a designer must actually write the code first.

Indeed, the designer probably should simply have written the code.

Architecture Isn't Like That

Let's say we have a software design that's detailed enough to code from, and is completely free from egregious mistakes in understanding some API. Clearly, the designer verified each statement against the API. I'd argue that the best way to do this is to have the compiler check each assumption. Clearly, the best way to do this is to simply write the code.

"Wait," you say, "that's going too far."

Okay, you're right. Some parts of the processing do not require that level of care. However, some parts do. For instance, time-critical (or storage-critical) sections of algorithms with many edge cases require that the designer build and benchmark the alternatives to be sure they've picked the right algorithm and data structure.
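As a small illustration of that kind of check, here is a hypothetical timeit comparison in Python of a hash-based lookup (a dict, roughly the HashMap role) against binary search over a sorted list (bisect, standing in for a TreeMap-style ordered structure). The workload sizes are arbitrary assumptions, and the timings will vary by machine, which is the point of measuring.

```python
import bisect
import random
import timeit

# Hypothetical workload: 10,000 distinct integer keys, each looked up once.
keys = random.sample(range(1_000_000), 10_000)
as_dict = {k: str(k) for k in keys}        # hash-based, like HashMap
sorted_keys = sorted(keys)                 # ordered, like TreeMap
sorted_vals = [str(k) for k in sorted_keys]


def lookup_dict(k):
    return as_dict[k]


def lookup_sorted(k):
    # Assumes k is present; bisect_left finds its position in O(log n).
    i = bisect.bisect_left(sorted_keys, k)
    return sorted_vals[i]


# Correctness first: why benchmark something that's incorrect?
assert all(lookup_dict(k) == lookup_sorted(k) for k in keys)

for fn in (lookup_dict, lookup_sorted):
    t = timeit.timeit(lambda: [fn(k) for k in keys], number=20)
    print(f"{fn.__name__}: {t:.3f}s")
```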

Wait.

For the designer to have absolute certainty that the design will work, they have to build a copy that works before giving it to the coders.

In architecture or manufacturing, the construction part is expensive.

In software, the construction part does not exist. Once you have a detailed design that's error-free and meets the performance requirements, you're actually done. You've created "prototypes" that include all the required features. You've run them under production-like loads. You've subjected them to unit tests to be sure they work correctly (why benchmark something that's incorrect?)

There's nothing left to do except transition to production (or package for distribution.)

Software Design

There's no "detailed design" or "programming specifications" in software. That pipeline is crazy.

It's more helpful to think of it this way: there's "easy stuff" and "hard stuff".
  • Easy Stuff has well-understood design patterns, nothing tricky, heavy use of established API's. The things where the "architectural" design can be given to a programmer to complete the design by writing and testing some code. Database CRUD processing, reporting and analysis modules, bulk file processing, standard web form processing for data administration, etc.
  • Hard Stuff has stringent performance requirements, novel or difficult design patterns, new API's. The things where you have to do extensive design and prototyping work to resolve complex or interlocking issues. By the time there's a proven design, there's also code, and there's no reason for the designer to then write "specifications" for someone to reproduce the code.
In both cases, there are no "coders". Everyone's a designer. Some folks have one design strength ("easy stuff", well-known design patterns and API's) and other folks have a different design strength.

There is no "construction". All of software development is design. Some design is assembling well-known components into easily-visualized solutions. Other design is closer to the edge of the envelope, inventing something new.

Tuesday, March 9, 2010

I see why you were confused

Got a nice email about architecture -- but the wrong kind.

It was about physical structures, not software.

It is a "bucket-list" of buildings one simply must see. 100 Amazing Buildings Every Architecture Buff Should See. A cool list to have handy.

I know bupkes about buildings. I've lived in them for all my life, I've even owned a few. But that's about it. I'm more interested in marine architecture and knowing how my boat is put together. [My boat was designed by Ted Brewer; that's the kind of architecture I'm interested in.]

Despite my utter ignorance of buildings, I am very aware that the Software Design Patterns folks were heavily influenced by Christopher Alexander's work on patterns in architecture. For the parallels, read Doug Lea's "Christopher Alexander: An Introduction for Object-Oriented Designers".

You may also want to read SOME NOTES ON CHRISTOPHER ALEXANDER by Nikos A. Salingaros.

Retronyms

When the "electric guitar" was perfected, folks had to create a new word to replace "guitar". The word had become ambiguous, and the phrase "acoustic guitar" was invented to disambiguate "guitar".

We have the same problem with architecture. There's software architecture, marine architecture, and "unqualified" architecture. Worse, we're unlikely to get a good retronym because architecture is a pretty well-defined profession (like "doctor", "dentist" or "barber") and you can't easily rename it.

Friday, March 5, 2010

Fun

XKCD - http://xkcd.com/710/


I remember learning about this as an undergrad at Syracuse University in the 70's and didn't think much of it. It was just "one of those things" that I heard about, and perhaps wrote a homework assignment in APL or Algol-W.

The good old days. When a program to examine the conjecture was hours of heavy thinking followed by carefully monitored run-times on an IBM 370. Computer time was money back in the day. Every minute counted. And you could spend your precious budget checking the Syracuse function or going out with friends.

My solution to the Project Euler problem is 18 lines of Python.

With a few memoization tricks, it runs in something like 3 seconds on my little MacBook. Back in the day, I don't think it even compiled that quickly.
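For anyone curious what a memoization trick looks like here, this is a minimal sketch; it is not the 18-line solution mentioned above. It counts the terms in each Collatz chain and caches every value seen along the way, so later chains stop as soon as they reach a known value.

```python
_CACHE = {1: 1}  # chain length (number of terms) for each value seen so far


def collatz_length(n):
    """Number of terms in the Collatz sequence starting at n, memoized."""
    path = []
    while n not in _CACHE:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = _CACHE[n]
    # Walk back up the path, filling in the cache for every value visited.
    for m in reversed(path):
        length += 1
        _CACHE[m] = length
    return length


def longest_start(limit):
    """Starting value below `limit` that produces the longest chain."""
    return max(range(1, limit), key=collatz_length)


print(collatz_length(27))  # 112
```

Calling longest_start(1_000_000) answers the longest-chain-below-a-million question, at the cost of a large cache.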

Thursday, March 4, 2010

Literate Programming

About a decade ago, I discovered the concept of Literate Programming. It's seductive. The idea is to write elegant documentation that embeds the actual working code.

For tricky, complex, high-visibility components, a literate programming approach can give people confidence that the software actually works as advertised.

I actually wrote my own Literate Programming tool. Amazingly, someone actually cared deeply enough to send me a patch to fix some long-standing errors in the LaTeX output. What do I do with a patch kit?

Forward and Reverse LP

There are two schools of literate programming: Forward and Reverse. Forward literate programming starts with a source text and generates the documentation plus the source code files required by the compilers or interpreters.

Reverse literate programming generates documentation from the source files. Tools like Sphinx do this very nicely. With a little bit of work, one can create a documentation tree that uses Sphinx's autodoc extension to create great documentation from the source.

Reverse LP, however, tends to focus on the API's of the code as written. Sometimes it's hard to figure out why it's written that way without further, deeper explanation. And keeping a separate documentation tree in Sphinx means that the code and the documentation can disagree.
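To make the forward direction concrete, here is a toy "tangle" step: it pulls named code chunks out of a document and expands chunk references into a runnable source file. The markup is a simplified, noweb-style assumption ("<<name>>=" opens a chunk, a lone "@" closes it) and is not meant to reflect pyWeb's actual syntax. A real tool also produces the "weave" half, the typeset documentation.

```python
import re

# Hypothetical source document using simplified noweb-style markup:
# "<<name>>=" opens a code chunk, a line holding only "@" closes it,
# and "<<name>>" on its own line inside a chunk references another chunk.
DOC = """\
This program prints a greeting.

<<hello.py>>=
<<greeting function>>
print(greeting())
@

The greeting lives in its own chunk so it can be explained separately.

<<greeting function>>=
def greeting():
    return "hello"
@
"""

CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def collect_chunks(text):
    """Gather code chunks by name; prose outside chunks is ignored here."""
    chunks, current = {}, None
    for line in text.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk `name`, recursively inlining referenced chunks."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


print("\n".join(tangle(collect_chunks(DOC), "hello.py")))
```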

My pyWeb Tool

The gold standard in Literate Programming is Knuth's WEB. This is available as CWEB, which generates TeX output. It's quite sophisticated, allowing very rich markup and formatting of the code.

There are numerous imitators, each progressively less sophisticated. When you get to nuweb and noweb, you're getting down to the bare bones of what the core use cases are.

For reasons I can't recall, I wrote one, too. I wrote (and used) pyWeb for a few small projects. I posted some code as an experiment on the Zope site, since I was a Zope user for a while. When I went to move it, I got emails from a couple of folks who are serious Literate Programmers and were concerned when their links broke. Cool.

I moved the code to my own personal site, where it sat from 2002 until today. It was hard to find; but there are some hard-core Literate Programmers who are willing to chase down tools and play with them to see how they work at producing elegant, readable code. Way cool.

Patch Kit

Recently, I received a patch kit for pyWeb. This says several things.
  1. It's at least good enough that folks can use it and find the errors in the LaTeX markup it produced.
  2. Some folks care enough about good software to help correct the errors.
  3. Hosting it on my personal web site is a bad idea.
So, I created a SourceForge project, pyWeb Literate Programming Tool, to make it easier for folks to find and correct any problems.

I expect the number of downloads to hover right around zero forever. But at least it's now fixable by someone other than me.

Monday, March 1, 2010

The Web is my ESB, but it's slow...

Transaction design seems to be really hard for some people. The transactions they build seem to be based on some crazy assumptions. The problem is that benchmarking is hard because you have to build enough stuff to get a meaningful benchmark. Everyone thinks you're done when, really, all you did was show that you've got a rotten design.

One reason is that people roll their own ESB. There are many nice ones, but they seem big, complex and expensive. Wikipedia has a handy list of ESBs and vendors. Instead of using a purpose-built ESB, it seems sensible to use the web as an ESB. There's nothing wrong with using the web as an ESB. What's wrong is assuming that the web has our imaginary level of performance.

It appears that there are two assumptions people make. Here's what happens.

Shoddy Design

They design a really complex web transaction and then complain. Attributes of these complex web transactions:
  • They're part of the presentation: for example, a response to an HTML-based GET request, with a person watching it execute.
  • They involve aggregating information from other web services.
  • Sometimes, they involve multi-step workflows.
The complaints include the following:
  1. It's slow. The user is forced to wait for a long time.
  2. It's unreliable. Sometimes an information source doesn't respond at all.
So, the assumptions appear to be that the actual web is as fast as your integration-test mock web, and that the actual web is as reliable as your mock web.

Alternatives

In the case that the transaction is an "order from stock" (it involves competition for physical goods) then the user must wait. When ordering books from inventory, or airplane seats, or hotel rooms, the web site must display a clever animation while it grinds away doing the transaction.

But, when the transaction is placing an order, or it involves aggregating information, then there are better things than making the user sit there and watch the beach-ball spin while your transaction grinds away.
  • Make them wait while you grind. This is the "do nothing" solution; if it's slow or crashes, the user will complain.
  • Queue Up a Work Request. Tell the user you're queueing it up. Allow them to monitor the status of their queued work request.
  • Pre-Cache. We can often gather the expected information in advance and store it locally. When we're providing some kind of standard information aggregate, we should gather it in advance.
Work Queues

The work queue is no different from an eBay auction. You place an order or request and monitor the status. Information aggregation shouldn't take a week; it should be quick.

The user fills in their form, or uploads their request. Your web transaction puts it into the queue, and gives immediate feedback that it was accepted.

Your web site must include, therefore, a background processor that actually handles the request. You can spawn a "nohup" subprocess. You can have a "crontab" schedule that checks the queue every minute. You can have a proper daemon spawned by "init".
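The "nohup" option can be done from Python without a shell. The sketch below (the function name and log-file handling are hypothetical) starts the worker as a separate OS process with `subprocess.Popen`, putting it in its own session with `start_new_session=True` (POSIX only) so it isn't tied to the terminal or session of whatever launched it. Unlike real `nohup`, it does not explicitly ignore SIGHUP; the new session simply has no controlling terminal to send one.

```python
import subprocess
import sys


def spawn_background_worker(script_args, log_path):
    """Start a detached worker process, roughly what `nohup python worker.py &` does.

    `script_args` are the arguments after the interpreter, e.g. ["worker.py"].
    Output and errors are appended to `log_path`.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            [sys.executable, *script_args],
            stdin=subprocess.DEVNULL,   # the worker never reads from a terminal
            stdout=log,
            stderr=subprocess.STDOUT,   # keep errors in the same log
            start_new_session=True,     # POSIX: run setsid() in the child
        )
    finally:
        log.close()  # the child holds its own copy of the file descriptor
```

The caller gets back the `Popen` object immediately and can return its web response without waiting for the worker.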

The background process dequeues the request and gathers the data. It handles slow responses, timeouts, crashes, etc. When it's done, the status is updated. Maybe an email is sent.
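Here is a minimal sketch of that enqueue, dequeue and status cycle, assuming a SQLite table shared by the presentation tier and the background processor. The `work_request` table and the function names are hypothetical. It also assumes a single worker (say, a crontab job that drains the queue every minute); several concurrent workers would need an atomic "claim" step.

```python
import sqlite3


def make_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_request ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " result TEXT)"
    )
    conn.commit()


def enqueue(conn, payload):
    """Web transaction: record the request and return its id immediately."""
    cur = conn.execute("INSERT INTO work_request (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def status(conn, request_id):
    """Web transaction: (status, result) for the user to monitor, or None."""
    return conn.execute(
        "SELECT status, result FROM work_request WHERE id = ?", (request_id,)
    ).fetchone()


def process_one(conn, handler):
    """Background processor: handle the oldest queued request, if any."""
    row = conn.execute(
        "SELECT id, payload FROM work_request"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    request_id, payload = row
    conn.execute("UPDATE work_request SET status = 'running' WHERE id = ?", (request_id,))
    conn.commit()
    try:
        result = handler(payload)  # the slow part: gathering the data
    except Exception as exc:       # record the failure instead of crashing the worker
        conn.execute(
            "UPDATE work_request SET status = 'failed', result = ? WHERE id = ?",
            (str(exc), request_id),
        )
    else:
        conn.execute(
            "UPDATE work_request SET status = 'done', result = ? WHERE id = ?",
            (result, request_id),
        )
    conn.commit()
    return True
```

A cron-driven worker would simply loop `while process_one(conn, handler): pass` and exit; sending the notification email would go right after the status update.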

Pre-Cached Data

Many applications aggregate data. Except in the rare case that the data involves competition over physical goods (inventory levels, current availability, etc.), the data doesn't change constantly.

Indeed, the data often changes on a pretty slow schedule. Weather forecasts, econometric data, etc., change slowly. It's easy to query this data and cache it locally. This gives the illusion of immediate response.

In some cases, the data may involve something like a Twitter feed, where there is a constant flow of data, but there's no competition over physical goods. Folks like to wring their hands over getting the absolute up-to-the-second Twitter information. This is, of course, impossible because the Internet is (1) slow and (2) unreliable. What does up-to-the-second mean when your request is trashed by a momentary problem with your web host's DNS server?

Even Twitter postings can be pre-cached. Polling the Twitter server -- and caching the interesting tweets -- every few minutes will yield results that are every bit as current as trying to get a "live" feed. Remember, the folks tweeting have latency and unreliability at their end. The Twitter servers have latency and unreliability. Your web server has latency and unreliability. Your user's browser has latency and unreliability.
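A minimal sketch of pre-caching, assuming a hypothetical `fetch` callable that queries the slow remote source. A background poller (cron job or daemon) calls `refresh()` every few minutes, and the web transaction only ever calls `get()`, which answers immediately from the local copy. The `PreCache` class is illustrative, not a library API; it keeps the data in memory, where a real site would more likely store it in the shared database.

```python
import time


class PreCache:
    """Local copy of slowly changing remote data, refreshed in the background."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch              # hypothetical: queries the remote source
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def refresh(self):
        """Called by the background poller, never by the web transaction."""
        try:
            value = self._fetch()
        except Exception:
            # Source slow or down: keep serving the last good copy.
            return False
        self._value = value
        self._fetched_at = self._clock()
        return True

    def get(self):
        """Called by the web transaction: no remote call, so no waiting."""
        return self._value

    def is_stale(self):
        """True if we have never fetched, or the copy is older than max_age."""
        return self._fetched_at is None or self._clock() - self._fetched_at > self._max_age
```

When the source is unreachable, users simply see slightly older data, which, as argued above, is about as current as a "live" feed anyway.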

High-Value Data

In some applications, the data is very high value: Electronic Health Records, for example, or econometric data from commercial sources (see the NABE Tools page). In the case of high-value data we have to accommodate (1) slow and we have to resolve (2) unreliable.

We can't fix slow. We have to handle it by a combination of pre-caching and managing request work queues. Use Case 1: users make a standard econometrics request; we have the current data that we've subscribed to. Done. Use Case 2: users make a non-standard request; we queue up the task and gather the information from sources; when we've finished the job, we close the task and notify the user.

Unreliability is handled by service level agreements and relatively simple workflow techniques. When integrating data from several sources, we don't simply write a dumb sequence of REST (or SOAP) requests. We have to break the processing down so that each source is handled separately and can be retried until it works.
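One way to sketch "each source handled separately and retried" is below, assuming each source is wrapped in a hypothetical zero-argument fetch function. The exponential backoff schedule and the choice of `OSError` as the "transient" error are assumptions, not a prescription; a real workflow would re-queue the failures rather than give up.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially between tries.

    Only exceptions in `retry_on` count as transient. OSError covers socket
    timeouts and connection failures (urllib's URLError is a subclass too);
    anything else propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller record the failure
            sleep(base_delay * 2 ** attempt)


def gather(sources, **retry_options):
    """Fetch each source independently so one bad source can't sink the rest.

    `sources` maps a name to a zero-argument fetch function. Returns
    (results, failures); the failures can be queued up and retried later.
    """
    results, failures = {}, {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch_with_retry(fetch, **retry_options)
        except Exception as exc:
            failures[name] = exc
    return results, failures
```

Because `gather` records a failure per source instead of aborting, the sources that did answer are kept, and only the failed ones need another pass.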

Background Processing Tier

This says that a standard web architecture should have the following tiers.
  1. Browser.
  2. Presentation Tier. JSP pages, Django View Functions and Templates.
  3. Services Tier. An actual ESB. Or we can write our own Backend Processor. Either way, we must have a separate server with its own work queue to handle long-running transactions.
  4. Persistence Tier. Database (or files). Your presentation and ESB (or Backend) can share a common database. This can be decomposed into further tiers like ORM, access and actual database.
You can try some other architectures, but they are often painful and complex. The most common attempt appears to be multi-threading: folks write a multi-threaded web presentation transaction that handles the long-running background processing in a separate thread. Sadly, the threads compete for I/O resources, so this is often ineffective.

WSGI-Based ESB

Writing a REST+WSGI ESB (in Python) is relatively straightforward.

Use wsgiref, or werkzeug. Create the "services" as WSGI applications that plug into the simple WSGI framework. Add authentication, URL processing, logging, and other aspects via the WSGI processing pipeline. Do the work, and formulate a JSON (or XML) response.
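A minimal sketch of one such service, using only the standard library's `wsgiref`. The `status_service` application, its URL layout and its canned response are hypothetical, and the logging middleware stands in for the authentication, URL-processing and logging layers described above. Each layer is just a WSGI callable that wraps the next one.

```python
import json
import logging
from wsgiref.simple_server import make_server


def status_service(environ, start_response):
    """Hypothetical service: report a work request's status as JSON.

    A real service would look the id up in the shared database.
    """
    request_id = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    body = json.dumps({"request": request_id, "status": "queued"}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class LoggingMiddleware:
    """One aspect in the WSGI pipeline: log each request, then delegate."""

    def __init__(self, app, logger=None):
        self.app = app
        self.logger = logger or logging.getLogger("services")

    def __call__(self, environ, start_response):
        self.logger.info("%s %s", environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"))
        return self.app(environ, start_response)


# Further aspects (authentication, URL routing) would wrap this the same way.
application = LoggingMiddleware(status_service)

if __name__ == "__main__":
    with make_server("localhost", 8000, application) as httpd:
        httpd.serve_forever()
```

The same `application` object can later be handed to a production server instead of `wsgiref`'s development server, which is the point of keeping each service a plain WSGI callable.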

Need your services tier to scale? Use lighttpd or nginx to "wrap" your WSGI services tier. You can configure WSGI into nginx (link). You can also configure WSGI into lighttpd (link); you can mess around with FastCGI configuration to create multiple instances of the server daemon.

It's much, much easier to make the OS handle the background processing as a separate heavy-weight process. Apache, lighttpd or nginx can make the background processor multi-threaded for you.