S.Lott-Software Architect: August 2009

Sunday, August 30, 2009

SQL Injection Attacks the Top Vulnerability

This is an amazing quote: "We see SQL injection as the top attack technique on the Web".

See ComputerWorld's SQL Injection Attacks Lead to Heartland, Hannaford Breaches for more on this topic.

I'm amazed because SQL injection is an entirely preventable bug. Yet it's the top attack technique.

That's an amazing indictment of the programming profession. There are so many shoddy, incompetent programmers (and shoddy, incompetent customers of programming services) that SQL injection is the top attack technique.

I almost forgot the obligatory XKCD comic: http://xkcd.com/327/
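Here is a minimal sketch of why this is preventable. It uses Python's standard `sqlite3` module and a hypothetical `students` table that echoes the comic. User data goes in through `?` placeholders, so the driver never treats it as SQL text.

```python
import sqlite3

def add_student(conn, name):
    # The ? placeholder binds `name` as a value; it is never spliced into the SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

def find_student(conn, name):
    cur = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

hostile = "Robert'); DROP TABLE students;--"
add_student(conn, hostile)
# The hostile name is stored as an ordinary string, and the table survives.
print(find_student(conn, hostile))
```

The unsafe version differs by one habit: it builds the statement with string formatting, as in `"... WHERE name = '%s'" % name`.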

Friday, August 28, 2009

Building Skills in Programming

I've updated Building Skills in Programming -- an introduction to programming for non-programmers.

The entire thing was redone in RST and Sphinx, leading to an easier-to-read, more colorful layout. The cross-references are generally better and more complete. I also get Sphinx's indexing and search capabilities.

Everything was touched; chapters were added and rearranged. Numerous Python 3 reminders were added.

I can now -- easily -- include Google Adsense advertising on each chapter.

Next steps will be to upgrade my MacOS Python to 2.6.2 and then revise the book to cover 2.6 so that it is completely up-to-date. Also, the math needs to be redone using one of the Sphinx Math extensions so that the resulting LaTeX (and PDF) work out correctly.

Currently, my casual use of dozens of Unicode math characters has led Sphinx to create LaTeX source that isn't encoded properly.
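Before the math can be redone, the stray characters have to be found. The following is a small sketch that assumes the sources are UTF-8 `.rst` files. `non_ascii_report` and `scan_sources` are hypothetical helper names. Each hit it reports is a candidate for rewriting as Sphinx math markup.

```python
import unicodedata
from collections import Counter
from pathlib import Path

def non_ascii_report(text):
    """Count each non-ASCII character in `text`, keyed by (char, Unicode name)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return {(ch, unicodedata.name(ch, "UNKNOWN")): n for ch, n in counts.items()}

def scan_sources(root="."):
    """Hypothetical helper: print the non-ASCII characters in every .rst file under root."""
    for path in sorted(Path(root).rglob("*.rst")):
        report = non_ascii_report(path.read_text(encoding="utf-8"))
        for (ch, name), n in sorted(report.items()):
            print(f"{path}: {ch!r} {name} x{n}")
```

For example, `non_ascii_report("x ≤ y, x ≠ z ≤ w")` reports two LESS-THAN OR EQUAL TO characters and one NOT EQUAL TO.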

Monday, August 24, 2009

Meetings

I found this note while cleaning up. I think it's a summary of John Cleese's short movie "Meetings Bloody Meetings".

Plan. Why are we meeting? What's the purpose? What are the subjects?
Inform. Tell the attendees what, why and the expected outcome.
Prepare. Create an agenda, with a timeline, and the items to be covered.
Structure. If the purpose is to make a decision, this involves presentation of evidence, interpretation of the evidence and a decision. If the purpose is to inform, then present the information.
Summarize and Record.

I hear complaints about time wasted in meetings. This complaint is absolutely justified. I've wasted a lot of my professional career in dumb meetings.

Out of Touch Management. The out-of-touch manager requires long meetings in which each direct report provides a complete status report. The all-hands meeting appears to be the only time the out-of-touch manager ever talks to anyone. Consequently, it devolves into a sequence of one-on-one meetings which the rest of the team is forced to attend.

If the goal is for many people to inform one person, those conversations don't require a meeting.

Going Through The Motions. The going-through-the-motions manager is aware that a staff meeting is expected, but has nothing really to say or do. These meetings often devolve into awkward silences as open-ended questions are thrown out to try to stimulate some interaction.

If there's no goal (no decision, no information) then there's no purpose for the meeting.

Best Practice.

In the days before agility, I worked with two project managers who gathered information in one-on-one meetings. These were separate projects for separate customers, but the same best practice.

The PM would stop by your cube, figure out what you were doing, what you needed, what you were planning to accomplish. Then, at the ever-so-brief staff meeting, a few key interactions among the staff would be lifted up from all those one-on-one meetings.

Let's review the brilliance of these two PM's.

They stopped by your cube. They did not play the power game and make you come to their office. They stopped by. In later years this was called "Management by Walking Around" -- something hailed as brilliant. These PM's just did it.
They hung around long enough to actually get what was going on. They didn't waste time asking you to write and email status to them. They asked, listened, understood and summarized your status for you.
They recognized the need for interaction and made it happen. The status meetings were like daily scrum meetings. Since they were weekly, they weren't as brief, but they were just as focused. No long status reports. No long conversations.

Objections.

Some managers don't have time to sit in everyone's cubicle. This overstates their value as managers by understating the huge cost of wasting everyone's time in serial one-on-one's done in a conference room. 12 direct reports means 13 man-hours wasted in a one-hour meeting.
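The arithmetic fits in a tiny sketch. One meeting charges its full length to every attendee, while desk-side check-ins charge only the two people talking. The 10-minute check-in length in the usage lines is an assumption for illustration, not a figure from the post.

```python
def meeting_cost_hours(direct_reports, hours):
    """Man-hours for one meeting attended by the manager and every direct report."""
    return (direct_reports + 1) * hours

def walkaround_cost_hours(direct_reports, minutes_each):
    """Man-hours for separate desk-side check-ins: two people per conversation."""
    return direct_reports * 2 * minutes_each / 60

print(meeting_cost_hours(12, 1))      # 13
print(walkaround_cost_hours(12, 10))  # 4.0, assuming 10-minute check-ins
```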

Some managers don't have time to write status reports. Instead they forward emails all day.

It's probably better to get off the email treadmill. Get status, write short, to-the-point status reports.

Focus.

Developers (DBA's, Sys Admins, all technical folk) live in a world of technology delivery.

Managers, however, live in a world of budgets, status reports, and weird exercises in foretelling the future. ("How long will it take? How much will it cost?")

There's no good reason to impose the world of status reporting and fortune-telling on technical people. There are, however, lots of bad reasons for imposing unplanned, unprepared and unstructured meetings on a team that would rather be building product than talking about building product.

Tuesday, August 18, 2009

Code Kata Resources (Updated)

I've got a ton of exercises in the Building Skills books. Specifically, my OO Design book is based on my own personal Code Kata exercises.

Plus there are the established Code Kata resources: the CodeKata page, Mark Needham's blog posting on code-kata, Rizky Farhan's Collection of Software Projects, and jp.hamilton's Code Kata Resources. There's also the Coding Dojo page (which suffers from showing no usable URL's -- what a mistake).

Plus there are the random problem sites: Project Euler, Top Coder, UVa, SPOJ, Google CodeJam.

I've done a few (63) Project Euler problems (I got stumped by problem 69). Another 37 and I'd be at level 3.

The question isn't "where are the problems?" The question is "Are these good Code Kata problems?"

Monday, August 17, 2009

Building Skills Books Toolset (Update)

I wrote the first Building Skills books using Appleworks. It wasn't too bad to organize the styles around basic semantics of the subject area. It's an easy, productive writing environment. Except, of course, for internal cross-references, indexes, and tables of contents.

I converted to DocBook XML markup. The conversion was arduous, but well worth it. I got better semantic markup. I used the DocBook XSL tools to convert to HTML both as a single document, and a chunked presentation. It worked out pretty well.

Two things don't work out well. First, the FOP processing is shaky. The books are big, and rather complex, with a fair number of embedded fonts. I have been unable to get the embedded fonts to work correctly with FOP.

The second thing that doesn't work out well is the language-specific markup. DocBook is biased toward C. There aren't enough tags for Python markup (module and library tags are missing, for example) and the syntax-oriented statement, class and function markup is all over the map in DocBook.

Objectives

My goal is to have the books in four formats: XML, single HTML file, chunked HTML and PDF. Of these, the single HTML is the least appealing. The chunked HTML is a great carrier for Adsense ads. The PDF is what I should be selling.

I don't mind writing in XML. Using XMLMind XMLEditor is generally pretty nice. Running the XSL-based tool chain to convert to HTML, and chunked HTML is easy.

Currently, I'm using FireFox to create the PDF. It's quick, but dirty. I'm not sure how many of the print-formatting CSS options FireFox can handle, so I haven't really customized the CSS for printing. However, the FireFox PDF has properly embedded fonts and cross-reference links.

Choices

Apple's Pages does a lot. It's a very nice product. But I'm not sure that the PDF and Chunked HTML will work out all that well.

The DocBook tool chain has problems identified above. The PDF output doesn't work because it overwhelms FOP.

Currently, I use FireFox to create PDF's. I could dress up the CSS to make it look a little better.
An alternative is to use Pisa to transform the XHTML into PDF. I started using Flying Saucer on another project and the XHTML to PDF idea has some appeal. This requires debugging the print-media CSS, which doesn't seem too bad.

On the other hand, RST can have almost all the semantic richness of XML. I've decided to redo Building Skills in Programming entirely in Sphinx, using RST. This has the advantage of being Python-specific, making heavy use of pygments for syntax coloring.

Also, I could stick with XML and use a different tool-chain to go from DocBook XML to LaTeX. The dblatex package may do this nicely.

Tradeoffs

If I switch to Sphinx, editing is much easier. The source is plain text.

The chunked HTML created by Sphinx is outstanding. It's far better than the DocBook HTML. It's much easier to customize than the DocBook XSL, allowing use of Adsense ads with relatively little work.

On the other hand, to produce PDF, I have to go through LaTeX. This means that I have to find a nice LaTeX to PDF tool.

Currently, Sphinx doesn't easily produce a single HTML file. There may be ways around this; perhaps by using an alternate `.. toctree::` directive. But this is also a low-priority requirement, so this may have to be dropped in favor of a better-looking PDF page.

LaTeX to PDF

A Google search for "mac os x latex to pdf" turns up some interesting results.

- http://www.math.toronto.edu/joel/tex/

- http://www.math.wisc.edu/~andrejko/resources/LaTex-on-Mac-OSX.html

This list of references makes it look appealing to start with TeXShop and see whether the LaTeX output from Sphinx can be used to produce PDF.

The TeXLive distribution includes a basic pdfTeX utility that might emit a nice PDF from the Sphinx LaTeX output.

Additionally, there is iTeXMac, which may also convert my Sphinx LaTeX to PDF.

These, however, seem to be largely WYSIWYG editing tools. While editing LaTeX isn't too bad, I want to work from a single RST source.

Python Solutions

The "python latex to pdf" Google search turns up the following projects for doing LaTeX processing in Python. These look very nice. In particular, they get away from manual editing of LaTeX.

- plasTeX

- pdfTeX

Better Still

Finally, I located the following: http://jimmyg.org/blog/2009/sphinx-pdf-generation-with-latex.html. This makes it clear that Sphinx expects TeXLive, which leads me to MacTeX, the TeXLive packaging for Mac OS X.

Sphinx generates a makefile to create PDF from the LaTeX. Hopefully, this is not highly Linux-specific and will use the TeXLive distribution on Mac OS X.

Bonus Feature

Switching to LaTeX may also give me a better way to handle the formulas in the exercise sections. Currently, I have to write them in an equation editor and save them as images. I don't know how many different equation editors I've used.

Alternative RST to PDF

There's an rst2pdf tool that may make it possible to go from Sphinx RST directly to PDF. Hopefully, this honors all the Sphinx extensions.

Thursday, August 13, 2009

Code Dojo and OO Design -- OO Design Dojo

Code Dojo, to an extent, includes a fair amount of OO Design.

I've been pondering ways to help folks who clearly have no design skills at all. I've read their code. It's appalling.

Toward that end, I looked at some of the Code Kata links: the CodeKata page, Mark Needham's blog posting on code-kata, Rizky Farhan's Collection of Software Projects, jp.hamilton's Code Kata Resources.

They asked for code samples to act as best practices. I suggested to our sales folks that code samples and simple code "best practices" were completely inadequate. They need serious remedial skill-building in programming.

What started to percolate was organizing a periodic "code dojo" meeting to help them build skills without the onerous "teaching" (or worse, "lecturing") mode. Teaching OO design to working programmers is generally hard. Many programmers seem to have a starting point that isn't based on the requirements or any kind of rational design. It appears that many programmers start with a pretty random boilerplate program.

Teaching Java to COBOL Programmers

I remember struggling with COBOL programmers. Back in '02 (before Code Dojo existed), I had no real way to educate folks except a lot of one-on-one conversations. I tried to schedule code walkthroughs, but the project manager didn't like the idea, and cancelled them.

I was allowed a quick overview of J2EE concepts and how the web side of our application was going to be assembled, but that was it.

Even covering basic J2EE servlet concepts became a FAIL because the legacy web framework was a JSP hack-around. It didn't work well, couldn't easily be explained (or used). But it was entrenched, and therefore, had priority in everyone's mind.

No matter how many times I tried to review basic OO concepts, and some design approaches, there were problems.

Everyone wanted to start from "the top", with a "main program" that "simply read and wrote files." COBOL concepts. Java File I/O has a subtle complexity with lots of nested constructors. No one likes to see that as a beginner. Also, file parsing is -- in reality -- fairly hard, but COBOL provides a handy optimization via a fixed format record layout and lots of implicit conversions.
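To make the "handy optimization" concrete: a COBOL record layout declares field positions and types, and the conversions follow from the declaration. The sketch below uses a hypothetical three-field layout to show how much of that work has to be written out by hand in a language without record layouts.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical layout, in the spirit of a COBOL record description:
#   ACCT-ID   PIC X(6)      columns 0-5
#   NAME      PIC X(10)     columns 6-15
#   BALANCE   PIC 9(5)V99   columns 16-22 (implied decimal point)
LAYOUT = [("acct_id", 0, 6), ("name", 6, 16), ("balance", 16, 23)]

@dataclass
class Account:
    acct_id: str
    name: str
    balance: Decimal

def parse_record(line):
    """Slice one fixed-format line into fields, then convert each field explicitly."""
    fields = {name: line[start:end] for name, start, end in LAYOUT}
    # COBOL does these conversions implicitly; here each one is spelled out.
    return Account(
        acct_id=fields["acct_id"].strip(),
        name=fields["name"].rstrip(),
        balance=Decimal(fields["balance"]) / 100,  # V99: two implied decimal places
    )
```

For example, `parse_record("A00042" + "Smith".ljust(10) + "0012345")` yields an `Account` with a balance of `Decimal("123.45")`.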

We were writing servlets that queried a database. There was no mapping to the COBOL concepts everyone wanted to start with. A few lectures and presentations weren't helpful. Had I but known about Code Dojo, I would have suggested that. It might have worked.

The "Getting Started" Problem

Some Stack Overflow questions on design are really questions about "getting started". These cause me to wonder how to help people who are sure they know the language and syntax, but can't seem to get started writing anything useful "from scratch" (or de novo.)

I've heard from people who have UML class diagrams and still claim they don't know what to do next. They can't -- for some reason -- get started.

I think this is related. They have a limited, fixed set of programming templates. Learning a new language does not fit their limited set of templates. Perhaps Code Dojo could help these folks gain a new set of templates.

Wednesday, August 5, 2009

No Brown M&M's -- A Brilliant Compliance Test

My son's a musician, and one of the standard jokes boils down to the phrase "no brown M&M's". They use it as a catch-all phrase for people being fussy to a level that's senseless.

Then BoingBoing pointed me to the Van Halen Brown M&M story in snopes.com.

The brown M&M's clause was actually a compliance test. If you read the contract rider, and complied with all the terms and conditions, you'd filter the brown M&M's off the buffet.

If you were not prepared for the technical requirements for the Van Halen show, you'd book them, skip reading the rider, hope it went off well, and -- generally -- fail to filter the M&M's.

Statements of Work

We write a lot of Statements of Work (SOW's) with lots of "assumptions". A common assumption is that deliverables will be approved within three days. We often write it, suspecting that the client can't actually take action in three days.

But we never really know until we start work. Once we're there, we discover that it takes them a month to get ready to spend five days reviewing a document and wondering what to do. Now we're five weeks behind the original schedule, and the customer will blame us for "springing" the 3-day decision window on them.

If we had a "No Brown M&M's" assumption, perhaps we'd have an earlier indication that things weren't going to work out well.

Use Cases

I'm also wondering if there needs to be a "Brown M&M" use case. Something egregious, but small. Something that should lead to client confusion. If they approve the scope of work, including the Brown M&M use case, we know they didn't really read or review the scope of work.

Perhaps there should be a "Brown M&M" in the architecture. Perhaps an irrelevant component that we insist must be downloaded and installed in the development environment. We simply check that it's there. If not -- well -- what else will be wrong?
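The check itself can be trivial. The following is a sketch that assumes the "Brown M&M" component is a Python package. The name `brown_mm_canary` is purely hypothetical. `importlib.util.find_spec` looks for a top-level module without importing it.

```python
import importlib.util

# Hypothetical "Brown M&M" component: an otherwise-unneeded package that the
# setup instructions insist must be installed in the development environment.
CANARY = "brown_mm_canary"

def canary_present(module_name=CANARY):
    """True if the top-level module can be found on the import path (not imported)."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    if not canary_present():
        print(f"{CANARY} is missing: the setup instructions were probably not followed.")
```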

Tuesday, August 4, 2009

The E. W. Dijkstra Archive (Update)

The E. W. Dijkstra Archive is a collection of over 1,000 manuscripts that EWD sent around during his career.

This Stack Overflow question ("explaining software development to management") had a really brilliant comment on one of the answers.

Analogies are always leaky, and you will end up with proposed solutions that solve the analogy, not your actual problem. Just explain the problem in simple terms without comparing it to anything physical. Read Edsger Dijkstra's famous 1036 and 854 for an insight into the horrors thinking by analogy is inflicting upon us.

Follow-up Reading

EWD 854, "The fruits of misunderstanding". "when faced with something new and unfamiliar we try to relate it to what we are familiar with". "a program is an abstract mechanism". "the mechanism being abstract, its production is subsumed in its design. In this respect a program is like a poem: you cannot write a poem without writing it. Yet people talk about programming as if it were a production process and measure 'programmer productivity' in terms of 'number of lines of code produced'."

Aha. Software production is subsumed in its design. There is no "production" other than design. We design at a high level. We design code. When we've designed the code, we're done. There is no further development effort.

EWD 1036, "On the cruelty of really teaching computing science". "From a bit to a few hundred megabytes, from a microsecond to a half an hour of computing confronts us with completely baffling ratio of 10⁹"

Love that cautionary note. Computer science forces us to confront layers of meaning that have a huge scope.
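Here is a quick check of that 10⁹. It assumes "a few hundred megabytes" means 300 decimal megabytes (10⁶ bytes each) at 8 bits per byte:

```latex
\frac{30 \times 60\ \mathrm{s}}{10^{-6}\ \mathrm{s}} = 1.8 \times 10^{9}
\qquad
\frac{300 \times 10^{6}\ \mathrm{bytes} \times 8\ \mathrm{bits/byte}}{1\ \mathrm{bit}} = 2.4 \times 10^{9}
```

Both spans come out on the order of 10⁹.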

I'll have to work my way through the archive. It will probably take years to read through all the manuscripts.

Wait -- I've Got One Of Those

Back in the 70's, the undergrads at Syracuse University were given copies of EWD316. At the time, it was just a paper on the art of programming. It sat in a file drawer for decades. I unearthed it recently.

At the time, EWD316 looked like a course number. I assumed -- wrongly -- that it was notes from some other school of computer science. Turns out, it was one of those EWD missives. It had found its way into the hands of the CS Faculty at SU. From there, a copy fell into my hands. Not until recently (30+ years later) did I realize exactly what it was a copy of.

It's densely-packed stuff. I think A Discipline of Programming is a little bit easier to work with. Further, Gries' The Science of Programming is easier still.

Monday, August 3, 2009

Wishful Thinking -- An Accident Waiting To Happen

Some assumptions are really hard to identify as "assumptions". Some assumptions are more "wishful thinking" than "assumption".

We process a lot of spreadsheets. As far as I'm concerned, the Spreadsheet User Interface (SUI) is a first-class part of any application. Users understand them, and you don't have to code as much.

We have a library that wraps XLRD, csv, zipfile and ElementTree XML parsing to read a wide variety of spreadsheet formats.

However, we were recently stabbed by an assumption. I had to spend over 40 hours restructuring our workbook library and application code.

Go With What You Know

The point of an Agile approach is to build high-value things first. In the olden days, we would have spent months (really) writing a sophisticated set of hypothetical use cases for the workbook library and then designing something that would cover all possible bases.

Rather than spend endless hours on the potential workbook features, I wrote what we needed to read the workbook files we actually had. We had a mixture of XLS, XLS in ZIP files, and CSV files. So we unified those with a fairly simple model of a "Row Source" that provided information on sheets, and provided the sequence of rows.
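Here is a minimal sketch of the "Row Source" idea. It is restricted to the standard library, so XLS via xlrd is left out. The class and method names are hypothetical, not the actual library's API. Each source answers two questions: which sheets exist, and what rows a sheet contains.

```python
import csv
import io
import zipfile

class RowSource:
    """Hypothetical interface: the names of the sheets, and the rows of each sheet."""
    def sheets(self):
        raise NotImplementedError
    def rows(self, sheet):
        raise NotImplementedError

class CSVSource(RowSource):
    """A single CSV file is a workbook with exactly one sheet."""
    def __init__(self, path):
        self.path = path
    def sheets(self):
        return [str(self.path)]
    def rows(self, sheet):
        with open(self.path, newline="") as f:
            yield from csv.reader(f)

class ZipCSVSource(RowSource):
    """Each CSV member of a ZIP archive (path or binary file object) is one sheet."""
    def __init__(self, source):
        self.source = source
    def sheets(self):
        with zipfile.ZipFile(self.source) as zf:
            return [n for n in zf.namelist() if n.lower().endswith(".csv")]
    def rows(self, sheet):
        with zipfile.ZipFile(self.source) as zf, zf.open(sheet) as member:
            text = io.TextIOWrapper(member, encoding="utf-8", newline="")
            yield from csv.reader(text)
```

Application code iterates `for sheet in src.sheets(): for row in src.rows(sheet): ...` without caring which file format is underneath.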

However, all those spreadsheets had a common feature. They were built by people with a strong IT background, people who -- even if they couldn't define "Normalization" -- knew what normalized data looked like. They provided everything as proper columns.

Recently, we got some data for a new customer pilot that was just different enough to be a costly problem.

What Changed?

The change was the use of the sheet tab names to carry meaningful key information.

Every previous example either had sheets with names like "sheet1", "sheet2" and "sheet3", or the sheet name was something we could filter on.

This workbook had the time dimension coded in the sheet names, not a column of data on each sheet. Suddenly, the worksheet name was significant. And that's not all.

How Bad Can It Be?

The extensive breakage came from a bad design decision buried in the workbook library and all application layers that depend on the workbook libraries. Assuming that data was in columns -- instead of sheet names -- didn't create a big problem. Unwinding that assumption was easy.

What was bad was a design that permitted the various mappings to be independent of each other. The "operation" classes that stepped through rows were designed so that a simple list of independent mappings could be used to extract relevant columns from a row and process them. Each independent mapping created a Python object from columns.
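To picture that original design, here is a sketch with hypothetical names (`Sale`, `map_store`, `map_amount`). A simple list of independent mappings each turns columns of a row into a value. Nothing here can see the sheet name, and no mapping can use another mapping's result.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    store: str
    amount: float

# Each hypothetical mapping is independent: it sees only the row (a dict of
# column name -> text) and builds one Python value from its own columns.
def map_store(row):
    return row["store"].strip()

def map_amount(row):
    return float(row["amount"])

MAPPINGS = [("store", map_store), ("amount", map_amount)]

def process(rows):
    """The "operation": step through rows, applying every mapping independently."""
    for row in rows:
        yield Sale(**{field: fn(row) for field, fn in MAPPINGS})
```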

It turns out that each mapping needed a context (with worksheet name). Also, it turns out that some mappings actually depend on other mappings.

When the mappings are picking up columns, having several mappings depend on a single column is easy. Having several mappings depend on the context wasn't too bad. Having one mapping that parsed the sheet name exposed our wishful thinking.

We needed to have mappings that depended on each other. When we map the sheet name to a Python object, we do parsing and database lookups. Other mappings must now be "aware" of this mapping so they don't redo the parsing and database lookups.
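Here is a sketch of one way to express that dependency, with every name hypothetical. Mappings receive a per-sheet context. A context-level mapping (the parsed sheet name) is computed once and cached, so dependent mappings neither re-parse nor repeat the lookup. The `"2009-07"` sheet-name convention and the simulated lookup are assumptions. This illustrates the idea; it is not the restructured library itself.

```python
from dataclasses import dataclass, field
from datetime import date

LOOKUPS = []  # records each simulated database lookup, to show it happens once

def lookup_period(year, month):
    """Stand-in for a database lookup keyed by the parsed sheet name (hypothetical)."""
    LOOKUPS.append((year, month))
    return date(year, month, 1)

@dataclass
class Context:
    sheet_name: str
    cache: dict = field(default_factory=dict, repr=False)

    def get(self, name, mapping):
        # Compute a shared, context-level mapping once per sheet, then reuse it.
        if name not in self.cache:
            self.cache[name] = mapping(self)
        return self.cache[name]

def map_period(ctx):
    # Hypothetical convention: sheet names like "2009-07" carry the time dimension.
    year, month = (int(part) for part in ctx.sheet_name.split("-"))
    return lookup_period(year, month)

def map_sale(ctx, row):
    # Depends on the period mapping instead of re-parsing the sheet name.
    period = ctx.get("period", map_period)
    return (period, row["store"], float(row["amount"]))

ctx = Context("2009-07")
sales = [map_sale(ctx, r) for r in ({"store": "A1", "amount": "9.50"},
                                    {"store": "B2", "amount": "3.25"})]
```

The cost is exactly the "shared nothing" property: `map_sale` now knows about `map_period`, and the context is shared state.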

Lessons Learned

The trivial (and wrong) lesson learned could be "don't make so many assumptions". That's silly. We didn't casually make assumptions. We had example data; the sample data was biased and didn't show all conceivable permutations.

Another trivial (and wrong) lesson could be "document all your assumptions". That's silly, too. We did document them. That doesn't make the breakage significantly easier to fix.

The real lesson is to avoid wishful thinking. We'd tried too hard to make all of the mappings into independent objects. The phrase "shared nothing" is our mantra. While shared nothing gave us a very composable design, it wasn't actually correct.


Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post in Jan 2023.