S.Lott-Software Architect: March 2012

Tuesday, March 27, 2012

Patents vs. Innovation

Read "Why Software Patents are Evil" by Simon Phipps in InfoWorld.
It's an excellent summary of the problems caused by patents applied to software.

There's a great TED Talk by Johanna Blakley on "Lessons Learned from Fashion's Free Culture" which reinforces the essential point.

Software patents don't help anyone. The open source movement is evidence that folks working outside the constraints of patent lawyers are more innovative and produce high-quality software. The Internet is built on non-proprietary technology (TCP/IP and related protocols), GNU/Linux, Apache and similar software componentry. How have patents helped?

Thursday, March 22, 2012

Detailed Analysis of Disruptive Technology Change

Read this: Why I doubted Facebook could build a billion dollar business, and what I learned from being horribly wrong.

Don't be afraid to read it again.

when it comes to the exceptional cases, all bets are off. So keep your mind open to weird, young [ideas] that you meet that don’t fit the established pattern

Sound advice. The best ideas are disruptive. That means that the idea does not fit an established pattern.

The problem with being an architect is that software architecture is a political game.

In order to justify large projects with large funding, you must cater to the folks with money who (generally) feel that disruption == risk. The idea of incremental effort and proofs of concept may not fly because they've decided that inappropriate incumbent technology is magically quicker than appropriate but novel technology.

There's a profound Software Process Improvement issue here. Organizations can (and do) stifle innovation in an effort to "improve" their software development process. The false hope is that an unchanging technology base is somehow helpful at making people more effective.

Even if you give people second-rate tools, they can eventually get to be pretty good at using them. However, using better tools might be better than trying to get really good at using poor tools.

What I find endlessly funny are folks who want "formal research" or "studies" that prove that some new idea is actually better than existing ideas. You can read Stack Overflow and programmers.stackexchange.com questions looking for studies that prove the value of unit testing or prove the value of a NoSQL database or prove that software is simpler without triggers or stored procedures.

For the moment, these are disruptive ideas.

We know they're disruptive because people keep asking for proof.

When they stop asking for proof, you know the idea has finally "arrived" and it's time to move on to find the edge of the envelope again.

Tuesday, March 20, 2012

Innovation is Disruptive -- and sometimes forbidden

Saw this on Twitter from @hunterwalk:

Startups piss people off because their existence is a statement that incumbents aren't doing their job well enough

Also true of IT internal innovation. Pitch a novel, innovative idea to management, and most organizations will find ways to avoid it. Suggesting a bold new direction makes it look like someone isn't doing their job.

If you want to see real push-back, try suggesting that the incumbent technology platform needs to be replaced.

As an example, consider an all-singing, all-dancing, all-VB shop. The idea that C# might be better is met with a variety of responses.

It's too costly to change now. We can't afford the training or the licenses or something. The list is long and often includes silly costs based on a really bad adoption strategy. A bad adoption plan allows someone to defend their incumbent technology.
It's too risky to change now. What risks? The list of risks is often surprising and frustrating. My favorite is the blanket "We don't know what we don't know" risk statement. That's designed to be a complete show-stopper because there's no evidence to counter it.
The new Visual Studio has features that make VB acceptable for development. It's so important to keep the legacy technology that excuses can be made and work-arounds applied to preserve it.

As another example, consider replacing a 30-year-old COBOL system. As part of stalling an innovative plan, I've been told that the only scalable transaction-processing technology is COBOL-CICS-VSAM. This was about five years ago, when the incumbency of COBOL might have seemed doubtful. But to IT staff, the idea of Java was too innovative.

The other problem was the innovative idea of a phased implementation. Yes. Agile thinking can be seen as disruptive to project managers; it can appear that they don't add much value. The idea that we'd build "bridges" between legacy applications and new applications was so unpleasant that we had to spend a long time discussing the maintenance and support of throw-away code that existed just long enough to be sure that all the relevant COBOL had been rewritten.

Bridges between old and new were portrayed as costly and risky. These are the usual responses to a proposed new way of looking at the problem. And, of course, a phased implementation was inherently low-value. I've been told that a project was absolutely "all or nothing" and no piece had value separate from the complete scope.

Suggesting a change means that there's a problem, right? It means their 30-year track record of COBOL support is less than perfect. It means their ability to use VB is flawed in some way. The only reason for a change is because -- somehow -- they have failed.

Thursday, March 15, 2012

Document Database and Schema Design

As part of coming to grips with CouchDB (and a particularly odious graph-theory problem) I've been looking around for design guidelines, hints and tips.

This MongoDB Schema Design document is quite helpful. The Link vs. Embed section clarifies the essential tradeoff here. In SQL world, link is the only tool. In this document-database world of CouchDB and MongoDB (as well as XML schema design) we have a link vs. embed decision.
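A minimal sketch of the link vs. embed tradeoff, using plain Python dictionaries in place of a real document store. The document shapes and the names `database`, `author_id` and `post_summary` are hypothetical, chosen only for illustration; they are not taken from the MongoDB document.

```python
# Hypothetical documents, keyed by "_id" as CouchDB and MongoDB both do.
database = {
    "author-1": {"_id": "author-1", "type": "author", "name": "A. Writer"},
    "post-1": {
        "_id": "post-1",
        "type": "post",
        "title": "Link vs. Embed",
        # Link: store only the id; the author document lives elsewhere.
        "author_id": "author-1",
        # Embed: the comments are owned by the post and travel with it.
        "comments": [
            {"who": "reader-1", "text": "Nice."},
            {"who": "reader-2", "text": "Disagree."},
        ],
    },
}


def post_summary(db, post_id):
    """Build a display record: one fetch for the post, one more per link."""
    post = db[post_id]
    author = db[post["author_id"]]  # following a link costs a second lookup
    return {
        "title": post["title"],
        "author": author["name"],
        "comment_count": len(post["comments"]),  # embedded data needs no lookup
    }


summary = post_summary(database, "post-1")
```

Roughly, embedding suits data that is owned by its parent and read with it; linking suits data that is shared or changes independently.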

Here is a presentation on trees (a specific kind of graph) in a document database: Trees in MongoDB. It enumerates a number of alternatives that are part of this new, larger design space for databases.

I found this because it was referenced in the myNoSQL blog, which seems to be a collection of sometimes useful links.

A September 2011 DAMA-NY meeting included a presentation on NoSQL Data Stores. It's findable on Google if you search for "dama nisql data stores" [sic; it is misspelled]. However, it's hard to link to directly because of the way Google obscures the target of their search. What's important in this presentation is the slightly defensive posture it takes about data modeling. It seems to describe ways that relational database modelers can cling to relevance in spite of threats represented by "NoSQL" databases.

The Transit System Problem

For a particularly gnarly problem, look at the Google Transit Feed Specification.

Then, look at Hampton Roads Transit on the GTFS Data Exchange.

How do we build a CouchDB document-centric view of this highly-normalized graph?

Route-centric? Each route has multiple trips. Each trip has a sequence of stop times. Do we repeat the stop definition over and over again? Seems silly, so perhaps it's Route - Trip - Stop-Time as a single document with links to Stop definitions.

Stop-centric? Each stop has multiple stop-times, and each stop has a parent route (based on trips along a route.) While this allows us to have a Stop document with a list of stop times and a (generally) single Route definition, it's not too useful.

We generally use transit based on the routes, not based on a single stop. So we need to query the stops based on a Route as well as based on a Stop Time. We may be able to use the CouchDB map definitions to provide some of these alternative views of a stop (i.e., by stop time, by route).
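Here is a rough sketch of the route-centric option with alternative views layered on top. It uses plain Python to imitate what a CouchDB map function does (emit key-value rows that are then sorted by key); real CouchDB views are normally written in JavaScript, so treat this as an analogy rather than CouchDB's API. The field names follow GTFS (`route_id`, `trip_id`, `stop_id`, `arrival_time`, `stop_sequence`), but the sample data and the helpers `build_view` and `rows_starting_with` are made up for illustration.

```python
# Stop definitions are separate documents; trips link to them by stop_id.
stop_docs = [
    {"_id": "S1", "type": "stop", "stop_name": "First St"},
    {"_id": "S2", "type": "stop", "stop_name": "Second St"},
    {"_id": "S3", "type": "stop", "stop_name": "Third St"},
]

# One document per trip, with its route and an embedded list of stop times.
trip_docs = [
    {"_id": "T1", "type": "trip", "route_id": "R1", "trip_id": "T1", "stop_times": [
        {"stop_id": "S1", "arrival_time": "08:00:00", "stop_sequence": 1},
        {"stop_id": "S2", "arrival_time": "08:10:00", "stop_sequence": 2}]},
    {"_id": "T2", "type": "trip", "route_id": "R1", "trip_id": "T2", "stop_times": [
        {"stop_id": "S1", "arrival_time": "09:00:00", "stop_sequence": 1},
        {"stop_id": "S2", "arrival_time": "09:10:00", "stop_sequence": 2}]},
    {"_id": "T3", "type": "trip", "route_id": "R2", "trip_id": "T3", "stop_times": [
        {"stop_id": "S2", "arrival_time": "08:05:00", "stop_sequence": 1},
        {"stop_id": "S3", "arrival_time": "08:20:00", "stop_sequence": 2}]},
]


def map_by_stop(doc):
    """Stop-centric view: key is (stop_id, arrival_time), value is the route."""
    if doc.get("type") != "trip":
        return
    for st in doc["stop_times"]:
        yield (st["stop_id"], st["arrival_time"]), doc["route_id"]


def map_by_route(doc):
    """Route-centric view: key is (route_id, trip_id, stop_sequence)."""
    if doc.get("type") != "trip":
        return
    for st in doc["stop_times"]:
        yield (doc["route_id"], doc["trip_id"], st["stop_sequence"]), st["stop_id"]


def build_view(docs, map_function):
    """Apply the map function to every document and sort the rows by key."""
    rows = [row for doc in docs for row in map_function(doc)]
    return sorted(rows, key=lambda row: row[0])


def rows_starting_with(view, *prefix):
    """Select the rows whose key begins with the given prefix values."""
    return [(key, value) for key, value in view if key[:len(prefix)] == prefix]


by_stop = build_view(stop_docs + trip_docs, map_by_stop)
by_route = build_view(stop_docs + trip_docs, map_by_route)
```

With these two views, `rows_starting_with(by_stop, "S2")` lists every arrival at one stop in time order, and `rows_starting_with(by_route, "R1", "T1")` walks one trip along a route, without storing the stop definitions more than once.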

Some NoSQL Lessons

What's really important here is that NoSQL schema design is not precisely the same as RDBMS schema design. In the RDBMS world, with a single, fixed schema, proper up-front design is life-or-death. A great deal of design hand-wringing is required to get the relational model correct. In a good organization, this design effort involves prototyping, modeling and experimentation. In a bad organization, this design effort follows trivialized rules of thumb without too many second thoughts.

On the other hand, NoSQL schema design is essentially the same as RDBMS schema design.

In the NoSQL world, we still have to do prototyping, modeling and experimentation. We still have the three-tier separation between conceptual, logical and physical. Unlike the relational database, however, these tiers are more closely aligned in a document-oriented database. The conceptual tier is usually very, very close to the logical tier document structure. The conceptual gaps are filled by map-reduce views. The physical tier is just the logical tier document structure with some description of the sharding policies.

We do have to be more circumspect about committing to a design. In SQL world, DDL is a formal commitment to a design. DDL changes lead to breakage, which makes the dependencies clearer. In NoSQL world, there isn't the same depth of commitment. A technical spike which looks promising can lead to a gradual path of progressive dependence on the model.

The breakage that comes from schema change is more manageable but can spin out of control. It's more manageable because we can design our application around optional, missing and variant definitions of a document. It can become less manageable if we introduce too many layers of useless abstraction to handle schema evolution.
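As a small illustration of designing around optional, missing and variant definitions, here is a sketch of a single reader function that tolerates several shapes of a transit stop document. The two variants (an older `name` with flat `stop_lat` and `stop_lon` fields, a newer `stop_name` with a nested `position`) and the function name `read_stop` are assumptions invented for this example.

```python
def read_stop(doc):
    """Normalize a stop document, whichever variant it was saved as."""
    # Variant field name: older documents used "name", newer ones "stop_name".
    name = doc.get("stop_name", doc.get("name", "(unnamed)"))

    # Variant structure: a nested "position" or two flat coordinate fields.
    if "position" in doc:
        lat, lon = doc["position"]["lat"], doc["position"]["lon"]
    else:
        lat, lon = doc.get("stop_lat"), doc.get("stop_lon")

    # Optional field: absent in many documents, so default to None.
    return {"name": name, "lat": lat, "lon": lon, "zone_id": doc.get("zone_id")}


old_style = {"_id": "S1", "name": "First St", "stop_lat": 36.85, "stop_lon": -76.29}
new_style = {"_id": "S2", "stop_name": "Second St",
             "position": {"lat": 36.86, "lon": -76.30}, "zone_id": "A"}

stops = [read_stop(old_style), read_stop(new_style)]
```

One function at the edge of the application is about as much abstraction as this needs; the rest of the code sees only the normalized shape.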

The discipline of an ORM-like mapping between documents and Python classes is somewhat helpful for keeping the design focused around documents that have first-class meaning in the problem space. For that reason, couchdbkit seems useful.
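To show the general idea of mapping documents to Python classes, here is a sketch built on the standard library's `dataclasses`. This is not couchdbkit's API; the classes `Trip` and `StopTime` and the methods `from_doc` and `to_doc` are hypothetical stand-ins for whatever a mapping layer would provide.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class StopTime:
    stop_id: str
    arrival_time: str


@dataclass
class Trip:
    """A document with first-class meaning in the problem space."""
    trip_id: str
    route_id: str
    stop_times: List[StopTime] = field(default_factory=list)

    @classmethod
    def from_doc(cls, doc):
        """Build a Trip from a stored document (a plain dict)."""
        return cls(
            trip_id=doc["trip_id"],
            route_id=doc["route_id"],
            stop_times=[StopTime(**st) for st in doc.get("stop_times", [])],
        )

    def to_doc(self):
        """Convert back to a plain dict, tagged with its document type."""
        doc = asdict(self)  # recurses into the nested StopTime objects
        doc["type"] = "trip"
        return doc


stored = {
    "type": "trip",
    "trip_id": "T1",
    "route_id": "R1",
    "stop_times": [{"stop_id": "S1", "arrival_time": "08:00:00"}],
}
trip = Trip.from_doc(stored)
```

The class definitions double as a readable statement of the schema, which is most of the value of the discipline.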

Tuesday, March 13, 2012

The Moderator Problem

As the #3 ranked contributor on http://programmers.stackexchange.com, I've provided my share of advice. 554 Answers to be factual about it.

The moderators, however, have decided that I'm no longer welcome. It was simply shocking to be firmly (but politely) shown the door.

The issue was Python. Specifically, the fact that Python uses whitespace instead of C-style {}'s or some other notation for an enclosed block of code. The question -- closed by the moderators -- asked about convincing a reluctant boss to use Python instead of PHP for web development. The question stated that the boss liked his curly braces.

My answer pointed out several things, two of which became issues.

Python doesn't use {}'s. That means that {}'s aren't essential. That means the boss's preference for {}'s is a silly personal preference. Generally, there's no way to convince someone to change their personal preference.

According to a moderator, Python not using {}'s does not make {}'s non-essential. Even though Python does not use them, they're still -- somehow -- essential. This means that the boss referenced in the question is not expressing a personal preference. My claim that {}'s are not essential is merely opinion, I'm being too aggressive in stating my opinions, and Python's syntax is not a sufficient factual basis for my claim that {}'s are not essential.

Wow.

Python doesn't use {}'s. But I'm flat-out wrong to claim that liking {}'s is a preference.

Second. And weirder. I claimed that people mess up punctuation frequently, but they very rarely indent incorrectly. I've spent hours looking at C code that was indented nicely but omitted a closing }. I've seen hundreds of Stack Overflow questions that amount to missing punctuation.

The hundreds of Stack Overflow questions where punctuation was messed up were deemed not factual.

Not factual? Denied four separate ways. Note that we're way beyond any emotional response here. I'm being told that my facts are not facts.

Denial One. The moderator stated that they have never messed up punctuation like {}'s. While this may be true, it doesn't make other people's problems fanciful.

Denial Two. Those users were "dumb" for messing up punctuation. While this may also be true, it doesn't make other people's problems a matter of my opinion.

Denial Three. There are more questions with proper punctuation than messed up punctuation. This, too, may be true, but doesn't magically make the other questions go away. They still seem to exist as stubborn irrefutable facts. People mess up punctuation. Perhaps they're dumb, but they mess up.

Denial Four. The moderator simply disputed the SO evidence. I was simply wrong to present it.

Wow again.

Other folks in the Programmers Chat said we were not seeing "eye-to-eye". Wait, what?

A moderator says my responses about {}'s being non-essential and people messing up {}'s were not factual. I thought I provided facts. The moderator then simply refuted the facts saying that the facts were not facts. This is a purely emotional response preventing any rational presentation of evidence.

I'm not sure that's an "eye-to-eye" issue. That's more of a "I don't like you" issue.

["You're just being a drama queen." Okay. I was told that I had a "history" of being "aggressive". If my history is the basis for refuting the facts, that means this was merely personal. As in "you're not welcome."]

The Moderator Problem is that there's no recourse. I have the third-highest reputation, but that carries no actual weight. My answer was edited in a shockingly heavy-handed way. The question was closed as "Not Constructive" presumably because the boss's understanding of programming languages (e.g., {}'s are essential) is somehow "correct" and hardly worth responding to. And I was told that there's no reason to argue with the moderators, I should just "move on".

[I was also told to simply roll back the edits and see what happens next. That's being equally heavy-handed; a Wikipedia edit war wouldn't address the "I deny your facts" problem.]

The design of the Stack Exchange sites allows flagging questions for moderator attention. This is pure genius.

But there's no way to flag moderators for further attention. Perhaps a "revote this answer" kind of process where the up voters and down voters would be notified of a change to an answer. But that seems complex. And if the original voters didn't feel like revoting after a change, the results would be indeterminate.

It was suggested that I take the issue up on Programmer's Meta (http://meta.programmers.stackexchange.com/). I'm not sure what would happen. The question is closed. The heavy-handed edits and refutation of facts are invisible and therefore irrelevant. All that's left is a "don't see eye-to-eye" situation that no one needs to care about since the question was closed.

It's hard to get over such a blatant refutation of stubborn facts.

It's hard to get over a moderator simply refusing to moderate but instead taking the time to repeatedly refute simple facts.

Tuesday, March 6, 2012

couchdb on Mac OS X

I've started to work with couchdb.

I've blogged before about the problems of SQL schema in Escaping the Relational Schema Trap.

A SQL schema -- for many applications -- is too confining. It creates cost with relatively little value. Once upon a time (when disks were expensive and computers were slow) it was essential.

The funny part about using couchdb is the build process.

In the couchdb Wiki, they have a page for Mac OS X installation. The relevant part is the following line of shell script.

brew install couchdb

That's it?

Yes. If--and only if--you follow the directions.

If you don't follow the directions, however, it can take all day. Here are the steps.

Install Apple's Developer Tools. I have Mac OS X 10.7. Xcode 4.3 for Lion.
Launch Xcode and install the command-line utilities. This is important because it includes things like make.
Remove fink or MacPorts if you happened to have used them for anything. For fink, you'll need to clean it out of your ~/.profile or ~/.bash_profile and rename the /sw directory.
Install Homebrew. Use the one-line ruby script from the Installation page of the Homebrew wiki. This: /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)" I tried several wrong ways before doing it the right way.
Install couchdb using homebrew. It takes a while.

There are numerous things which may go wrong.

A missing lib/crt.1.10.6.o, for example. This is just an out-of-date Xcode. It took a few hours of failed experiments to (a) realize it and (b) get the right one. There are several proposed solutions around the web. Most of them are clever, but ineffective. Just get the right Xcode.

A failure to build Erlang. This was just an improperly installed version of Homebrew. There were a lot of messages. A lot. I messed around with a lot of things until I finally crashed brew doctor. I deleted and reinstalled Homebrew and everything built. First try.

Thursday, March 1, 2012

Civic Hacking

This weekend: the HRVA Civic Hackfest. http://guestlistapp.com/events/86160

Alt Daily coverage.

Some more references.

What's it all about? Code For America. Exploit the information we have to make civic improvements. Ask any journalist who wrestles with government data. There's transparency (i.e., lip service) and there's transparency. Publishing information as a PDF based on scans of paper documents doesn't really do much for folks who are exercising their civic duty to analyze and correlate government actions with social benefits.

Only you can make your government more responsive. Voting is one way. Civic Hacking is another.

Another way is doing good data wrangling to expose the depth of influence money has over politics. Look at OpenSecrets.org for an example.

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.