Thursday, December 30, 2010

pyWeb Literate Programming Tool | Download pyWeb Literate Programming Tool software for free at SourceForge.net

I've (finally) updated the pyWeb Literate Programming Tool.

There were feature requests and bug reports. Much to do. Sadly, I'm really slow at doing it.

Top Language Skills

Check out this item on eWeek: Java, C, C++: Top Programming Languages for 2011 - Application Development - News & Reviews - eWeek.com.

The presentation starts with Java, C, C++, C# -- not surprising. These are clearly the most popular programming languages. These seem to be the first choice made by many organizations.
In some cases, it's also the last choice. Many places are simply "All C#" or "All Java" without any further thought. This parallels the "All COBOL" mentality that was so pervasive when I started my career. The "All Singing-All Dancing-All One Language" folks suffer the most shattering disruptions when their business is eclipsed by competitors who use language and platform as a technical edge.

The next tier of languages starts with JavaScript, which is expected. Just about every web site in common use has some JavaScript somewhere. Browsers being what they are, there's really no viable alternative.

Weirdly, Perl is 6th. I say weirdly because the TIOBE Programming Community Index puts Perl much further down the popularity list.

PHP is next. Not surprising.

Visual Basic weighs in above Python. Being above Python is less weird than seeing Perl in 6th place. This position is closer to the TIOBE index. It is distressing to think that VB is still so wildly popular. I'm not sure what VB's strong suit is. C# seems to have every possible advantage over VB. Yet, there it is.

Python and Ruby are the next two. Again, this is more-or-less in the order I expected to see them. This is the second tier of languages: really popular, but not in the same league as Java or one of the innumerable C variants.

After this, they list Objective-C as number 11. This language is tied to Apple's iOS and MacOS platforms, so its popularity (like C# and VB) is driven in part by platform popularity.

Third Tier

Once we get past the top 10 Java/C/C++/C#/Objective-C and PHP/Python/Perl/Ruby/JavaScript tier, we get into a third realm of languages that are less popular, but still garnering a large community of users.

ActionScript. A little bit surprising. But -- really -- it fills the same client-side niche as JavaScript, so this makes sense. Further, almost all ActionScript-powered pages will also have a little bit of JavaScript to help launch things smoothly.

Now we're into interesting -- "perhaps I should learn this next" -- languages: Groovy, Go, Scala, Erlang, Clojure and F#. Notable by their absence are Haskell, Lua and Lisp. These seem like languages to learn in order to grab the good ideas that make them both popular and distinct from Java or Python.

Tuesday, December 28, 2010

Amazing Speedup

A library had unit tests that ran for almost 600 seconds. Two small changes dropped the run time to 26 seconds.

I was amazed.

Step 1. I turned on cProfile. I added two functions to the slowest unit test module.

def profile():
    import cProfile
    cProfile.run( 'main()', 'the_slow_module.prof' )
    report()

def report():
    import pstats
    p = pstats.Stats( 'the_slow_module.prof' )
    p.sort_stats('time').print_callees(24)

Now I can add profiling or simply review the report. Looking at the "callees" provided some hints as to why a particular method was so slow.

Step 2. I replaced ElementTree with cElementTree (duh.) Everyone should know this. I didn't realize how much this mattered. The trick is to note how much time was spent doing XML parsing. In the case of this unit test suite, it was a LOT of time. In the case of the overall application that uses this library, that won't be true.
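Here's a minimal sketch of the swap, assuming the module only uses the common ElementTree API. The alias `ET`, the helper `count_items` and the sample document are illustrative, not from the library in question. Note that from Python 3.3 on, the plain `ElementTree` import already uses the C accelerator when it's available, and `cElementTree` was removed in 3.9, so the fallback keeps the import portable.

```python
# Prefer the C implementation where it exists (Python 2.x); fall back
# to the standard name, which newer Pythons accelerate on their own.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET


def count_items(xml_text):
    """Parse a document and count its <item> children (illustrative helper)."""
    root = ET.fromstring(xml_text)
    return len(root.findall('item'))


sample = "<root><item/><item/><item/></root>"
print(count_items(sample))
```

Because the rest of the module only refers to `ET`, the change is confined to the import.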

Step 3. The slowest method was assembling a list. It did a lot of list.append(), and list.__len__(). It looked approximately like the following.


def something( self ):
    result= []
    for index, value in some_source:
        while len(result)+1 != index:
            result.append( None )
        result.append( SomeClass( value ) )
    return result

This is easily replaced by a generator. The API changes, so every use of this method function may need to be modified to use the generator instead of the list object.


def something_iter( self ):
    counter= 0
    for index, value in some_source:
        while counter+1 != index:
            yield None
            counter += 1
        yield SomeClass( value )
        counter += 1

The generator was significantly faster than list assembly.
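To show what the API change means for callers, here's a self-contained sketch. `SomeClass` and `some_source` are hypothetical stand-ins (a namedtuple and a short list of 1-based positions with gaps), and the functions are written at module level rather than as methods so the example runs on its own. It makes no claim about timing; it only checks that the two versions produce the same sequence.

```python
from collections import namedtuple

# Hypothetical stand-ins for the real class and data source.
SomeClass = namedtuple('SomeClass', ['value'])
some_source = [(1, 'a'), (3, 'c'), (6, 'f')]   # 1-based positions with gaps


def something():
    """Original approach: build the padded list in place."""
    result = []
    for index, value in some_source:
        while len(result) + 1 != index:
            result.append(None)
        result.append(SomeClass(value))
    return result


def something_iter():
    """Generator approach: yield padding and values one at a time."""
    counter = 0
    for index, value in some_source:
        while counter + 1 != index:
            yield None
            counter += 1
        yield SomeClass(value)
        counter += 1


# Callers that only loop over the result need nothing but the new name.
# Callers that need a real list (len(), indexing) wrap the generator.
as_list = list(something_iter())
print(as_list == something())
```

Callers that index into the result or take its length are the ones that need attention; a plain `for` loop works with either version.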

Two minor code changes and a significant speed-up.

Thursday, December 23, 2010

The Anti-IF Campaign

Check this out: http://www.antiifcampaign.com/.

I'm totally in favor of reducing complexity. I've seen too many places where a Strategy or some other kind of Delegation design pattern should have been used. Instead a cluster of if-statements was used. Sometimes these if-statements suffer copy-and-paste repetition because someone didn't recognize the design pattern.

What's important is that the if statement -- in general -- isn't the issue. The anti-if folks are simply demanding that folks don't use if as a stand-in for proper polymorphism.
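A minimal sketch of the difference, with made-up names (`Order`, `FlatRate` and `ByWeight` are hypothetical, not from any real code base). Instead of testing a "shipping type" flag in every method that cares, the order delegates to a strategy object.

```python
class FlatRate:
    """Shipping strategy: one price regardless of weight."""
    def cost(self, weight):
        return 5.0


class ByWeight:
    """Shipping strategy: price proportional to weight."""
    def cost(self, weight):
        return 1.5 * weight


class Order:
    def __init__(self, weight, shipping):
        self.weight = weight
        self.shipping = shipping   # the strategy object

    def shipping_cost(self):
        # No if-statement on a type flag; the strategy object decides.
        return self.shipping.cost(self.weight)


print(Order(4, FlatRate()).shipping_cost())
print(Order(4, ByWeight()).shipping_cost())
```

Adding a third kind of shipping means adding a class, not hunting down every copy of an if-elif cluster.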

Related Issues

Related to abuse of the if statement is abuse of the else clause.

My pet peeve is code like this.

if condition1:
    work
elif condition2:
    work
elif condition3:
    work
else:
    what condition applies here?

When the various conditions share common variables it can be very difficult to deduce the condition that applies for the else clause.

My suggestion is to Avoid Else.

Write it like this.

if condition1:
    work
elif condition2:
    work
elif condition3:
    work
elif not (condition1 or condition2 or condition3):
    work
else:
    raise AssertionError( "Oops. Design Error. Sorry" )

Then you'll know when you've screwed up.
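Here's a small runnable sketch of the idea. The function name `sign` and its conditions are invented for illustration. The final elif spells out the condition a bare else would have silently assumed, and the else catches the case nobody thought about. For floats, that case is NaN, which compares false against everything.

```python
def sign(x):
    """Classify a number, stating every condition explicitly."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        # Reached only when none of the stated conditions hold.
        raise AssertionError("Oops. Design Error. Sorry: %r" % (x,))


print(sign(3), sign(-2), sign(0))

try:
    sign(float("nan"))
except AssertionError as error:
    print("caught:", error)
```

With a bare else in place of the third elif, NaN would quietly be reported as "zero".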

[Update]

Using an assert coupled with an else clause is a kind of code-golf optimization that doesn't seem to help much. An elif will have the same conditional expression as the assert would have. But the comment did lead to rewriting this to use AssertionError instead of a vague, generic Exception.

Tuesday, December 14, 2010

Code Base Fragmentation -- Again

Check this out: "Stupid Template Languages".

Love this: "The biggest annoyance I have with smart template languages (Mako, Genshi, Jinja2, PHP, Perl, ColdFusion, etc) is that you have the capability to mix core business logic with your end views, hence violating the rules of Model-View-Controller architecture."

Yes, too much power in the template leads to code base fragmentation: critical information is not in the applications, but is pushed into the presentation. This also happens with stored procedures and triggers.

I love the questions on Stack Overflow (like this one) asking how to do something super-sophisticated in the Django Template language. And the answer is often "Don't. That's what view functions are for."
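A rough sketch of the division of labor, using the standard library's `string.Template` as a stand-in for a deliberately limited template language. This is not Django's template API; `PAGE`, `order_view` and the credit-limit rule are all hypothetical. The view function makes every decision, and the template only substitutes values.

```python
from string import Template

# A deliberately "stupid" template: substitution only, no logic.
PAGE = Template("Order total: $total ($status)")


def order_view(line_items, credit_limit):
    """Hypothetical view function: all business decisions happen here."""
    total = sum(price * qty for price, qty in line_items)
    status = "over limit" if total > credit_limit else "ok"
    return PAGE.substitute(total="%.2f" % total, status=status)


print(order_view([(9.99, 2), (5.00, 1)], credit_limit=20))
```

If the credit-limit rule changes, there's exactly one place to look, and it isn't the presentation.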

Thursday, December 9, 2010

The Wrapper vs. Library vs. Aspect Problem

Imagine that we've got a collection of applications used by customers to provide data, and a collection of applications we use to collect data from vendors. We've got a third collection of analytical tools.

Currently, they share a common database, but the focus, use cases, and interfaces are different.

Okay so far? Three closely-related groups or families of applications.

We need to introduce a new cross-cutting capability. Let's imagine that it's something central like using celery to manage long-running batch jobs. Clearly, we don't want to just hack celery features into all three families of applications. Do we?

Choices

It appears that we have three choices.
  1. A "wrapper" application that unifies all the application families and provides a new central application. Responsibilities shift to the new application.
  2. A site-specific library that layers some common features so that our various families of applications can be more consistent. This involves less of a responsibility shift.
  3. An "aspect" via Aspect-Oriented Programming techniques. Perhaps some additional decorators added to the various applications to make them use the new functionality in a consistent way.

Lessons Learned

Adding a new application to be an overall wrapper turned out to be a bad idea. After implementing it, it was difficult to extend. We had two dimensions of extension.
  1. The workflows in the "wrapper" application needed constant tweaking as the other applications evolved. Every time we wanted to add a step, we had to update the real application and also update the wrapper. Python has a lot of introspection, but these aren't technical changes; these are user-visible workflow changes.
  2. Introducing new data types and file formats was painful. The responsibility for this is effectively split between the wrapper and the underlying applications. The wrapper merely serves to dilute the responsibilities.

Libraries/Aspects

It appears that new common features are almost always new aspects of existing applications.

What makes this realization painful is the process of retrofitting a supporting library into multiple, existing applications. It seems like a lot of cut-and-paste to add the new import statements, add the new decorators and lines of code. However, it's a pervasive change. The point is to add the common decorator in all the right places.
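As a sketch of what "add the common decorator in all the right places" can look like, here's a tiny site-specific library with one decorator. Everything in it is hypothetical: `batch_job`, `JOB_LOG` and the two application functions are invented for illustration, and it doesn't use celery's actual API. It only records when a job starts and finishes.

```python
import functools

JOB_LOG = []   # hypothetical stand-in for a real job queue or audit trail


def batch_job(func):
    """Aspect: record every call to a long-running job, then run it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        JOB_LOG.append(("start", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            JOB_LOG.append(("finish", func.__name__))
    return wrapper


# The pervasive change: the same decorator added in each application family.
@batch_job
def load_customer_data(path):
    return "loaded %s" % path


@batch_job
def collect_vendor_data(vendor):
    return "collected %s" % vendor


print(load_customer_data("customers.csv"))
print(JOB_LOG)
```

Each application keeps its own workflow and responsibilities; the only shared piece is the import and the decorator line.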

Trying to "finesse" a pervasive change by introducing a higher-level wrapper isn't a very good idea.

A pervasive change is simply a lot of changes and regression tests. Okay, I'm over it.

Tuesday, December 7, 2010

Intuition and Experience

First, read EWD800.

It has harsh things to say about relying on intuition in programming.

Stack Overflow is full of questions where someone takes their experience with one language and applies it incorrectly and inappropriately to another language.

I get email, periodically, also on this subject. I got one recently on the question of "cast", "coercion" and "conversion" which I found incomprehensible for a long time. I had to reread EWD800 to realize that someone was relying on some sort of vague intuition; it appears that they were desperate to map Java (or C++) concepts onto Python.

Casting

In my Python 2.6 book, I use the word "cast" exactly twice. In the same paragraph. Here it is.

This also means the "casting" an object to match the declared type
of a variable isn't meaningful in Python. You don't use C++ or Java-style
casting.

I thought that would be enough information to close the subject. I guess not. It appears that some folks have some intuition about type casting that they need to see reflected in other languages, no matter how inappropriate the concept is.

The email asked for "a nice summary with a simple specific example to hit the point home."

It's quite hard to provide an example of something that doesn't exist. But, I guess, intuition provides a strong incentive to see things which aren't there. I'm not sure how to word it more strongly or clearly. I hate to devolve into a blow-by-blow comparison between languages because there are concepts that don't map. I'll work on being more forceful on casting.

Coercion

The words coercion (and coerce) occur more often, since they're sensible Python concepts. After all, Python 2 has formal type coercion rules. See "Coercion Rules". I guess my summary ("Section 3.4.8 of the Python Language Reference covers this in more detail; along with the caveat that the Python 2 rules have gotten too complex.") wasn't detailed or explicit enough.

The relevant quote from the Language manual is this: "As the language has evolved, the coercion rules have become hard to document precisely; documenting what one version of one particular implementation does is undesirable. Instead, here are some informal guidelines regarding coercion. In Python 3.0, coercion will not be supported."

I guess I could provide examples of coercion. However, the fact that it is going to be expunged from the language seems to indicate that it isn't deeply relevant. It appears that some readers have an intuition about coercion that requires some kind of additional details. I guess I have to include the entire quote to dissuade people from relying on their intuition regarding coercion.

Further, the request for "a nice summary with a simple specific example to hit the point home" doesn't fit well with something that -- in the long run -- is going to be removed. Maybe I'm wrong, but omitting examples entirely seemed to hit the point home.

Conversion

Conversion gets its own section, since it's sensible in a Python context. I kind of thought that a whole section on conversion would cement the concepts. Indeed, there are (IMO) too many examples of conversions in the conversion section. But I guess that showing all of the numeric conversions somehow wasn't enough. I have certainly failed at least one reader. However, I can't imagine what more could be helpful, since it is -- essentially -- an exhaustive enumeration of all conversions for all built-in numeric types.

What I'm guessing is that (a) there's some lurking intuition and (b) Python doesn't match that intuition. Hence the question -- in spite of exhaustively enumerating the conversions. I'm not sure what more can be done to make the concept clear.

It appears that all those examples weren't "nice", "simple" or "specific" enough. Okay. I'll work on that.
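For what it's worth, here's a minimal sketch of the kind of thing that section enumerates. The variable names are illustrative only. Each conversion is an ordinary function call that builds a new object; the original object isn't reinterpreted, which is where the C++ or Java casting intuition stops applying.

```python
# Conversions are ordinary function calls that build a new object.
n = int("42")          # str -> int
f = float(n)           # int -> float
t = int(3.99)          # float -> int, truncating toward zero
s = str(f)             # float -> str

# The original object is untouched; it still has its own type.
text = "42"
number = int(text)
print(type(text).__name__, type(number).__name__)
```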

Thursday, December 2, 2010

More Open Source and More Agile News

ComputerWorld, November 22, 2010, has this: "Open Source Grows Up". The news of the weird is "It's clear that open-source software has moved beyond the zealotry phase." I wasn't aware this phase existed. I hope to see the project plan with "zealotry" in it.

The real news is "More than two-thirds (69%) of the respondents said they expect to increase their investments in open source." That's cool.

Be sure to read the sidebar "Many Enterprises Aren't Giving Back." There's still a lot of concern over intellectual property. I've seen a lot of corporate software -- it's not that good. Most companies that are wringing their hands over losing control of their trade secrets should really be wringing their hands because their in-house software won't measure up to open-source standards.

I like this other quote: 'Five years ago, the South Carolina government was "considering writing a policy to prohibit or at least 'control' open source".' I like the "Must Control Open Source" feeling that IT leadership has. Without this mysterious "control", the organization could be swamped by software it didn't write. How's that different from being swamped by software products that involve contracts and fees? And require Patch Tuesday?

Agility

SD Times has two articles on Agile methods. Both on the front page of a print publication. That's how you know the technique has "arrived".

First, there's "VersionOne survey finds agile knowledge and use on the rise". My favorite quote: "Interestingly, management support, the ability to change organizational culture and general resistance to change, remained at the forefronts of participants’ minds when indicating barriers to further agile adoption." I like the management barriers. I like it when management tries to exert more 'control' over a process (like software creation) that's so poorly understood.

Here's the companion piece, "For agile success, leaders must let teams loose". This is all good advice. Particularly, this: '"It’s hard to not command and control, but leadership is not about managing work. It’s about creating a capable organization that can manage work," [Rick Simmons] added.'

If you're micro-managing, you're not building an organization. Excellent advice. However, tell that to the financial control crowd.

Budgets and "Control"

Finally, be sure to read this by Frank Hayes in ComputerWorld: "Big Projects, Done Small". Here are the relevant quotes: "The logical conclusion: We should break up all IT projects into sub-million-dollar pieces." "The political reality: Everybody wants multimillion-dollar behemoths." "...huge projects get big political support."

In short, Agile is the right thing to do until you're trying to get approval. Bottom line: use Agile methods. But for purposes of pandering to executives who want to see large numbers with lots of zeroes, it's often necessary to write giant project "plans" that you don't actually use.

Go ahead, write waterfall plans. Don't feel guilty or conflicted. Some folks won't catch up with Agility because they think "Control" is better. Pander to them. It's okay.