Bio and Publications

Thursday, March 27, 2014

Preconceived Notions, Perceptual Narrowing, The Einstellung Effect

Read this http://en.wikipedia.org/wiki/Einstellung_effect

Great article in Scientific American on this.

I didn't realize that sometimes I do spend time trying to defeat the Einstellung effect. Not a lot of time. But some time.

When confronted with gnarly design problems, I have the same bad habits as many other programmers. I reach for algorithms or data structures that I'm familiar with, even if they're not optimal. Sometimes I'll use algorithms that are not even appropriate to the problem domain.

However.

In working on a book on Advanced Object-Oriented Python, I realized that one habit I have is -- perhaps -- actually helpful.  It's this.

I can -- if I'm careful -- enumerate the alternatives. It's challenging to exhaustively enumerate design choices. It seems to help to have a list of things that clearly aren't optimal or aren't workable or aren't elegant. After pruning away the bad ideas, sometimes a good idea remains.

I'm not often good at this. Sometimes I dive in early, make choices, learn from my failures, and am forced to refactor.

The "enumeration" isn't literally every possibility. Sometimes, it's the types of possibilities or the strategies involved. Sometimes it's the patterns that the possibilities fulfill.

Example 1. When looking at Python data structures, the ABC's of Sequence, Mapping and Set provide a big-picture way to identify places to look. Once we've narrowed the field of view, we can look at kinds of sequences of kinds of mappings. We can also look at the generator expression alternative to a sequence object.

Example 2. There are often three design strategies: inheritance, composition (or wrapping) and invent-from-scratch. It's sometimes helpful to actually put together a technical spike of a subclass, a wrapper class and the outline of a de novo class definition. Bad ideas usually surface quickly when actual code is involved.

I thought I was being fussy. Or I was just stalling to avoid starting to write bad code too early. Or I was wasting time obsessing over performance issues.

No. I was preventing Einstellung. Avoiding Perceptual Narrowing.

Avoiding "Calling a problems nails because I'm wielding the hammer."

The Relational Database as Hammer

I feel obligated to note that the relational database often becomes the hammer and all problems are then reduced to RDBMS/SQL nails. No matter what the problem is.

One of the most amazing of these problems was an inquiry about "the top n rows query". It was the DBA's sense that getting the "top n rows" using some selection and ordering criteria was a really standard problem that everyone had confronted. The problem was so common there just had to be a standard, widely-adopted high-performance solution.

When getting the top 100 rows out of 40,000, there will be performance issues. The filtering and sorting (and any joins) will take time and DB resources. My question was "why?"

The answer was appalling. The database was being used as a message queue. The top 100 rows out of 40,000 was being doing to pick the next few items out of the queue for processing. The non-top-100 rows were merely lower priority items in the queue.

Wouldn't a proper message queue have been cheaper and simpler?'

Apparently not. Einstellung had set in. They had data. They had a database. What more is there?

Thursday, March 20, 2014

Shiny New MacBook Pro

Wow. Just Wow. An almost seamless technology change. Almost.

The old MacBook Pro (dual core 4Gb RAM) was struggling to keep up. Struggling. It had been dropped once, so there was a ding in the corner. The trackpad "click" wasn't reliably clicking. It was shaky.

Nothing that couldn't be cured by a new Bluetooth keyboard and/or mouse. Awkward, but cheap.

Instead, I opted for a new quad-core 8Gb MacBook Pro.

Hence the Wow.

Here's how the upgrade worked.

I logged in once in the Apple Store to create an "Administrator" account. That's Not Me, but it allowed me to configure and register the machine.

Go Home.

1. Finish the last Time Machine backup of the old machine.
2. Move the Time Machine device to the new machine.
3. Use the Migration Assistant to recover everything from the old machine. There was 300+ Gb of stuff, so it took a few hours. Completely hands-off. Completely successful the first time.

Turn on WiFi (it's not always on for me, the story is complicated; it involves going to a coffee shop.)

Almost everything is perfectly normal and usable on the new machine.

1Password wanted me to login to the App Store to be sure the licenses were all up-to-snuff.

DropBox wanted me to login again to their server.

GPSNavX needs a license key. Their keys are delightfully short, but apparently encode a date or something and can't be reused easily.

Python3.3 was -- of course -- a non-starter. Not surprising, really, since it's not an "app" that can be moved neatly by Mac OS X Migration Assistant.

The Python download and install was painless. The ActiveState ActiveTcl is also important because I do use tinter and IDLE. The Python page is very explicit about the correct release of ActiveTcl for Mac OS X. And I still did it wrong the first time.
while the ActiveState web site refers to 8.5.15.0, the installer dmg link has been updated to download ActiveTcl 8.5.15.1.
Today's job, then, is to put setuptools (easy_install) and pip onto this Mac and begin the process of figuring out what's missing that I really use. I install a fair amount of stuff experimentally; stuff I don't really want or need.  And I always install it "for real" in Python's site-packages because I'm too lazy to simply download the Git repository and update the PYTHONPATH manually.

We're talking about docutils, Sphinx, Django, Jinja2, and SQLAlchemy. To get started. PyYAML and PIL are probably required, but I'll wait until I need them.

Thursday, March 13, 2014

The Visitor Design Pattern and Python

Epiphany.

In Python, with iterators, the Visitor design pattern is useless. And a strongly-ingrained habit. Which I'm trying to break.

Here's a common Visitor approach:

class Visitor:
    def __init__( self ): ...
    def visit( self, some_target_thing ): ...
    def all_done( self ): ...

v = Visitor()
for thing in some_iterator():
    v.visit(thing)
v.all_done()

If we refactor the for statement into the Visitor, then it's just a Command or something.

Here's the refactored Iterating Visitor:

class Command:
    def __init__( self ): ...
    def process_all( self, iterable ):
        for thing in iterable:
            self.visit( thing )
    def visit( self, thing ): ...
    def all_done( self ): ...

c=Command()
c.process_all( some_iterator() )
c.all_done()

Possible Objection

The one possible objection is this: "What if our data structure is so hellishly complex that we can't reduce it to a simple iterator?"

That's perfectly silly. Any hyper-complex algorithm to walk any hyper-complex data structure, no matter how hyper complex, can always be recast into a generator function which uses yield to iterate over the objects.

Better Design

Once we start down this road, we can generally simplify processing into a kind of Command that looks something like this.


class Command:
    def __init__( self ): ...
    def run( self ): 
        for thing in self.iterable:
            ....

c= Command()
c.iterable= some_iterator()
c.run()


I find that this interface is somewhat easier to deal with when composing large commands from individual small commands. It follows a Create-Configure-Run pattern that seems to work out well. I just wish I would start with this rather than start with a Visitor, refactor, and end up with this.

Thursday, March 6, 2014

Enterprise JavaScript -- Not the best idea

See this:

The article lists reasons why Enterprise JavaScript is a recipe for disaster. "Finally, there's legacy integration..." This is the point.

In particular, JavaScript needs to get the data from somewhere: a backend process. If we push business knowledge into the front-end, even if we're assiduous about code libraries and sharing, we still have to fight with the "Out-Of-Date JS Library" issue. Server-side business knowledge is inherently consistent and sharable.

The big reason JavaScript feels good is because it's seems productive. Java is complex. C++, C#, and Objective C are Very Complex.

And.

Backend programming doesn't allow you to see finished-looking stuff right away. When you're fooling around with JavaScript you feel like you're doing real work. You're moving data around on the HTML page, that's productivity, right?

A spreadsheet is just as productive as JavaScript presentation.  Almost exactly as productive. The underlying data and processing still originates somewhere else. That's where the real value lies. In the data. In the backend.