Thursday, April 17, 2014

Stingray 4.3 Update

See https://sourceforge.net/projects/stingrayreader/

  • Some small improvements to the COBOL DDE parsing.
  • A sensible demo program that shows how to read COBOL files.
  • A complete rewrite for Python 3.3.
  • Support for more COBOL syntax.
  • Support for Occurs Depending On.
  • Support for RECFM=F, RECFM=V and RECFM=VB legacy files.

The support for Occurs Depending On is a Big Sweaty Deal (BSD™). It breaks the essential structure for calculating offset and size of data items in a fixed file schema. It breaks it badly. We wind up with a fairly complex recursive calculation in the general case of variably located items.
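
To show why the calculation turns recursive, here's a minimal sketch. It is not Stingray's code: the Item class, the locate() function and the assumption that the controlling count is stored as plain display digits are all hypothetical, invented for illustration. The point is only that the offset of anything after an Occurs Depending On item can't be known until the record's data has been read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Item:
    """Hypothetical, simplified stand-in for one COBOL DDE entry."""
    name: str
    size: int = 0                  # bytes for an elementary item; unused for a group
    occurs: Union[int, str] = 1    # fixed count, or the name of the DEPENDING ON item
    children: List["Item"] = field(default_factory=list)


def locate(item: Item, record: str, offset: int,
           values: Dict[str, int],
           found: List[Tuple[str, int, int, int]]) -> int:
    """Append (name, index, offset, size) for item and everything below it.

    Returns the offset just past the item. An ODO count is looked up in
    values, so the controlling item must come earlier in the record.
    """
    count = item.occurs if isinstance(item.occurs, int) else values[item.occurs]
    for index in range(count):
        start = offset
        if item.children:
            for child in item.children:
                offset = locate(child, record, offset, values, found)
        else:
            offset += item.size
            text = record[start:offset]
            if text.isdigit():     # assumption: counts are plain display digits
                values[item.name] = int(text)
        found.append((item.name, index, start, offset - start))
    return offset


# A record with a two-digit COUNT, COUNT copies of ITEM, then TAIL.
schema = Item("REC", children=[
    Item("COUNT", size=2),
    Item("ITEM", size=2, occurs="COUNT"),
    Item("TAIL", size=2),
])
found = []
locate(schema, "02ABCDXY", 0, {}, found)
tail = [entry for entry in found if entry[0] == "TAIL"]
```

With the record "02ABCDXY", TAIL lands at offset 6. With "01ABXY" it lands at offset 4. Same schema, different offsets.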

We'll address ODS and Numbers spreadsheets with a somewhat cleaner implementation, also. I figured out how ElementTree QNames work. I regret the ignorant misuse of namespaces in previously posted code. This will be part of release 4.4 or later.

Thursday, April 10, 2014

The SortedContainers Package for Python

See this: SortedContainers — sortedcontainers 0.6.0 documentation

Here's some text from the invitation.
You may find the performance comparison and implementation details interesting because it doesn't use any sophisticated tree data structure or balancing algorithms. It's a great example of taking advantage of what processors are good at rather than what theory says should be fast.
The documentation is extensive. The implementation details are interesting. The claim of faster is supported nicely. I have two quibbles.

  1. It actually does use a sophisticated tree data structure. A list of lists really is a kind of tree.
  2. "rather than what theory says should be fast" doesn't make any sense to me at all. 
A claim that Computer Science theory isn't right bothers me. If theory says some algorithm is fast, there are only two possibilities: (1) theory is actually right and it really is fast and the demonstration was incomplete or (2) the theory is incomplete, and the implementation extends (or replaces) the old theory; the implementation is new theory.

It's never the case that theory is "wrong." That fails to understand the role of theory.

It's always the case that an implementation either confirms theory or extends theory with new results.

To me, this package demonstrates one of two things.
  1. The theory was incomplete and this package is a new theory that replaces the old, wrong theory.
  2. The theory was right and this package demonstrates that the theory was right by being a good, solid, usable implementation. 
I would suggest the second option here: this package shows the value of Python's list-of-lists as a high-performance technique for implementing sorted structures. It's not an example of "taking advantage of what processors are good at." This is an example of using Python properly to squeeze excellent performance out of the available structures.

The really important insight is this: "The sorted container types are implemented based on a single observation: bisect.insort is fast, really fast."

This is a profound observation.  Read more here: http://www.grantjenks.com/docs/sortedcontainers/implementation.html
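
Here's a toy version of the list-of-lists idea, to make the observation concrete. This is my own sketch, not the sortedcontainers implementation: the class name TinySortedList, the load parameter and the splitting rule are assumptions for illustration. Only bisect.bisect_left and bisect.insort come from the standard library.

```python
import bisect


class TinySortedList:
    """Illustrative list-of-lists sorted container; not the sortedcontainers code."""

    def __init__(self, load=4):
        self._load = load    # assumed maximum sublist length before splitting
        self._lists = []     # sublists, each sorted, in ascending order
        self._maxes = []     # largest value of each sublist

    def add(self, value):
        if not self._lists:
            self._lists.append([value])
            self._maxes.append(value)
            return
        # First sublist whose maximum is >= value; clamp to the last sublist.
        pos = min(bisect.bisect_left(self._maxes, value), len(self._lists) - 1)
        sub = self._lists[pos]
        bisect.insort(sub, value)    # the fast part: shift within a short list
        self._maxes[pos] = sub[-1]
        if len(sub) > self._load:
            # Split an over-full sublist in half to keep insertions cheap.
            half = len(sub) // 2
            self._lists[pos:pos + 1] = [sub[:half], sub[half:]]
            self._maxes[pos:pos + 1] = [sub[half - 1], sub[-1]]

    def __iter__(self):
        for sub in self._lists:
            yield from sub


numbers = TinySortedList()
for n in [5, 3, 8, 1, 9, 2, 7]:
    numbers.add(n)
in_order = list(numbers)
```

The outer list plus its sublists is a two-level tree, which is my first quibble in code form.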

Thursday, April 3, 2014

Mastering Object-Oriented Python

See http://www.packtpub.com/mastering-object-oriented-python/book

Coming soon.

This is relatively deep, under-the-hood stuff for folks who want to master the Python feature set.

Here's the overview of what you get:

  • 0 Some Preliminaries 3 examples, 56 lines
  • 1 The __init__() Method 55 examples, 351 lines
  • 2 Integrating Seamlessly with Python: Basic Special Methods 92 examples, 558 lines
  • 3 Attribute Access, Properties, and Descriptors 33 examples, 310 lines
  • 4 The ABC's of Consistent Design 18 examples, 108 lines
  • 5 Using Callables and Contexts 17 examples, 214 lines
  • 6 Creating Containers and Collections 50 examples, 438 lines
  • 7 Creating Numbers 12 examples, 232 lines
  • 8 Decorators And Mixins – Cross Cutting Aspects 39 examples, 233 lines
  • 9 Serializing and Saving: JSON, YAML, Pickle, CSV and XML 77 examples, 648 lines
  • 10 Storing and Retrieving Objects via shelve 34 examples, 272 lines
  • 11 Storing and Retrieving Objects via SQLite 45 examples, 410 lines
  • 12 Transmitting and Sharing Objects 38 examples, 388 lines
  • 13 Configuration Files and Persistence  59 examples, 490 lines
  • 14 The Logging and Warning Modules 46 examples, 343 lines
  • 15 Designing for Testability 38 examples, 393 lines
  • 16 Coping With The Command Line  42 examples, 222 lines
  • 17 Module and Package Design  31 examples, 93 lines
  • 18 Quality and Documentation  42 examples, 269 lines
  • Preface 3 examples, 12 lines
  • Bonus Chapter 1 Archives and Directories  11 examples, 119 lines
  • Bonus Chapter 2 Case Study: Document Analysis  39 examples, 308 lines

824 examples, 6467 lines

Yes. That's a lot of code. It's relentless.

Thursday, March 27, 2014

Preconceived Notions, Perceptual Narrowing, The Einstellung Effect

Read this http://en.wikipedia.org/wiki/Einstellung_effect

Great article in Scientific American on this.

I didn't realize that sometimes I do spend time trying to defeat the Einstellung effect. Not a lot of time. But some time.

When confronted with gnarly design problems, I have the same bad habits as many other programmers. I reach for algorithms or data structures that I'm familiar with, even if they're not optimal. Sometimes I'll use algorithms that are not even appropriate to the problem domain.

However.

In working on a book on Advanced Object-Oriented Python, I realized that one habit I have is -- perhaps -- actually helpful.  It's this.

I can -- if I'm careful -- enumerate the alternatives. It's challenging to exhaustively enumerate design choices. It seems to help to have a list of things that clearly aren't optimal or aren't workable or aren't elegant. After pruning away the bad ideas, sometimes a good idea remains.

I'm not often good at this. Sometimes I dive in early, make choices, learn from my failures, and am forced to refactor.

The "enumeration" isn't literally every possibility. Sometimes, it's the types of possibilities or the strategies involved. Sometimes it's the patterns that the possibilities fulfill.

Example 1. When looking at Python data structures, the ABC's of Sequence, Mapping and Set provide a big-picture way to identify places to look. Once we've narrowed the field of view, we can look at kinds of sequences or kinds of mappings. We can also look at the generator expression alternative to a sequence object.

Example 2. There are often three design strategies: inheritance, composition (or wrapping) and invent-from-scratch. It's sometimes helpful to actually put together a technical spike of a subclass, a wrapper class and the outline of a de novo class definition. Bad ideas usually surface quickly when actual code is involved.
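
A spike of this kind can be tiny. Here's a sketch of the first two strategies for a made-up requirement, a collection that reports its mean. The class names MeanList and MeanWrapper are hypothetical, invented for this illustration.

```python
class MeanList(list):
    """Inheritance: every list method comes along, wanted or not."""

    @property
    def mean(self):
        return sum(self) / len(self)


class MeanWrapper:
    """Composition: only the methods we choose to expose exist."""

    def __init__(self, items=()):
        self._items = list(items)

    def append(self, value):
        self._items.append(value)

    @property
    def mean(self):
        return sum(self._items) / len(self._items)


by_inheritance = MeanList([2, 4, 6])
by_composition = MeanWrapper([2, 4, 6])
by_composition.append(8)
```

Even at this size the trade-off is visible: the subclass accepts sort(), slicing and everything else a list does, while the wrapper has to spell out each method it supports.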

I thought I was being fussy. Or I was just stalling to avoid starting to write bad code too early. Or I was wasting time obsessing over performance issues.

No. I was preventing Einstellung. Avoiding Perceptual Narrowing.

Avoiding "Calling all problems nails because I'm wielding the hammer."

The Relational Database as Hammer

I feel obligated to note that the relational database often becomes the hammer and all problems are then reduced to RDBMS/SQL nails. No matter what the problem is.

One of the most amazing of these problems was an inquiry about "the top n rows query". It was the DBA's sense that getting the "top n rows" using some selection and ordering criteria was a really standard problem that everyone had confronted. The problem was so common there just had to be a standard, widely-adopted high-performance solution.

When getting the top 100 rows out of 40,000, there will be performance issues. The filtering and sorting (and any joins) will take time and DB resources. My question was "why?"

The answer was appalling. The database was being used as a message queue. The top 100 rows out of 40,000 query was being done to pick the next few items out of the queue for processing. The non-top-100 rows were merely lower priority items in the queue.

Wouldn't a proper message queue have been cheaper and simpler?

Apparently not. Einstellung had set in. They had data. They had a database. What more is there?
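
For contrast, here's roughly what "pick the next few items by priority" looks like without a filter-and-sort query. It's an in-memory sketch using the standard library's heapq; a real deployment would presumably use a message broker, and the class name PriorityMessageQueue and the convention that lower numbers mean higher priority are my assumptions.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """In-memory stand-in for a priority message queue (illustration only)."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()   # tie-breaker keeps arrival order

    def put(self, priority, message):
        # Assumption: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._sequence), message))

    def get(self):
        priority, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)


queue = PriorityMessageQueue()
queue.put(5, "routine report")
queue.put(1, "urgent order")
queue.put(5, "another report")
first = queue.get()
```

Each put and get is a logarithmic-time heap operation. Nothing re-sorts the 40,000 waiting items to find the next one.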

Thursday, March 20, 2014

Shiny New MacBook Pro

Wow. Just Wow. An almost seamless technology change. Almost.

The old MacBook Pro (dual core, 4 GB RAM) was struggling to keep up. Struggling. It had been dropped once, so there was a ding in the corner. The trackpad "click" wasn't reliably clicking. It was shaky.

Nothing that couldn't be cured by a new Bluetooth keyboard and/or mouse. Awkward, but cheap.

Instead, I opted for a new quad-core 8 GB MacBook Pro.

Hence the Wow.

Here's how the upgrade worked.

I logged in once in the Apple Store to create an "Administrator" account. That's Not Me, but it allowed me to configure and register the machine.

Go Home.

1. Finish the last Time Machine backup of the old machine.
2. Move the Time Machine device to the new machine.
3. Use the Migration Assistant to recover everything from the old machine. There was 300+ GB of stuff, so it took a few hours. Completely hands-off. Completely successful the first time.

Turn on WiFi (it's not always on for me, the story is complicated; it involves going to a coffee shop.)

Almost everything is perfectly normal and usable on the new machine.

1Password wanted me to login to the App Store to be sure the licenses were all up-to-snuff.

DropBox wanted me to login again to their server.

GPSNavX needs a license key. Their keys are delightfully short, but apparently encode a date or something and can't be reused easily.

Python3.3 was -- of course -- a non-starter. Not surprising, really, since it's not an "app" that can be moved neatly by Mac OS X Migration Assistant.

The Python download and install was painless. The ActiveState ActiveTcl is also important because I do use tkinter and IDLE. The Python page is very explicit about the correct release of ActiveTcl for Mac OS X. And I still did it wrong the first time.
while the ActiveState web site refers to 8.5.15.0, the installer dmg link has been updated to download ActiveTcl 8.5.15.1.
Today's job, then, is to put setuptools (easy_install) and pip onto this Mac and begin the process of figuring out what's missing that I really use. I install a fair amount of stuff experimentally; stuff I don't really want or need.  And I always install it "for real" in Python's site-packages because I'm too lazy to simply download the Git repository and update the PYTHONPATH manually.

We're talking about docutils, Sphinx, Django, Jinja2, and SQLAlchemy. To get started. PyYAML and PIL are probably required, but I'll wait until I need them.

Thursday, March 13, 2014

The Visitor Design Pattern and Python

Epiphany.

In Python, with iterators, the Visitor design pattern is useless. And a strongly-ingrained habit. Which I'm trying to break.

Here's a common Visitor approach:

class Visitor:
    def __init__(self): ...
    def visit(self, some_target_thing): ...
    def all_done(self): ...

v = Visitor()
for thing in some_iterator():  # some_iterator() is any source of objects
    v.visit(thing)
v.all_done()

If we refactor the for statement into the Visitor, then it's just a Command or something.

Here's the refactored Iterating Visitor:

class Command:
    def __init__(self): ...
    def process_all(self, iterable):
        for thing in iterable:
            self.visit(thing)
    def visit(self, thing): ...
    def all_done(self): ...

c = Command()
c.process_all(some_iterator())
c.all_done()

Possible Objection

The one possible objection is this: "What if our data structure is so hellishly complex that we can't reduce it to a simple iterator?"

That's perfectly silly. Any hyper-complex algorithm to walk any hyper-complex data structure, no matter how hyper complex, can always be recast into a generator function which uses yield to iterate over the objects.
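
As a small illustration, here's a generator that flattens an arbitrarily nested mix of dicts, lists and tuples into a simple stream of leaf values. The function name walk and the sample structure are made up for this sketch. A real structure would need its own traversal rules, but the shape is the same: recurse, and yield.

```python
def walk(node):
    """Yield every leaf in a nested structure of dicts, lists and tuples."""
    if isinstance(node, dict):
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from walk(item)
    else:
        yield node


document = {"a": [1, (2, 3)], "b": {"c": 4, "d": []}}
leaves = list(walk(document))
```

The consumer sees a flat iterator and never needs to know how the structure is laid out.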

Better Design

Once we start down this road, we can generally simplify processing into a kind of Command that looks something like this.


class Command:
    def __init__(self):
        self.iterable = ()  # configured by the caller before run()
    def run(self):
        for thing in self.iterable:
            ...  # process each thing here

c = Command()
c.iterable = some_iterator()
c.run()


I find that this interface is somewhat easier to deal with when composing large commands from individual small commands. It follows a Create-Configure-Run pattern that seems to work out well. I just wish I would start with this rather than start with a Visitor, refactor, and end up with this.
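
Here's a sketch of what composing might look like under Create-Configure-Run. The subclasses Total and Count, the process() hook and the Sequence composite are all hypothetical names I've invented for illustration; they're one way to do it, not the only one.

```python
class Command:
    def __init__(self):
        self.iterable = ()          # configured by the caller before run()

    def run(self):
        for thing in self.iterable:
            self.process(thing)

    def process(self, thing):
        raise NotImplementedError


class Total(Command):
    def __init__(self):
        super().__init__()
        self.total = 0

    def process(self, thing):
        self.total += thing


class Count(Command):
    def __init__(self):
        super().__init__()
        self.count = 0

    def process(self, thing):
        self.count += 1


class Sequence(Command):
    """A large command built from small ones, all run over the same data."""

    def __init__(self, *commands):
        super().__init__()
        self.commands = commands

    def run(self):
        data = list(self.iterable)  # materialize so each sub-command sees every item
        for command in self.commands:
            command.iterable = data
            command.run()


# Create, configure, run.
total, count = Total(), Count()
both = Sequence(total, count)
both.iterable = iter([3, 4, 5])
both.run()
```

The composite has the same three-step interface as its parts, which is what makes the nesting painless.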

Thursday, March 6, 2014

Enterprise JavaScript -- Not the best idea

See this:

The article lists reasons why Enterprise JavaScript is a recipe for disaster. "Finally, there's legacy integration..." This is the point.

In particular, JavaScript needs to get the data from somewhere: a backend process. If we push business knowledge into the front-end, even if we're assiduous about code libraries and sharing, we still have to fight with the "Out-Of-Date JS Library" issue. Server-side business knowledge is inherently consistent and sharable.

The big reason JavaScript feels good is that it seems productive. Java is complex. C++, C#, and Objective C are Very Complex.

And.

Backend programming doesn't allow you to see finished-looking stuff right away. When you're fooling around with JavaScript you feel like you're doing real work. You're moving data around on the HTML page, that's productivity, right?

A spreadsheet is just as productive as JavaScript presentation.  Almost exactly as productive. The underlying data and processing still originates somewhere else. That's where the real value lies. In the data. In the backend.