Thursday, October 23, 2014

Currying and Partial Function Evaluation

Old. But still interesting.

Partial Function Application is not Currying

It seems like hair-splitting. However, the distinction between bound variables and curried functions does have some practical implications.

I'm looking closely at PyMonad and the built-in functools library.

I'm finding some benefits in understanding functional programming and how to apply functional design patterns in Python. I'm also seeing the important differences between compiled -- and optimized languages -- and Python's approach. I'm slowly coming to understand how a (simple) recursive design is flattened into a for loop as part of manual tail-recursion optimization.

The functional programming goodness is giving me first-class headaches when trying to apply the lessons learned to Java, however. I suppose I should look closely at and There are claims that it's dangerously inefficient. Also, the customer who insists on Java has a (very) limited set of allowed libraries; if this isn't on the list, then the whole concept is a non-starter.

Thursday, October 16, 2014

Using Bottle as a miniature demo server

Let's talk small.

When writing API's, it sometimes helps to have a small demo web site to show the API in a context that's easy to visualize. API's are sometimes abstract, and without an application to provide some context, it can be unclear why the path looks like that or why the JSON document has those fields.

I want to emphasize the "small" part of the small demo. A small page or two with some HTML forms and a submit button. Really small.

The actual customer-facing apps (mobile, mobile web, and full web site) are being built by lots of other people. Not us. They're big. We build the API's (there are a lot) that support the data structures that support the processing that supports the user experience.

Building fake mobile apps is right out. We're not going to lard on Android SDK or Xcode development environments to our already overburdened laptops. We build backend API's.

Building a fake mobile web or full web site is appealing. What makes it complex is the UX folks are building everything in Angular.js. If we want to properly implement a form, we would have to master Angular just to do a demo for the product owner.

No thanks. Still too far afield for API developers. We're focused on mongo and JSON and performance and scalability. Not Angular.js and the UX.

What we want to do is build a small web server which runs just a few pages plucked out of the UX demo code so that we can show how interactions with a web page put stuff in a database. And vice-versa: stuff in the database will show up on a web page.

"Really?" we get asked. Some folks look askance at us for wanting to put a small demo site together.

"Yes," we answer. "Our product owner has a big vision and we're breaking that into a bunch of little API's. It's not perfectly clear how we're building up to that vision."

It's not perfectly clear how some of this work. Folks outside the scrum team have distracting questions. We want to have a page or two where we can fill in a form and click submit and stuff happens. This is far easier to explain than showing them Postman or SoapUI and claiming that this will support some user stories.

And as we grow toward the epic, the workflow aspects of this will grow. The stuff that admin "A" does after user "U" has made an initial request. Or the stuff that internal user "I" does after external user "X" has done something. But really, it's just a few small web pages. Small.

Imagine the demo. On laptop #1, we'll show user "X". On laptop #2, we're running a Mongo shell to query what's in the db. On laptop #3 we're showing user "I". The focus is really the API's. And how the API's add up to an epic collection of stories.

Serving some HTML pages

Just to make it painful, we can't simply grab the demo web pages out of the UX team's SVN repository. Why not? First, it's an Angular app. We can't just grab some HTML and go. The demo pages are served via node.js with Bower, so it's not even clear (to us) what the complete deployment looks like.

So. We cheated. We took a screen shot. We trimmed the edges of the page as .PNG files. We wrote our own form and cobbled in enough CSS to make it look close. We're not here to fake out the UX. We just want to enter some data and have it tickle our API. (Indeed, we have a "Not The Real Experience" on some pages.)

Initially, some of the team members tried serving these small pages with WebLogic. Then Jetty. It's not bad. But it's Java. It takes forever to build and deploy something after a trivial change. There are a lot of moving parts even with Jetty, and not all of them are obvious.

Since we're building "enterprise" API's, we're deeply enmeshed with every feature of the Spring Framework. Our STS/Eclipse environments are fat with add-ons and features.

While the Spring Framework ideal is to allow a developer to focus on relevant details and have the irrelevant details handled automagically, the magic almost gets in the way. These are small applications that are little more than a few static pages with forms and a submit button. Spring can do it, of course. But we're often testing our the actual API's in a Jetty server (or two). If the demo site requires yet another instance of Jetty with yet another configuration, our ability to cope diminishes.

How can we get back to small?

Python and Bottle

Python has several web servers built-in. We can use http.server. We can use wsgiref. Both of these are almost OK for what we want to do.

We can do better with two small downloads: Bottle and Jinja2. With these, we can build simple HTML pages that show some data. We can build simple servers that collect form data, use http.client to make API requests, and write copious logging details. We can write little bottle apps that handle just GETs and POSTs simply.

This is suitably small.

We can share the module with the Bottle object and the HTML mock-up pages. We can fire up the app in an instant on anyone's laptop, no matter what else they're running. We can tweak the server to adjust the logging or the API request or the form.

We actually run the server from within Idle. Make a change and hit F5 to redeploy after a change. It's small. It's fast. And it doesn't involve the huge complexities associated with Java.

Bottle doesn't do much. But what little it does do is a pretty tidy fit with tiny little demonstrations of super-simple HTML interactions.

Thursday, October 9, 2014

Scipy.optimization.anneal Problems

Well, not really "problems" per se. More of a strange kind of whining than a solvable problem.

Here's the bottom line. Two real quotes. Unedited.

Me: "> There's a way to avoid the religious nature of the argument. "
Them: "Please suggest away."

Really. Confronted with choices between anneal and basin hopping, they could only resort to hand-waving and random utterances.

The tl;dr summary is this:
  • "scipy.optimize.anneal only has three hard-wired schedule variants: ‘fast’, ‘cauchy’ or ‘boltzmann’."
  • My initial response was "And..."? 
  • "Not being able to specify my own cooling schedule severely limits the usability of the code"
A complaint that causes me deep pain: "severely limits" with no actual evidence. And no plan to get evidence beyond a religious wars style argument.

There may have been a technical question on the class definitions inside scipy. But that question was overshadowed by the essential problems with what they were doing. Or, more properly, what they were whining about.

Did they really have a problem with a state of the art solution to optimization problems? More specifically:

1. Did they read the "Deprecated" part of the scipy documentation? This is a hint that there are better solutions available. Perhaps they could start there instead of whining.
2. Did they actually read the details of the three schedules in the "Notes" section? Do they seriously think they've got a new approach that does not fit any of the various parameters of the three installed algorithms? I don't mean to be too rude, but... Do they really think they're that scale of genius?
3. Do they have any evidence that their problem is so unlike the typical case handled by basin hopping?
4. Do they have any evidence that their solution totally crushes the already-built code?

I think the answers to all four question were "no". 

I'm not even certain that I could help them with some of the Python technology required to extend scipy. But, I'm sure I cannot actually do anything of value under the circumstances that (a) they have not really tried the established algorithms and (b) they're already sure that the established algorithms can't work based on religious-wars arguments.

It was clear that they never read the "Notes" section on this SciPy page:

One of the emails in the exchange had a kind of hand-waving justification for the problem domain being somehow unique. Lacking any actual evidence, I'm inclined to believe they were just hoping that their problem domain was unique, allowing them to dismiss the available Python solution and do something uniquely bad. 

(Optimization is not my area of expertise. Perhaps I'm way off base; perhaps the existing solutions are so problem-domain specific that everyone has to invent new technology. Maybe established solutions really don't work.)

More importantly: there was no actual evidence that the existing optimization (either annealing or basin hopping) failed to solve their problem.

But the worst part was this:

"From, a business perspective, I need to know about SA because our competitor stole our biggest client using it."

They don't actually want to innovate. They only want to try and catch up by making religious war arguments over the deprecated simulated annealing vs. basin hopping.

Thursday, October 2, 2014

Not sure what went wrong, but...

Read this:

Not sure what's going on here.

"Script I want to run" seemed clear.

The rest seemed like ill-advised trips down numerous ratholes. In particular, anything that involved Python Tools for Visual Studio seems like a waste of time and brain calories.

It's not clear at all what's not working. That's perhaps the most frustrating thing about this kind of post.

The final note, "Decisions decisions..." pointed out a simple confusion that befuddles the technically-minded. Too many details.

The decision between Python2 and 3 is trivial. There are a lot of details, but they're irrelevant for making the decision.

What package are you trying to use? It's hard to tell, but it looks like it's vispy. If so, that's all that matters. vispy works with Python3.3, requires numpy, and a "backend." Install just that and nothing more. In particular, avoid junk like Visual Studio.

The Dog's Breakfast seems to be the result of chasing down lots of details that aren't too relevant. It's hard to tell. But a scatter-shot post claiming "all this is broken" is a hint that the author wasn't simply following the vispy installation instructions. It could be that they turned something simple into a dog's breakfast by chasing irrelevant technologies all around the garden.

Thursday, September 25, 2014

PyCrypto Experience

Let me start with a wow. PyCrypto is very nice.

Let me emphasize the add-ons that go with PyCrypto. These are as valuable as the package itself.

Here's the story. I was working with a Java-based AES encrypter that used the "PBKDF2WithHmacSHA1" key generator algorithm. This was part of a large, sophisticated web application framework that was awkward to unit test because we didn't have a handy client to encode traffic.

We could run a second web application server with some client-focused software on it. But that means tying up yet another developer laptop running a web server just to encode message traffic. Wouldn't it be nicer to have a little Python app that the testers could use to spew messages as needed?

Yes. It would be nice. But, that the heck is the PBKDF2WithHmacSHA1 algorithm?

The JDK says this "Constructs secret keys using the Password-Based Key Derivation Function function found in PKCS #5 v2.0." One can do a lot of reading when working with well-designed crypto algorithms.

After some reading, I eventually wound up here: Perfect. A trustable implementation of a fairly complex hash to create a proper private key from a passphrase. An add-on to PyCrypto that saved me from attempting to implement this algorithm myself.

The final script, then, was one line of code to invoke the pbkdf2 with the right passphrase, salt, and parameters to generate a key. Then another line of code to use PyCrypto's AES implementation to encrypt the actual plaintext using starting values and the generated key.

Yep.  Two lines of working code. Layer in the two imports, a print(), and a bit more folderol because the the character-set issues and URL form encoding. We're still not up to anything more than a tiny script with a command-line interface. " this" solved the problem.

At first we were a little upset that the key generation was so slow. Then I read some more and learned that slow key generation is a feature. It makes probing with a dictionary of alternative pass phrases very expensive.

The best part?

PyCrypto worked the first time. The very first result matched the opaque Java implementation.

The issue I have with crypto is that it's so difficult to debug. If our Python-generated messages didn't match the Java-generated messages. Well. Um. What went wrong? Which of the various values weren't salted or padded or converted from Unicode to bytes or bytes to Unicode properly? And how can you tell? The Java web app was a black box because we can't -- easily -- instrument the code to see intermediate results.

In particular, the various values that go into PBKDF2WithHmacSHA1 were confusing to someone who's new to crypto. And private key encryption means that the key doesn't show up anywhere in the application logs: it's transient data that's computed, used and garbage collected. It would have been impossible for us to locate a problem with the key generator.

But PyCrypto and the add-on pbkdf2 did everything we wanted.

Thursday, September 4, 2014

API Testing: Quick, Dirty, and Automated

When writing RESTful API's, the process of testing can be simple or kind of hideous.

The Postman REST Client is pretty popular for testing an API. There are others, I'm sure, but I'm working with folks who like Postman.

Postman 10 has some automation capabilities. Some.

However. (And this is important.)

It doesn't provide much help in framing up a valid complex JSON message.

When dealing with larger and more complex API's with larger and more complex nested and repeating structures, considerably more help is required to frame up a valid request and do some rational evaluation of the response.

Enter Python, httplib and json. While Python3 is universally better, these libraries haven't changed much since Python2, so either version will work.

The idea is simple.
  1. Create templates for the eventual class definitions in Python. This can make it easy to build the JSON structures. It can save a lot of hoping that the JSON content is right. It can save time in "exploratory" testing when the JSON structures are wrong. 
  2. Build complex messages using the template class definitions.
  3. Send the message with httplib. Read the response.
  4. Evaluate the responses with a simple script.
Some test scripting is possible in Postman. Some. In Python, you've got a complete programming language. The "some" qualifier evaporates.

When it comes to things like seeding database data, Python (via appropriate database drivers) can seed integration test databases, also.

Further, you can use the Python unittest framework to write elegant automated script libraries and run the entire thing from the command line in a simple, repeatable way.

What's important is that the template class definitions aren't working code. They won't evolve into working code. They're placeholders so that we can work out API concepts quickly and develop relatively complete and accurate pictures of what the RESTful interface will look like.

I had to dig out my copy of to work out the metaclass trickery required.

The Model and Meta-Model Classes

The essential ingredient is a model class what we can use to build objects. The objective is not a complete model of anything. The objective is just enough model to build a complex object.
Our use case looks like this.

>>> class P(Model):
...    attr1= String()
...    attr2= Array()
>>> class Q(Model):
...    attr3= String()
>>> example= P( attr1="this", attr2=[Q(attr3="that")] )

Our goal is to trivially build more complex JSON documents for use in API testing.  Clearly, the class definitions are too skinny to have much real meaning. They're handy ways to define a data structure that provides a minimal level of validation and the possibility of providing default values.

Given this goal, we need a model class and descriptor definitions. In addition to the model class, we'll also need a metaclass that will help build the required objects. One feature that we really like is keeping the class-level attributes in order. Something Python doesn't to automatically. But something we can finesse through a metaclass and a class-level sequence number in the descriptors.

Here's the metaclass to cleanup the class __dict__. This is the Python2.7 version because that's what we're using.

class Meta(type):
    """Metaclass to set the ``name`` attribute of each Attr instance and provide
    the ``_attr_order`` sequence that defines the origiunal order.
    def __new__( cls, name, bases, dict ):
        attr_list = sorted( (a_name
            for a_name in dict
            if isinstance(dict[a_name], Attr)), key=lambda x:dict[x].seq )
        for a_name in attr_list:
            setattr( dict[a_name], 'name', a_name )
        dict['_attr_order']= attr_list
        return super(Meta, cls).__new__( cls, name, bases, dict )

class Model(object):
    """Superclass for all model class definitions;
    includes the metaclass to tweak subclass definitions.
    This also provides a ``to_dict()`` method used for
    JSON encoding of the defined attributes.

    The __init__() method validates each keyword argument to
    assure that they match the defined attributes only.
    __metaclass__= Meta
    def __init__( self, **kw ):
        for name, value in kw.items():
            if name not in self._attr_order:
                raise AttributeError( "{0} unknown".format(name) )
            setattr( self, name, value )
    def to_dict( self ):
        od= OrderedDict()
        for name in self._attr_order:
            od[name]= getattr(self, name)
        return od

The __new__() method assures that we have an additional _attr_order attribute added to each class definition. The __init__() method allows us to build an instance of a class with keyword parameters that have a minimal sanity check imposed on them. The to_dict() method is used to convert the object prior to making a JSON representation.

Here is the superclass definition of an Attribute. We'll extend this with other attribute specializations.

class Attr(object):
    """A superclass for Attributes; supports a minimal
    feature set. Attribute ordering is maintained via
    a class-level counter.

    Attribute names are bound later via a metaclass
    process that provides names for each attribute.

    Attributes can have a default value if they are
    attr_seq= 0
    default= None
    def __init__( self, *args ):
        self.seq= Attr.attr_seq
        Attr.attr_seq += 1 None # Will be assigned by metaclass ``Meta``
    def __get__( self, instance, something ):
        return instance.__dict__.get(, self.default)
    def __set__( self, instance, value ):
        instance.__dict__[]= value
    def __delete__( self, *args ):

We've done the minimum to implement a data descriptor.  We've also included a class-level sequence number which assures that descriptors can be put into order inside a class definition.

We can then extend this superclass to provide different kinds of attributes. There are a few types which can help us formulate messages properly.

class String(Attr):
    default= ""

class Array(Attr):
    default= []

class Number(Attr):
    default= None

The final ingredient is a JSON encoder that can handle these class definitions.  The idea is that we're not asking for much from our encoder. Just a smooth way to transform these classes into the required dict objects.

class ModelEncoder(json.JSONEncoder):
    """Extend the JSON Encoder to support our Model/Attr
    def default( self, obj ):
        if isinstance(obj,Model):
            return obj.to_dict()
        return super(NamespaceEncoder,self).default(obj)

encoder= ModelEncoder(indent=2)

The Test Cases

Here is an all-important unit test case. This shows how we can define very simple classes and create an object from those class definitions.

>>> class P(Model):
...    attr1= String()
...    attr2= Array()
>>> class Q(Model):
...    attr3= String()
>>> example= P( attr1="this", attr2=[Q(attr3="that")] )
>>> print( encoder.encode( example ) )
  "attr1": "this", 
  "attr2": [
      "attr3": "that"

Given two simple class structures, we can get a JSON message which we can use for unit testing. We can use httplib to send this to the server and examine the results.

Thursday, August 21, 2014

Permutations, Combinations and Frustrations

The issue of permutations and combinations is sometimes funny.

Not funny weird.  But funny "haha."

I received an email with 100's of words and 10 attachments. (10. Really.) The subject was how best to enumerate 6! permutations of something or other. With a goal of comparing some optimization algorithm with a brute force solution. (I don't know why. I didn't ask.)

Apparently, the programmer was not aware that permutation creation is a pretty standard algorithm with a standard solution. Most "real" programming languages have libraries which already solve this in a tidy, efficient, and well-documented way.

For example

I suspect that this is true for every language in common use.

In Python, this doesn't even really involve programming. It's a first-class expression you enter at the Python >>> prompt.

>>> import itertools
>>> list(itertools.permutations("ABC"))

[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

What's really important about this question was the obstinate inability of the programmer to realize that their problem had a tidy, well understood solution. And has had a good solution for decades. Instead they did a lot of programming and sent 100's of words and 10 attachments (10. Really.)

The best I could do was provide this link:

Steven Skiena, The Algorithm Design Manual

It appears that too few programmers are aware of how much already exists. They plunge ahead creating a godawful mess when a few minutes of reading would have provided a very nice answer.

Eventually, they sent me this:'s_algorithm

As a grudging acknowledgement that they had wasted hours failing to reinvent the wheel.