Thursday, September 25, 2014

PyCrypto Experience

Let me start with a wow. PyCrypto is very nice.

Let me emphasize the add-ons that go with PyCrypto. These are as valuable as the package itself.

Here's the story. I was working with a Java-based AES encrypter that used the "PBKDF2WithHmacSHA1" key generator algorithm. This was part of a large, sophisticated web application framework that was awkward to unit test because we didn't have a handy client to encode traffic.

We could run a second web application server with some client-focused software on it. But that means tying up yet another developer laptop running a web server just to encode message traffic. Wouldn't it be nicer to have a little Python app that the testers could use to spew messages as needed?

Yes. It would be nice. But, what the heck is the PBKDF2WithHmacSHA1 algorithm?

The JDK says this "Constructs secret keys using the Password-Based Key Derivation Function function found in PKCS #5 v2.0." One can do a lot of reading when working with well-designed crypto algorithms.

After some reading, I eventually wound up here: https://www.dlitz.net/software/python-pbkdf2/ Perfect. A trustable implementation of a fairly complex hash to create a proper private key from a passphrase. An add-on to PyCrypto that saved me from attempting to implement this algorithm myself.

The final script, then, was one line of code to invoke the pbkdf2 with the right passphrase, salt, and parameters to generate a key. Then another line of code to use PyCrypto's AES implementation to encrypt the actual plaintext using starting values and the generated key.

Yep. Two lines of working code. Layer in the two imports, a print(), and a bit more folderol because of the character-set issues and URL form encoding. We're still not up to anything more than a tiny script with a command-line interface. "encrypt.py this" solved the problem.
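To give a feel for the key-generation line, here's a minimal sketch. It is not the original script: it swaps in the standard library's hashlib.pbkdf2_hmac() (available in Python 3.4 and later) instead of the pbkdf2 add-on, and it stops before the AES step, which still needs PyCrypto. The passphrase, salt, iteration count, and key length are made-up illustration values, and UTF-8 encoding of the passphrase is an assumption; the real values have to match whatever the Java side uses.

```python
import hashlib

def derive_key(passphrase, salt, iterations, key_bytes):
    """PBKDF2 using HMAC-SHA1, the algorithm Java names PBKDF2WithHmacSHA1."""
    return hashlib.pbkdf2_hmac(
        "sha1",                      # the HMAC digest
        passphrase.encode("utf-8"),  # assumption: passphrase is UTF-8 text
        salt,                        # bytes, shared with the other side
        iterations,                  # more iterations means slower guessing
        dklen=key_bytes,             # length of the derived key in bytes
    )

# Hypothetical values, for illustration only.
key = derive_key("correct horse battery staple", b"some-salt", 1000, 16)
print(len(key))  # 16 bytes, the size of an AES-128 key
```

The derived key is what gets handed to the AES implementation along with the starting values.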

At first we were a little upset that the key generation was so slow. Then I read some more and learned that slow key generation is a feature. It makes probing with a dictionary of alternative pass phrases very expensive.

The best part?

PyCrypto worked the first time. The very first result matched the opaque Java implementation.

The issue I have with crypto is that it's so difficult to debug. If our Python-generated messages didn't match the Java-generated messages. Well. Um. What went wrong? Which of the various values weren't salted or padded or converted from Unicode to bytes or bytes to Unicode properly? And how can you tell? The Java web app was a black box because we can't -- easily -- instrument the code to see intermediate results.

In particular, the various values that go into PBKDF2WithHmacSHA1 were confusing to someone who's new to crypto. And private key encryption means that the key doesn't show up anywhere in the application logs: it's transient data that's computed, used and garbage collected. It would have been impossible for us to locate a problem with the key generator.

But PyCrypto and the add-on pbkdf2 did everything we wanted.

Thursday, September 4, 2014

API Testing: Quick, Dirty, and Automated

When writing RESTful APIs, the process of testing can be simple or kind of hideous.

The Postman REST Client is pretty popular for testing an API. There are others, I'm sure, but I'm working with folks who like Postman.

Postman has some automation capabilities. Some.

However. (And this is important.)

It doesn't provide much help in framing up a valid complex JSON message.

When dealing with larger and more complex APIs with larger and more complex nested and repeating structures, considerably more help is required to frame up a valid request and do some rational evaluation of the response.

Enter Python, httplib and json. While Python3 is universally better, these libraries haven't changed much since Python2 (although httplib is named http.client in Python3), so either version will work.

The idea is simple.
  1. Create templates for the eventual class definitions in Python. This can make it easy to build the JSON structures. It can save a lot of hoping that the JSON content is right. It can save time in "exploratory" testing when the JSON structures are wrong. 
  2. Build complex messages using the template class definitions.
  3. Send the message with httplib. Read the response.
  4. Evaluate the responses with a simple script.
Some test scripting is possible in Postman. Some. In Python, you've got a complete programming language. The "some" qualifier evaporates.
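Steps 2 through 4 can be sketched in a few small functions. This is a rough outline under stated assumptions, not a finished tool: it uses the Python 3 module name http.client, the function names are invented for this example, and the host, port, path, and required key in the commented-out call are hypothetical.

```python
import http.client  # this module is called httplib in Python 2
import json

def build_body(document):
    """Step 2: turn a Python structure into the JSON text to send."""
    return json.dumps(document, indent=2)

def post_json(host, port, path, body):
    """Step 3: POST the JSON text and return (status, parsed reply)."""
    connection = http.client.HTTPConnection(host, port, timeout=10)
    try:
        connection.request(
            "POST", path, body=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = connection.getresponse()
        text = response.read().decode("utf-8")
        return response.status, json.loads(text)
    finally:
        connection.close()

def problems_with(status, document, required_keys):
    """Step 4: list what looks wrong with a reply; an empty list means OK."""
    problems = []
    if status != 200:
        problems.append("unexpected status {0}".format(status))
    for key in required_keys:
        if key not in document:
            problems.append("missing key {0!r}".format(key))
    return problems

body = build_body({"attr1": "this", "attr2": [{"attr3": "that"}]})
# Against a running server (hypothetical host, port, and path):
# status, reply = post_json("localhost", 8080, "/api/things", body)
# print(problems_with(status, reply, ["id"]))
```

Keeping the evaluation separate from the network call means the checks can be exercised against canned replies before the server exists.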

When it comes to things like seeding database data, Python (via appropriate database drivers) can seed integration test databases, also.

Further, you can use the Python unittest framework to write elegant automated script libraries and run the entire thing from the command line in a simple, repeatable way.
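As a small illustration of the unittest idea, here's a sketch that checks a canned reply. The class name, the canned JSON, and the run_suite() helper are all invented for this example; a real suite would send a request in setUp() and keep the response.

```python
import json
import unittest

# A canned reply standing in for what the server would send back.
CANNED_REPLY = '{"id": 42, "attr1": "this"}'

class TestCreateThing(unittest.TestCase):
    def setUp(self):
        # A real suite would make the HTTP request here.
        self.reply = json.loads(CANNED_REPLY)

    def test_reply_has_id(self):
        self.assertIn("id", self.reply)

    def test_reply_echoes_attr1(self):
        self.assertEqual("this", self.reply["attr1"])

def run_suite():
    """Run the tests programmatically and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestCreateThing)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Saved as a module (say test_api.py, a hypothetical name), the same class can be run from the command line with "python -m unittest test_api".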

What's important is that the template class definitions aren't working code. They won't evolve into working code. They're placeholders so that we can work out API concepts quickly and develop relatively complete and accurate pictures of what the RESTful interface will look like.

I had to dig out my copy of https://www.packtpub.com/application-development/mastering-object-oriented-python to work out the metaclass trickery required.

The Model and Meta-Model Classes

The essential ingredient is a model class that we can use to build objects. The objective is not a complete model of anything. The objective is just enough model to build a complex object.

Our use case looks like this.


>>> class P(Model):
...    attr1= String()
...    attr2= Array()
...
>>> class Q(Model):
...    attr3= String()
...
>>> example= P( attr1="this", attr2=[Q(attr3="that")] )

Our goal is to trivially build more complex JSON documents for use in API testing.  Clearly, the class definitions are too skinny to have much real meaning. They're handy ways to define a data structure that provides a minimal level of validation and the possibility of providing default values.

Given this goal, we need a model class and descriptor definitions. In addition to the model class, we'll also need a metaclass that will help build the required objects. One feature that we really like is keeping the class-level attributes in order. Something Python doesn't do automatically. But something we can finesse through a metaclass and a class-level sequence number in the descriptors.

Here's the metaclass to clean up the class __dict__. This is the Python2.7 version because that's what we're using.


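import json
from collections import OrderedDict

class Meta(type):
    """Metaclass to set the ``name`` attribute of each Attr instance and provide
    the ``_attr_order`` sequence that defines the original order.

    ``Attr`` must be defined before any class that uses this metaclass.
    """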
    def __new__( cls, name, bases, dict ):
        attr_list = sorted( (a_name
            for a_name in dict
            if isinstance(dict[a_name], Attr)), key=lambda x:dict[x].seq )
        for a_name in attr_list:
            setattr( dict[a_name], 'name', a_name )
        dict['_attr_order']= attr_list
        return super(Meta, cls).__new__( cls, name, bases, dict )

class Model(object):
    """Superclass for all model class definitions;
    includes the metaclass to tweak subclass definitions.
    This also provides a ``to_dict()`` method used for
    JSON encoding of the defined attributes.

    The __init__() method validates each keyword argument to
    assure that they match the defined attributes only.
    """
    __metaclass__= Meta
    def __init__( self, **kw ):
        for name, value in kw.items():
            if name not in self._attr_order:
                raise AttributeError( "{0} unknown".format(name) )
            setattr( self, name, value )
    def to_dict( self ):
        od= OrderedDict()
        for name in self._attr_order:
            od[name]= getattr(self, name)
        return od

The __new__() method assures that we have an additional _attr_order attribute added to each class definition. The __init__() method allows us to build an instance of a class with keyword parameters that have a minimal sanity check imposed on them. The to_dict() method is used to convert the object prior to making a JSON representation.

Here is the superclass definition of an Attribute. We'll extend this with other attribute specializations.


class Attr(object):
    """A superclass for Attributes; supports a minimal
    feature set. Attribute ordering is maintained via
    a class-level counter.

    Attribute names are bound later via a metaclass
    process that provides names for each attribute.

    Attributes can have a default value if they are
    omitted.
    """
    attr_seq= 0
    default= None
    def __init__( self, *args ):
        self.seq= Attr.attr_seq
        Attr.attr_seq += 1
        self.name= None # Will be assigned by metaclass ``Meta``
    def __get__( self, instance, owner ):
        if instance is None:
            return self # Class-level access gets the descriptor itself
        return instance.__dict__.get(self.name, self.default)
    def __set__( self, instance, value ):
        instance.__dict__[self.name]= value
    def __delete__( self, *args ):
        pass

We've done the minimum to implement a data descriptor.  We've also included a class-level sequence number which assures that descriptors can be put into order inside a class definition.
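Here's a stripped-down demonstration of those two ideas: the data descriptor and the sequence counter. Field and Sample are hypothetical stand-ins, not the classes from this post, and the attribute name is passed in by hand because there's no metaclass in this sketch.

```python
class Field(object):
    """Hypothetical stand-in for Attr, with the name supplied directly."""
    counter = 0
    default = None

    def __init__(self, name):
        self.name = name
        self.seq = Field.counter  # remember the order of creation
        Field.counter += 1

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Fall back to the default when nothing was ever assigned.
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class Sample(object):
    zebra = Field("zebra")
    apple = Field("apple")

# Recover definition order from the sequence numbers, not the names.
fields = [v for v in vars(Sample).values() if isinstance(v, Field)]
ordered = [f.name for f in sorted(fields, key=lambda f: f.seq)]
print(ordered)  # ['zebra', 'apple']

s = Sample()
s.apple = "set"
print(s.zebra, s.apple)  # None set
```

The values live in each instance's own __dict__; the descriptor only routes access and supplies the default.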

We can then extend this superclass to provide different kinds of attributes. There are a few types which can help us formulate messages properly.


class String(Attr):
    default= ""

class Array(Attr):
    default= [] # One list shared by all instances that omit this attribute; don't mutate it

class Number(Attr):
    default= None

The final ingredient is a JSON encoder that can handle these class definitions.  The idea is that we're not asking for much from our encoder. Just a smooth way to transform these classes into the required dict objects.


class ModelEncoder(json.JSONEncoder):
    """Extend the JSON Encoder to support our Model/Attr
    structure.
    """
    def default( self, obj ):
        if isinstance(obj,Model):
            return obj.to_dict()
        return super(ModelEncoder,self).default(obj)

encoder= ModelEncoder(indent=2)


The Test Cases

Here is an all-important unit test case. This shows how we can define very simple classes and create an object from those class definitions.


>>> class P(Model):
...    attr1= String()
...    attr2= Array()
...
>>> class Q(Model):
...    attr3= String()
...
>>> example= P( attr1="this", attr2=[Q(attr3="that")] )
>>> print( encoder.encode( example ) )
{
  "attr1": "this", 
  "attr2": [
    {
      "attr3": "that"
    }
  ]
}


Given two simple class structures, we can get a JSON message which we can use for unit testing. We can use httplib to send this to the server and examine the results.
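The listings in this post target Python 2.7. For anyone on Python 3, a port might look roughly like the sketch below. The changes are assumptions on my part rather than tested production code: the metaclass is named with the metaclass= keyword, super() takes no arguments, and Array defaults to an empty tuple (which still encodes as a JSON array) to avoid sharing one mutable list. On Python 3.6 and later the class namespace already preserves definition order, so the sequence counter is kept only to stay close to the original.

```python
import json
from collections import OrderedDict

class Attr:
    attr_seq = 0
    default = None
    def __init__(self):
        self.seq = Attr.attr_seq
        Attr.attr_seq += 1
        self.name = None  # assigned by Meta
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)
    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class String(Attr):
    default = ""

class Array(Attr):
    default = ()  # immutable; json encodes a tuple as an array

class Meta(type):
    def __new__(cls, name, bases, namespace):
        attr_list = sorted(
            (a for a, v in namespace.items() if isinstance(v, Attr)),
            key=lambda a: namespace[a].seq,
        )
        for a_name in attr_list:
            namespace[a_name].name = a_name
        namespace["_attr_order"] = attr_list
        return super().__new__(cls, name, bases, namespace)

class Model(metaclass=Meta):
    def __init__(self, **kw):
        for name, value in kw.items():
            if name not in self._attr_order:
                raise AttributeError("{0} unknown".format(name))
            setattr(self, name, value)
    def to_dict(self):
        od = OrderedDict()
        for name in self._attr_order:
            od[name] = getattr(self, name)
        return od

class ModelEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Model):
            return obj.to_dict()
        return super().default(obj)

encoder = ModelEncoder(indent=2)

class P(Model):
    attr1 = String()
    attr2 = Array()

class Q(Model):
    attr3 = String()

example = P(attr1="this", attr2=[Q(attr3="that")])
text = encoder.encode(example)
print(text)
```

The printed document has the same shape as the Python 2.7 output above, with attr1 ahead of attr2.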