Thursday, May 22, 2014

Python Package Design, Refactoring and the Stingray Reader Project

We'll be digging into Mastering Object-Oriented Python. Chapter 17, specifically.

We'll also be looking at a big refactoring of the Stingray Schema-Based File Reader.

We can identify three species of packages.

One common design is a Simple Package. A directory with an empty file. This package name becomes a qualifier for the internal module names. The package is simply a namespace for modules. We’ll use the package with something like this:

import package.module

Another common design is the Module-Package. This is a package which appears to be a module.  It will have a larger and more sophisticated that is a effectively, a  module definition. There are two variations on this theme. Sometimes we'll use this during the early stages of development because we don't know if the package will get really big or stay small. If we start out with a package and all the code is in the, we can refactor down to a module.

The more common use for a module-package is to have the import objects or other modules from the package directory. Or, it can stand as a part of a larger design that includes the top-level module and the qualified sub-modules. We’ll use the package with something like this:

import package

or perhaps

from package import thing

The third common pattern is a package where the selects among alternative implementations. The os module is a good example of this. We’ll use the package with something like this:

import package

Knowing that it did something roughly like the following for us.

import package.implementation as package

Refactoring Module to Package

The Stingray angle on this is the need to add iWork '13 numbers to the collection of spreadsheets which it can parse. The iWork '13 format is unique.

Previously, all of the spreadsheets fell into three families:

iWork '13 uses Snappy compress and Protobuf Serialization. Without some documentation, the files would be incomprehensible.  Read this: See Brilliant. 

The previous releases of Stingray had a single, large module to handle a variety of workbook formats. Folding iWork '13 into this module would have been lunacy. It was already large to the point of being painful to understand.

The original module will be transparently turned into Module-Package. The API (import stingray.workbook or from stingray.workbook import SomeClass) will remain the same.


The implementation will involve a package with each workbook format as a separate module inside that package. At the top, the will include code like the following.

    from stingray.workbook.csv import CSV_Workbook
    from stingray.workbook.xls import XLS_Workbook
    from stingray.workbook.xlsx import XLSX_Workbook
    from stingray.workbook.ods import ODS_Workbook
    from stingray.workbook.numbers_09 import Numbers09_Workbook
    from stingray.workbook.numbers_13 import Numbers13_Workbook
    from stingray.workbook.fixed import Fixed_Workbook

This has the advantage of allowing us to include additional parsing cruft in each module that's not part of the exposed API in the workbook package.

The Mastering Object-Oriented Python book has more details on this kind of design.