Bio and Publications

Tuesday, March 31, 2015

This is awkward

Years ago -- many years ago -- I self-published three books on Python.

I also set up a Google Group, Building Skills Books, for a discussion area.

For a while, I check my download logs carefully to see how the book was being received.

Then I became a tech nomad, and commenced traveling and working from coffee shops and cellular network hotspots with my phone. The discussion group went from a luxury to a complexity to a near impossibility.

Now that I have steady connectivity, it's really kind of embarrassingly awkward that I ignored my readers like that.

For years.

I submit my apology to all of the folks who hoped for a more responsive author.

What can I do about it?

Nothing to fix the past, obviously. For the moment, while working for Packt, most of my writing goes to them.

However.

I hope to revisit these Building Skills Books in some depth in the not-too-distant future. I see three stories on my "As a Reader, I want..." storyboard.

1) Typos Fixed so the books are readable. Gather all the typos and corrections from the discussions.

2) Clarifications so the books are more useful. Gather all the questions, comments, suggestions. Fold those into the rewrites.

3) Python3. The beginner-level Intro to Programming book needs further revision. (I tried to publish an iBook for the Python3 rewrite and am not happy with the process or the results.) I think I'll revise it (yet again) and post it as GitHub pages.

4) Python3. The Intro to Python book needs to be rewritten. It's a HUGE project, but, I feel that it still has some value because it's chock-full of exercises and attempts to be really complete. I think I'll drop the reference material, however. Back in '02 -- when I started the first draft -- that seemed like it was beneficial. Now it's approximately worthless.

The OO Design book is a real hand-wringer. The approach of a strict parallelism between Python and Java can be seen as a disservice to both languages. On the other hand, I think it's good to focus on lowest-common denominator features that are common to all OO languages. I'm undecided on what to do here. I think I'd like to drop Java and add some additional refactoring exercises.

I need to pick one of these two:

4a) Python Focus for OO Design.

4b) Lowest Common Denominator Focus for OO Design.

There's no precise schedule for this; it's mostly a kind of placeholder and discussion jumping-off point. Maybe I should start a proper https://trello.com board for this.

Tuesday, March 24, 2015

Configuration Files, Environment Variables, and Command-Line Options

We have three major tiers of configuration for applications. Within each tier, we have sub-tiers, larding on yet more complexity. The organization of the layers is a bit fungible, too. Making good choices can be rather complex because there are so many variations on the theme of "configuration". The desktop GUI app with a preferences file has very different requirements from larger, more complex applications.

The most dynamic configuration options are the command-line arguments. Within this tier of configuration, we have two sub-tiers of default values and user-provided overrides to those defaults. Where do the defaults come from? They might be wired in, but more often they come from environment variables or parameter files or both.

There's some difference of opinion on which tier is next in the tiers of dynamism. The two choices are configuration files and environment variables. We can consider environment variables as easier to edit than configuration files. In some cases, though, configuration files are easier to change than environment variables. Environment variables are typically bound to the process just once (like command-line arguments), where configuration files can be read and re-read as needed.

The environment variables have three sub-tiers. System-level environment variables tend to be fixed. The variables set by a .profile or .bashrc tend to be specific to a logged-in user, and are somewhat more flexible that system variables. The current set of environment variables associated with the logged-in session can be modified on the command line, and are as flexible as command-line arguments.

Note that we can do this in Linux:

PYTHONPATH=/path/to/project python3 -m some_app -opts

This will set an environment variable as part of running a command.

The configuration files may also have tiers. We might have a global configuration file in /etc/our-app. We might look for a ~/.our-app-rc as a user's generic configuration. We can also look for our-app.config in the current working directory as the final set of overrides to be used for the current invocation.

Some applications can be restarted, leading to re-reading the configuration files. We can change the configuration more easily than we can bind in new command-line arguments or environment variables.

Representation Issues

When we think about configuration files, we also have to consider the syntax we want to use to represent configurable parameters. We have five common choices.

Some folks are hopelessly in love with Windows-style .ini files. The configparser module will parse these. I call it hopelessly in love because the syntax is rather quite limited. Look at the logging.config module to see how complex the .ini file format is for non-trivial cases.

Some folks like Java-style properties files. These have the benefit of being really easy to parse in Python. Indeed, scanning a properties file is great exercise in functional-style Python programming.
I'm not completely sold on these, either, because they don't really handle the non-trivial cases well.

Using JSON or YAML for properties has some real advantages. There's a lot of sophistication available in these two notations. While JSON has first-class support, YAML requires an add-on module.

We can also use Python as the language for configuration. For good examples of this, look at the Django project settings file. Using Python has numerous advantages. The only possible disadvantage is the time wasted arguing with folks who call it a "security vulnerability."

Using Python as the configuration language is only considered a vulnerability by people who fail to realize that the Python source itself can be hacked. Why waste time injecting a bug into a configuration file? Why not just hack the source?

My Current Fave 

My current favorite way to handle configuration is by defining some kind of configuration class and using the class object throughout the application. Because of Python's import processing, a single instance of the class definition is easy to guarantee.

We might have a module that defines a hierarchy of configuration classes, each of which layers in additional details.

class Defaults:
    mongo_uri = "mongodb://localhost:27017" 
    some_param = "xyz" 

class Dev(Defaults):
    mongo_uri = "mongodb://sandbox:27017"

class QA(Defaults):
    mongo_uri = "mongodb://username:password@qa02:27017/?authMechanism=PLAIN&authSource=$external"

Yes. The password is visible. If we want to mess around with higher levels of secrecy in the configuration files, we can use PyCrypto and a key generator to use an encrypted password that's injected into the URI. That's a subject for another post. The folks to can edit the configuration files often know the passwords. Who are we trying to hide things from?

How do we choose the active configuration to use from among the available choices in this file? We have several ways.
  • Add a line to the configuration module. For example, Config=QA will name the selected environment. We have to change the configuration file as our code marches through environments from development to production. We can use from configuration import Config to get the proper configuration in all other modules of the application.
  • Rely on the environment variable to specify which configuration use. In enterprise contexts, an environment variable is often available.We can import os, and use Config=globals()[os.environ['OURAPP_ENVIRONMENT']] to pick a configuration based on an environment variable. 
  • In some places, we can rely on the host name itself to pick a configuration. We can use os.uname()[1] to get the name of the server. We can add a mapping from server name to configuration, and use this: Config=host_map(os.uname()[1],Defaults).
  • Use a command-line options like "--env=QA". This can a little more complex than the above techniques, but it seems to work out nicely in the long run.
Command-line args to select a specific configuration

To select a configuration using command-line arguments, we must decompose configuration into two parts. The configuration alternatives shown above are placed in a config_params.py module. The config.py module that's used directly by the application will import the config_params.py module, parse the command-line options, and finally pick a configuration. This module can create the required module global, Config. Since it will only execute once, we can import it freely.

The config module will use argparse to create an object named options with the command-line options. We can then do this little dance:

import argparse
import sys
import config_params

parser= argparse.ArgumentParser()
parser.add_argument("--env", default="DEV")
options= parser.parse_args()

Config = getattr(config_params, options.env)
Config.options= options

This seems to work out reasonably well. We can tweak the config_params.py flexibly. We can pick the configuration with a simple command-line option.

If we want to elegantly dump the configuration, we have a bit of a struggle. Each class in the hierarchy introduces names: it's a bit of work to walk down the __class__.__mro__ lattice to discover all of the available names and values that are inherited and overridden from the parents.

We could do something like this to flatten out the resulting values:

Base= getattr(config_params, options.env)
class Config(Base):
    def __repr__(self):
        names= {}
        for cls in reversed(self.__class__.__mro__):
            cls_names= dict((nm, (cls.__name__, val)) 
                for nm,val in cls.__dict__.items() 
                    if nm[0] != "_")
            names.update( cls_names )
        return ", ".join( "{0}.{1}={2}".format(class_val[0], nm, class_val[1]) 
            for nm,class_val in names.items() )

It's not clear this is required. But it's kind of cool for debugging.

Tuesday, March 17, 2015

Building Skills in Object-Oriented Design

New Kindle Edition of Building Skills in Object-Oriented Design.

It seems to work okay in my Kindle Readers.

I'm not sure it's really formatted completely appropriately. I'm not a book designer. But before I fuss around with font sizes, I think I need to spend some time on several more valuable aspects of a rewrite:

  1. Updating the text and revising for Python 3.
  2. Removing the (complex) parallels between Python and Java. The Java edition can be forked as a separate text. 
  3. Reducing some of the up-front sermonizing and other non-coding nonsense.
  4. Moving the unit testing and other "fit-and-finish" considerations forward.
  5. Looking more closely at the Sphinx epub features and how those work (or don't work) with the KindleGen application which transforms .html to .mobi. This is the last step of technical production, once the content is right.
I have two other titles to finish for Packt publishing.

Maybe I should pitch this to Packt, since there seems to be interest in the topic? A skilled technical editor from Packt and some reviewers are likely to improve the quality.

The question, though, will be how to fit this approach to programming into their product offerings? Since I have two other titles to finish for them, perhaps I'll just set this aside for now.

Tuesday, March 10, 2015

It appears that DevOps may be more symptom than solution

It appears that DevOps may be a symptom of a bigger problem. The bigger problem? Java.

Java development -- with a giant framework like Spring -- seems to accrete layers and layers of stuff. And more stuff.  And bonus stuff on top the the excess stuff.

The in-house framework that's used on top of the Spring Framework that's used to isolate us from WebLogic that used to isolate us from REST seems -- well -- heavy. Very heavy.

And the code cannot be built without a heavy investment in learning Maven. It can't be practically deployed without Hudson, Nexus, and Subversion. And Sonar and Hamcrest and mountains of stuff that's implicitly brought in by the mountain of stuff already listed in the Maven pom.xml files.

The deployment from Nexus artifacts to a WebLogic server also involves uDeploy. Because the whole thing is AWS-based, this kind of overhead seems unavoidable.

Bunches of web-based tools to manage the bunch of server-based tools to build and deploy.

Let me emphasize this: bunches of tools.

Architecturally, we're focused on building "micro services". Consequently, an API takes about a sprint to build. Sometimes a single developer can do the whole CRUD spectrum in a sprint for something relatively simple. That's five API's by our count: GET one, GET many, POST, PUT and DELETE: each operation counts as a separate API.

Then we're into CI/CD overheads. It's a full sprint of flailing around with deployment onto a dev servers to get something testable and get back test results so we can fix problems. A great deal of time spent making sure that all the right versions of the right artifacts are properly linked. Doesn't work? Oh. Stuff was updated: fix your pom's.

It's another sprint after that flailing around with the CI/CD folks to get onto official QA servers. Not building in Husdon? Wrong Nexus setup in Maven: go back to square one. Not deployable to WebLogic? Spring Beans that aren't relevant when doing unit testing are not being built by WebLogic on the QA server because the .XML configuration or the annotations or something is wrong.

What's important here is that ⅔ of the duration is related to the simple complexity of Java.

The DevOps folks are trying really hard to mitigate that complexity. And to an extent, they're sort of successful.

But. Let's take a step back.

  1. We have hellish complexity in our gigantic, layered software toolset.
  2. We've thrown complicated tools at the hellish complexity, creating -- well -- more complexity.

This doesn't seem right. More complexity to solve the problems of complexity just don't seem right.

My summary is this: the fact that DevOps even exists seems like an indictment of the awful complexity of the toolset. It feels like DevOps is a symptom and Java is the problem.

Tuesday, March 3, 2015

Let's all Build a Hat Rack

Wound up here: "A Place to Hang Your Hat" and the #LABHR hash tag.

H/T to this post: "Building a Hat Rack."

This is a huge idea. I follow some folks from the Code For America group. The +Mark Headd  Twitter feed (@mheadd) is particularly helpful for understanding this movement. Also, follow +Kevin Curry (@kmcurry) for more insights.

Open Source contributions are often anonymous and the rewards are intangible. A little bit of tangibility is a huge thing.

My (old) open source Python books have a "donate" button. Once in a while I'll collect the comments that come in on the donate button. They're amazingly positive and encouraging. But also private. Since I have a paying gig writing about Python, I don't need any more encouragement than I already have. (Indeed, I probably need less encouragement.)

However.

There are unsung heroes at every hackathon and tech meetup who could benefit from some recognition. Perhaps they're looking for a new job or a transition in their existing job. Perhaps they're looking to break through one of the obscure social barriers that seem to appear in a community where everyone looks alike.

And. There's a tiny social plus to being the Recognizer in Chief. There's a lot to be said in praise of the talent spotters and friction eliminators.