Tuesday, May 10, 2016

Why Python? What's it good for? How is it special?

First. The question is moot. It's a programming language. It's good for programming.

When I push back, folks try to produce languages which exist only in certain pigeonholes.

"You know. PHP is for web and JavaScript runs in the browser. What's Python for?"

The PHP and JavaScript examples aren't helpful. That doesn't narrow the domain of problems for which Python is appropriate. It only shows that some languages have narrow domains.

"You know. Objective-C and Swift are for iOS. What's the predominant place Python is used?"

Python also runs on iOS. I don't know if it has suitable bindings for building apps. If it does, that doesn't change my answer. It's good for programming.

"Java is used mainly for web apps, right? What about Python?"

Okay. At this point, the question has slipped from moot to ignorant.

Can we just set that aside? Can we move on?

If you want some useful insight, start here:


Yes, it's an essay from 1974.  Parts of it are a little old-fashioned, but a lot of it is still rock-solid. For example, the idea of strongly-typed pointers is considered more-or-less standard now. It was debatable then. And Wirth's opinion continues to drive language design.

Page 28 has the key points: features of a programming language. Enumerated by the inventor of Pascal, Modula, Oberon, and other languages too numerous to recall.

Some of the list is a little dated. "...different character sets...," for example, has been superseded by Unicode.

Also, the list is focused on compiled languages. Python is a dynamic language. It's interpreted. Yes, there's a compiler, but that's mostly an optimization of the source code. If you replace "compiler" with "run-time", the list stands up as a description of good languages.

I like this list because it helps characterize why Python works out so well. And why many other languages are also pretty good. It points up the reason why quirky languages like JavaScript (or even Ruby) are suspicious. Some of the points about efficiency are important topics for further discussion.

I often have to remind folks who work with Big Data that most of our processing is I/O bound. Python waits for the database somewhat more efficiently than Java. Why does Python wait more efficiently? Because it uses less memory. Sometimes this is a win.

Let's not ask silly questions about a general-purpose language. Instead, let's benchmark solutions, and compare tangible performance numbers using real code.

Tuesday, May 3, 2016

The Lynda.com Experience

One word: "wow"

More words: "Helping shy people get up and do what needs to be done."

Yes, that's Garrison Keillor's tag line for one of the "sponsors" of "A Prairie Home Companion": the Powdermilk Biscuits company.  (Heavens, they're tasty and expeditious.)

The folks at Lynda are truly great at shepherding folks through the process of preparing and recording their material.

Recording is hard. The point is to say each thing perfectly. But, the things have to fit into a larger narrative of a section that fits into the larger sequence of chapters that makes up the course.

Giving essentially the same content in a presentation at a conference is almost unrelated. Talking at a conference has a live audience. It's one-time-only, and you can ad-lib.

Doing this takes patience. And skilled editing both at a content level and at a technical level. Lynda has it all.

The thing that made me the most comfortable was having my presentation material ready. Each section is a 5-minute lightning talk. I had all of my slides ready. I'd been through them enough times to be sure that I could handle the 5-minute format. And when there were editorial changes, they tended to be relatively minor.

I may try it again. It's a lot of work. Certainly more work than writing a chapter in a book. A chapter can go deep. A presentation has to stick to the high points: this means that the supporting depth must be there, but you're not going to wallow around in it. Essentially, you're making the "elevator pitch" for each one of your points.

The recording and live action studio space were fun. I've never been recorded or taped like that before. They eased me into it, coached me through it, and made sure all of the content was there in a way that could be edited into a high quality final product.

Tuesday, April 19, 2016

A NoSQL Conversation

This cropped up recently. It's part of a "replace Mongo with Relational DB" conversation.

I'm going to elide the conversation down to five key points: three post-hoc nonsensical ideas, and two real points.

What's (to me) very telling is that someone else published the five reasons in this order. As if they larded three on the front. Or included the two at the end out of guilt because they were avoiding the real issues.

Relational Queries are Desired. "the only way to find [the documents] would be to write a query that literally trolls through the entire database in order to find the most recent values". 

I beg to differ. "Only way" is a strong statement. Mongo has indexes. To suggest that they don't exist or don't work is misleading. The details of the use case involved searching by date. It's possible to contrive a database that does bad searches by date; the implication being that Mongo couldn't do date matching or something. 

To Enforce Constraints and Schema. "It is still possible for the application layer to ensure the constraint, but that relies on every single point in the application code enforcing it – a single error can lead to inconsistent data". 

This runs perilously close to the "what if some bonehead bypasses the API and hacks into the database directly?" question. Which is isomorphic to "what if all corporate governance disappeared tomorrow?" and "what if an evil genius hacks all our database drivers?"

Lack of Document-Oriented Access Patterns.  "If there are more complex access patterns (like reading certain fields from many records, or frequently updating single fields within a record) then a document-oriented database is not a good fit"

That's nonsense. Mongo has field-level updates. There was one example of a long-running transaction that appeared to be mis-designed. I suggested that an improved design might be less complex and expensive than rewriting the API's and moving the data.

Desire to Utilize [Relational DB]

More Support Available for [Relational DB]

Clearly, these last two are the real reasons. Everything above looks like post-hoc justification for the real issue:

We're not sure we like Mongo.

My point in the conversation was not to talk them out of making a switch. The last two reasons included the kind of compelling rationalization that can't be disputed.  The best I could do was to challenge the errors in the first three reasons so that everyone could be honest about the change. It's not technical. It's organizational.

Tuesday, April 5, 2016

The GUI Problem

I write Microservices. And not-so-micro Services. API's.

I got this email recently.

"Goal: get you to consider adding Gooey to your Python tool set"

"What it's for: Turn a console-based Python program into one that sports a platform-native GUI.

"Why it's great: Presenting people, especially rank-and-file users, with a command-line application is among the fastest ways to reduce its use. Few beyond the hardcore like figuring out what options to pass, or in what order. Gooey takes arguments expected by the argparse library and presents them to users as a GUI form, with all options labeled and presented with appropriate controls (such as a drop-down for a multi-option argument, and so on). Very little additional coding -- a single include and a single decorator -- is needed to make it work, assuming you're already using argparse."

The examples and the GitHub documentation make it look delightful.
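For concreteness, here's a rough sketch of the "single include and a single decorator" claim, based on Gooey's documented usage. The argument names are my own invention, and the Gooey lines are commented out so the sketch runs without a display:

```python
# Sketch of Gooey's argparse integration. The @Gooey decorator is the
# documented usage; it's commented out here so this runs headless.
# Argument names below are illustrative only.
from argparse import ArgumentParser

# from gooey import Gooey   # the "single include"

# @Gooey                    # the "single decorator"
def main(argv=None):
    parser = ArgumentParser(description="Example console tool")
    parser.add_argument("source", help="input file to process")
    parser.add_argument("--mode", choices=["fast", "safe"], default="safe",
                        help="a choices argument becomes a drop-down in the GUI")
    return parser.parse_args(argv)

args = main(["data.csv", "--mode", "fast"])
print(args.source, args.mode)
```

Everything Gooey needs -- names, help text, choices -- is already in the argparse declarations, which is why the extra coding is so small.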

However.  It's utterly useless for me.  Interesting but useless.

From my perspective, API's and microservices are vastly more important than desktop GUI's.

I'll repeat that in order to start a food-fight:

API's and microservices are more important than desktop GUI's

I almost forgot the important qualifiers: When working with Big Data. Or when working with DevOps Automation.

I realize that some people like to cling to the desktop GUI as a Very Important Thing™. Which could be why they send me emails touting the advantages of some kind of GUI tool or framework. The Desktop GUI is important, but, from my perspective, it's a niche.

Actually two niches.

Niche 1. The word processor and spreadsheet and a few other generic tools for putting text into a computer. While desktop versions are better than server-side emacs and vi, they fill a similar purpose. An IDE is (from this perspective) little more than a glorified text editor. In places that use Jenkins and Hudson and uDeploy and all of those server-based tools, the desktop IDE is a place to stage code for Jenkins jobs to do the "real" build.

Niche 2. All the other tools that turn a small-ish computer into a dedicated workstation for specific kinds of media production. Video. Audio. Image. Typesetting. These are not "generic" applications like word processors or spreadsheets; they're very specific and narrowly-focused applications. They rely on effectively transforming the general-purpose computer into a very special-purpose computer.

Super-fancy desktop-based tools for analytics or Big Data processing are not actually too useful. Anyone trying to use a desktop as an enterprise system of record is asking for trouble.  I work with folks trying to process terabyte datasets on their laptops and wondering why it takes so long. My company has servers. We pay for MongoDB and Hadoop. We have API's to access big databases with big piles of data. I'm automating the toolsets as fast as I can so they can work with giant datasets.

Gooey looks like fun. But not for me.

Tuesday, March 29, 2016

The Data Structures and Algorithms Problem

Here's a snippet of an email
"In big data / data science, the curse of dimensionality keeps showing up over and over. A good place to start is the wiki article “curse of dimensionality.” The issue seems to be that a lot of these big data / data science people have not taken the time to study fundamental data structures."
There was more about Foundations of Multidimensional and Metric Data Structures by Hanan Samet being too detailed. And Stack Overflow being too high-level.  And more hand-wringing after that, too.

The email was pleading for some book or series of blog posts that would somehow educate data science folks on more fundamental issues of data structures and algorithms. Perhaps getting them to drop some dimensions when doing k-NN problems or perhaps exploit some other data structure that didn't involve 100's of columns.
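For what it's worth, the dimensionality complaint has teeth. Here's a minimal, pure-Python sketch (my own, not from the email) of how distances concentrate as the columns pile up, which is what undermines k-NN on wide data:

```python
# As dimensionality grows, the spread of pairwise distances shrinks
# relative to their magnitude, so "nearest" neighbors stop being
# meaningfully nearer than anything else.
import math
import random

def distance_contrast(dim, n_points=200, seed=42):
    """(max - min) / min over distances from the origin for random
    points in the unit hypercube of the given dimension."""
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

The contrast ratio collapses toward zero as the dimension climbs, which is the practical argument for dropping dimensions before reaching for k-NN.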

I think.

I'm guessing because -- like a lot of hand-waving emails -- it didn't involve code. And yes, I'm very bigoted about the distinction between code and hand-waving.

If there is a lack of awareness of appropriate data structures, the real place to start is The Algorithm Design Manual by Steven Skiena.

I harbor doubts that this is the real problem, however. I think that the broad spectrum of computing applications leads to a lot of specialization. I don't think it's prudent to expect generalists who can handle deep data science issues as well as algorithm design and performance issues. No one expects them to write JavaScript and tinker with CSS so that the web site presenting the results looks good.

I actually think the real problem is that some folks expect too much from their data scientists.

In fantasy land the rock stars are full stack developers who can span the entire spectrum from OS to CSS. In the real world, developers have different strengths and interests. In some cases, "full stack" means mediocre skills in a lot of areas.

Here's a more useful response: Bridging the Gap Between Data Science and DevOps. I don't think the problem is "big data / data science people have not taken the time to study fundamental data structures". I think the problem is that big data is a cooperative venture. It takes a team to solve a problem.

Tuesday, March 15, 2016

PacktPub Looking For Python Projects

Do you have a good project? Do you want to write?

The acquisition folks at Packt are looking for this:

"... demonstrate 4-5 projects over the course of the chapters in order to demonstrate how to build scalable Python projects from scratch. These projects cover some of the most important concepts in Python and the common problems that a Python programmer faces on a day-to-day basis..."

I'm busy already. And most of my examples are owned by my employer. I'm not sure the exceptions are interesting enough.

You get to work with a really good publication team. I've been thrilled.

See https://www.packtpub.com/books/info/packt/contact-us and drop Shaon Basu's name.

Tuesday, March 8, 2016

The Composite Builder Pattern, an Example of Declarative Programming [Update]

I'm calling this the Composite Builder pattern. This may have other names, but I haven't seen them. It could simply be a lack of research into prior art. I suspect this isn't very new. But I thought it was a cool way to do some declarative Python programming.

Here's the concept.

class TheCompositeThing(Builder):
    attribute1 = SomeItem("arg0")
    attribute2 = AnotherItem("arg1")
    more_attributes = MoreItems("more args")

The idea is that when we create an instance of TheCompositeThing, we get a complex object, built from various data sources.  We want to use this in the following kind of context:

with some_config_path.open() as config:
    the_thing = TheCompositeThing().substitute(config)

We want to open some configuration file -- something that's unique to an environment -- and populate the complex object in one smooth motion. Once we have the complex object, it can then be used in some way, perhaps serialized as a JSON or YAML document.

Each Item has a get() method that accepts the configuration as input. These do some computation to return a useful result. In some cases, the computation is a kind of degenerate case:

class LiteralItem(Item):
    def __init__(self, value):
        self.value = value
    def get(self, config):
        return self.value

This shows how we jam a literal value into the output. Other values might involve elaborate computations, or lookups in the configuration, or a combination of the two.
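As a sketch of the "lookups in the configuration" case, here's a companion Item. The Item base class and the key names are my own assumptions, since the post doesn't define them:

```python
# Hypothetical sketch: an Item that looks its value up in the
# configuration mapping. The Item base class is assumed, not given.
class Item:
    def get(self, config):
        raise NotImplementedError

class ConfigLookupItem(Item):
    """Return config[key], falling back to a default when absent."""
    def __init__(self, key, default=None):
        self.key = key
        self.default = default
    def get(self, config):
        return config.get(self.key, self.default)

print(ConfigLookupItem("region", "us-east-1").get({"region": "eu-west-1"}))
```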

Why Use a Declarative Style?

This declarative style can be handy when each of the Items in TheCompositeThing involves rather complex, but completely independent computations. There's no dependency here, so the substitute() method can fill in the attributes in any order. Or -- perhaps -- not fill the attributes until they're actually requested. This pattern allows eager or lazy calculation of the attributes.

This pattern applies, for example, to building complex AWS CloudFormation templates. We often need to make a global tweak to a large number of templates so that we can rebuild a server farm. There's little or no dependency among the Item values being filled in. There's no strange "ripple effect" of a change in one place also showing up in another place because of an obscure dependency between items.

We can extend this to have a kind of pipeline with each stage created in a declarative style. In this more complex situation, we'll have several tiers of Items that fill in the composite object. The first-stage Items depend on one source. The second stage Items depend on the first-stage Items.

class Stage1(Builder):
    item_1 = Stage_1_Item("arg")
    item_2 = Stage_1_More("another")

class Stage2(Builder):
    item_a = Stage_2_Item("some_arg")
    item_b = Stage_2_Another(355, 113)

We can then create a Stage1 object from external configuration or inputs. We can create the derived Stage2 object from the Stage1 object.
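One way the second stage might work -- entirely my sketch, since the post doesn't define these classes -- is for a second-stage Item's get() to receive the first-stage object in place of the raw configuration:

```python
# Sketch only: class names and the combination rule are invented.
class Stage1Result:
    """Stand-in for a first-stage composite that has already been built."""
    def __init__(self, item_1, item_2):
        self.item_1 = item_1
        self.item_2 = item_2

class Stage2Item:
    """A second-stage item: get() reads from the stage-1 object."""
    def __init__(self, arg):
        self.arg = arg
    def get(self, stage_1):
        # Derive a value from a constructor argument plus a stage-1 value.
        return f"{self.arg}/{stage_1.item_1}"

stage_1 = Stage1Result(item_1="alpha", item_2="beta")
print(Stage2Item("prefix").get(stage_1))
```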

And yes. This seems like useless metaprogramming.  We could -- more simply -- do something like this:

class Stage2:
    def __init__(self, stage_1, config):
        self.item_a = Stage_2_Item("some_arg", stage_1, config)
        self.item_b = Stage_2_Another(355, 113, stage_1, config)

We've eagerly computed the attributes during __init__() processing.

Or perhaps this:

class Stage2:
    def __init__(self, stage_1, config):
        self.stage_1 = stage_1
        self.config = config
    @property
    def item_a(self):
        return Stage_2_Item("some_arg", self.stage_1, self.config)
    @property
    def item_b(self):
        return Stage_2_Another(355, 113, self.stage_1, self.config)

Here we've been lazy and only computed attribute values as they are requested.


We've looked at three ways to build composite objects:
  1. As independent attributes with a flexible but terse implementation.
  2. As attributes during __init__() using sequential code that doesn't assure independence.
  3. As properties using wordy code. 
What's the value proposition? Why is this declarative technique interesting?

I find that the Composite Builder pattern is handy because it gives me the following benefits.
  • The attributes must be built independently. We can -- without a second thought -- rearrange the attributes and not worry about one calculation interfering with another attribute. 
  • The attributes can be built eagerly or lazily. Details don't matter. We don't expose the implementation details via __init__ or @property.
  • The class definition becomes a configuration item. A support technician without deep Python knowledge can edit the definition of TheCompositeThing successfully.
I think this kind of lazy, declarative programming is useful for some applications. It's ideal in those cases where we need to isolate a number of computations from each other to allow the software to evolve without breaking.

It may be a stretch, but I think this shows the Dependency Inversion Principle. To an extent, we've moved all of the dependencies to the visible list of attributes within these classes. The item classes do not depend on each other; they depend on configuration or perhaps previous-stage composite objects. Since there are no methods involved in the class definition, we can change the class freely. Each subclass of Builder is more like a configuration item than it is like code. In Python, particularly, such a change doesn't involve the agony of a rebuild.

A Build Implementation

We're reluctant to provide a concrete implementation for the above examples because it could go anywhere. It could be done eagerly or lazily. One choice for a lazy implementation is to use a substitute() method. Another choice is to use the __init__() method.

We might do something like this:

def substitute(self, config):
    class_dict = self.__class__.__dict__
    for name in class_dict:
        if name.startswith('__') and name.endswith('__'):
            continue
        setattr(self, name, class_dict[name].get(config))
    return self  # so TheCompositeThing().substitute(config) yields the object

This defers building the composite object until substitute() is called, stepping through the dictionary defined at the class level and filling in a value for each item. An even lazier variant could compute each value on first attribute access via __getattr__().
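Putting the pieces together, here's a minimal end-to-end sketch. The Item subclasses, class names, and configuration keys are all my own illustrative choices; only the substitute() logic comes from the post:

```python
# Minimal runnable sketch of the Composite Builder pattern.
class Item:
    """Each declared item computes its value from the configuration."""
    def get(self, config):
        raise NotImplementedError

class LiteralItem(Item):
    """Degenerate case: jam a literal value into the output."""
    def __init__(self, value):
        self.value = value
    def get(self, config):
        return self.value

class ConfigItem(Item):
    """Look the value up in the configuration mapping."""
    def __init__(self, key):
        self.key = key
    def get(self, config):
        return config[self.key]

class Builder:
    def substitute(self, config):
        # Walk the class-level dictionary, skip dunders, and fill in
        # an instance attribute for each declared Item.
        class_dict = self.__class__.__dict__
        for name in class_dict:
            if name.startswith('__') and name.endswith('__'):
                continue
            setattr(self, name, class_dict[name].get(config))
        return self

class TheCompositeThing(Builder):
    attribute1 = LiteralItem("fixed value")
    attribute2 = ConfigItem("region")

the_thing = TheCompositeThing().substitute({"region": "us-east-1"})
print(the_thing.attribute1, the_thing.attribute2)
```

The class body reads as pure configuration; reordering or adding attributes requires no changes to substitute().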