Tuesday, July 28, 2015

Amazon Reviews

Step 1. Go to amazon.com and look for one (or more) of my Python books.

Step 2. Have you read it?

  •     Yes: Thanks! Consider posting a review.
  •     No: Hmmm.
That's all. Consider doing this for other authors, also. 

Social media is its own weird economy. The currency seems to be evidence of eyeballs landing on content.

Tuesday, July 21, 2015

A Surprising Confusion

Well, it was surprising to me.

And it should not have been a surprise.

This is something I need to recognize as a standard confusion. And rewrite some training material to better address this.

The question comes up when SQL hackers move beyond simple queries and canned desktop tools into "Big Data" analytics. The pure SQL lifestyle (using spreadsheets, or Business Objects, or SAS) leads to an understanding of data that's biased toward working with collections in an autonomous way.

Outside the SELECT clause, everything's a group or a set or some kind of collection. Even in spreadsheet world, a lot of Big Data folks slap summary expressions on the top of a column to show a sum or a count without too much worry or concern.

But when they start wrestling with Python for loops and the atomic elements that make up a set (or list or dict), a bit of confusion creeps in.

An important skill is building a list (or set or dict) from atomic elements. We'll often have code that looks like this:

some_list = []
for source_object in some_source_of_objects:
    if some_filter(source_object):
        useful_object = transform(source_object)
        some_list.append(useful_object)
This is, of course, simply a list comprehension written out the long way. In some cases, we might have a process that breaks one of the rules of using a generator and doesn't work out perfectly cleanly as a comprehension. That's a somewhat more advanced topic.

The transformation step is what seems to cause confusion. Or -- more properly -- it's the disconnect between the transformation calculations on atomic items and the group-level processing to accumulate a collection from individual items.

The distinction between some_list.append(), some_list[index], and some_list itself is something that folks can't -- trivially -- articulate. The course material isn't clarifying this for them. And (worse) leaping into list comprehensions doesn't seem to help.

These are particularly difficult to explain if the long version isn't clear.

some_list = [transform(source_object) for source_object in some_source_of_objects if some_filter(source_object)]


some_list = list( map(transform, filter(some_filter, some_source_of_objects)) )
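A small, runnable comparison can make the equivalence concrete. Here some_filter and transform are stand-ins (keep the even numbers, square them) invented purely for illustration:

```python
def some_filter(x):
    """Stand-in filter: keep even numbers."""
    return x % 2 == 0

def transform(x):
    """Stand-in transformation: square the value."""
    return x * x

some_source_of_objects = range(6)

# The long version: an explicit loop with append().
long_version = []
for source_object in some_source_of_objects:
    if some_filter(source_object):
        useful_object = transform(source_object)
        long_version.append(useful_object)

# The list comprehension.
comp_version = [transform(s) for s in some_source_of_objects if some_filter(s)]

# The map()/filter() version.
map_version = list(map(transform, filter(some_filter, some_source_of_objects)))

print(long_version, comp_version, map_version)  # prints [0, 4, 16] three times
```

All three forms build the same list. The long version makes the append step explicit -- which is exactly the step that tends to go missing when folks first move from column-level SQL thinking to item-level Python thinking.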

I'm going to have to build some revised course material that zeroes in on the atomic vs. collection concepts. What we do with an item (singular) and what we do with a list of items (plural).

I've been working with highly experienced programmers too long. I've lost sight of the n00b questions.

The goal is to get to the small data map-reduce. We have some folks who can make big data work, but the big data Hadoop architecture isn't ideal for all problems. We have to help people distinguish between big data and small data, and switch gears when appropriate. Since Python does both very nicely, we think we can build enough education to school up business experts to also make a more informed technology choice.

Tuesday, July 14, 2015

Upgrading to Python 3

Folks who don't use Python regularly -- the folks in TechOps, for example -- are acutely aware that the Python 3 language is "different," and the upgrade should be done carefully. They've done their homework, but they're not experts in everything.

They feel the need to introduce Python 3 slowly and cautiously to avoid the remote possibility of breakage. Currently, the Python 3 installers are really careful about avoiding any possible conflicts between Python 2 and 3; tiptoeing isn't really necessary at all.

I was stopped cold from having Python 3 installed on a shared server by someone who insisted that I enumerate which "features" of Python 3 I required. By enumerating the features, they could magically decide if I had a real need for Python 3 or could muddle along with Python 2. The question made precious little sense for many reasons: (1) many things are backported from 3 to 2, so there's almost nothing that's exclusive to Python 3; (2) both languages are Turing-complete, so any feature in one language could (eventually) be built in the other; (3) I didn't even know languages had "features." The reason they wanted a feature list was to provide a detailed "no" instead of a generic "no." Either way, the answer was "no." And there's no reason for that.

In all cases, we can install Python 3 now. We can start using it now. Right now.

Folks who actually use Python regularly -- me, for example -- are well aware that there's a path to the upgrade. A path that doesn't involve waiting around and slowly adopting Python 3 eventually (where eventually ≈ never.)
  1. Go to your enterprise GitHub (and the older enterprise SVN and wherever else you keep code) and check out every single Python module. Add this line: from __future__ import print_function, division, unicode_literals. Fix the print statements. Just that. Touch all the code once. If there's stuff you don't want to touch, perhaps you should delete it from the enterprise GitHub at this time.
  2. Rerun all the unit tests. This isn't as easy as it sounds. Some scripts aren't properly testable and need to be refactored so that the top-level script is made into a function and a separate doctest function (or module) is added. Or use nose. Once you have an essentially testable module, you can add doctests as needed to be sure that any Python 2 division or byte-fiddling works correctly with Python 3 semantics for the operators and literals.
  3. Use your code in this "compatibility" mode for a while to see if anything else breaks. Some math may be wrong. Some use of bytes and Unicode may be wrong. Add any needed doctests. Fix things in Python 2 using the from __future__ as a bridge to Python 3. It's not a conversion. It's a step toward a conversion.
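As a sketch of what step 1 looks like in a single module -- the module content here is hypothetical -- the touched-up code might read:

```python
# Hypothetical module after the step-1 touch-up.
# Under Python 2, the __future__ import switches print, division, and
# string literals to Python 3 semantics; under Python 3 it's a no-op.
from __future__ import print_function, division, unicode_literals

def average(total, count):
    """True division: 7/2 is 3.5 under both Python 2 and Python 3.

    >>> average(7, 2)
    3.5
    """
    return total / count

# Old: print "average:", average(7, 2)
print("average:", average(7, 2))
```

One added line, one converted print statement, one doctest to pin down the division semantics. That's the whole touch.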
This is the kind of thing that can be started with an enterprise hack day. Make a list of all the projects with Python code. Create a central "All the Codes" GitHub project. Name each Python project as an issue in the "All the Codes" project. Everyone at the hack day can be assigned a project to check out, tweak for compatibility with some of the Python 3 features and test.

You don't even need to know any Python to participate in this kind of hack day. You're adding a line, and converting print statements to print() functions. You'll see a lot of Python code. You can ask questions of other hackers. At the end of the day, you'll be reasonably skilled.

Once this is done, the introduction of Python 3 will not be a shock to anyone's system. The print() functions will be a thing. Exact division will be well understood. Unicode will be slightly more common.

And -- bonus -- everything will have unit tests. (Some things will be rudimentary place-holders, but that's still a considerable improvement over the prior state.)

Tuesday, July 7, 2015

Python Essentials

Get Packt's Python Essentials.

I think it covers a large number of important topics. Central to this is Python 3.4.

The book covers Python 3 with few -- if any -- backward glances. If it makes any mention of Python 2, the reference is strictly derogatory. There isn't even a mention of the old print statement; that's how forward-looking this is. The Python 3 division operators are covered without the complexity of explaining the old Python 2 approach; the from __future__ import division is not mentioned once.
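For the record, the Python 3 division semantics in question:

```python
# Python 3 division: / is always exact (true) division; // is floor division.
print(7 / 2)     # 3.5 -- even with two integer operands
print(7 // 2)    # 3
print(-7 // 2)   # -4 -- floor, not truncation
```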

I've used a similar outline for training material at places with a mixed bag of Python 2 and Python 3. This leads to awkwardness because of the Python 2 quirks that have to be explained.

I prefer a clean approach. The essentials. Python 3 all the way.

Tuesday, June 23, 2015

Literate Programming and GitHub

I remain captivated by the ideals of Literate Programming. My fork of PyLit (https://github.com/slott56/PyLit-3) coupled with Sphinx seems to handle literate programming in a very elegant way.

It works like this.
  1. Write RST files describing the problem and the solution. This includes the actual implementation code. And everything else that's relevant. 
  2. Run PyLit3 to build final Python code from the RST documentation. This should include the setup.py so that it can be installed properly. 
  3. Run Sphinx to build pretty HTML pages (and LaTeX) from the RST documentation.
I often include the unit tests along with the sphinx build so that I'm sure that things are working.
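A build script for this workflow might look like the following sketch. The directory layout (src/, code/, html/) and the pylit.py invocation are assumptions for illustration, not PyLit's documented interface:

```python
# Sketch of a one-source build: RST in, code and HTML out.
# Directory names and tool invocations are assumptions.
import subprocess
from pathlib import Path

def code_path(rst_path):
    """Map an RST source file to its extracted-code destination."""
    return Path("code") / (Path(rst_path).stem + ".py")

def build():
    # PyLit extracts the executable code from each RST document.
    for rst in sorted(Path("src").glob("*.rst")):
        subprocess.check_call(
            ["python", "pylit.py", str(rst), str(code_path(rst))])
    # Sphinx renders the same RST into browsable HTML.
    subprocess.check_call(["sphinx-build", "-b", "html", "src", "html"])
    # Doctests confirm the extracted code still works.
    for py in sorted(Path("code").glob("*.py")):
        subprocess.check_call(["python", "-m", "doctest", str(py)])
```

One source, several targets; the tests run against the generated code, so a broken extraction fails the build.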

The challenge is final presentation of the whole package.

The HTML can be easy to publish, but it can't (trivially) be used to recover the code. We have to upload two separate and distinct things. (We could use BeautifulSoup to recover RST from HTML and then PyLit to rebuild the code. But that sounds crazy.)

The RST is easy to publish, but hard to read and it requires a pass with PyLit to emit the code and then another pass with Sphinx to produce the HTML. A single upload doesn't work well.

If we publish only the Python code, we've defeated the point of literate programming. Even if we focus on the Python, we need to do a separate upload of HTML to provide the supporting documentation.

After working with this for a while, I've found that it's simplest to have one source and several targets. I use RST ⇒ (.py, .html, .tex). This encourages me to write documentation first. I often fail, and have blocks of code with tiny summaries and non-existent explanations.

PyLit allows one to use .py ⇒ .rst ⇒ .html, .tex. I've messed with this a bit and don't like it as much. Code first leaves the documentation as a kind of afterthought.

How can we publish simply and cleanly: without separate uploads?

Enter GitHub and gh-pages.

See the "sphinxdoc-test" project for an example: https://github.com/daler/sphinxdoc-test. The bulk of this is useful advice on a way to create the gh-pages branch from your RST source via Sphinx and some git commands.

Following this line of thinking, we almost have the case for three branches in a LP project.
  1. The "master" branch with the RST source. And nothing more.
  2. The "code" branch with the generated Python code created by PyLit.
  3. The "gh-pages" branch with the generated HTML created by Sphinx.
I think I like this.

We need three top-level directories. One has RST source. A build script would run PyLit to populate the (separate) directory for the code branch. The build script would also run Sphinx to populate a third top-level directory for the gh-pages branch.

The downside of this shows up when you need to create a branch for a separate effort. You have a "some-major-change" branch to master. Where's the code? Where's the doco? You don't want to commit either of those derived work products until you merge the "some-major-change" back into master.

GitHub Literate Programming

There are many LP projects on GitHub. There are perhaps a dozen which focus on publishing with the Github-flavored Markdown as the source language. Because Markdown is about as easy to parse as RST, the tooling is simple. Because Markdown lacks semantic richness, I'm not switching.

I've found that semantically rich markup is essential. This is a key feature of RST. It's carried forward by Sphinx to create very sophisticated markup. Think :code:`sample` vs. :py:func:`sample` vs. :py:mod:`sample` vs. :py:exc:`sample`. The final typesetting may be similar, but they are clearly semantically distinct and create separate index entries.

A focus on Markdown seems to be a limitation. It's encouraging to see folks experiment with literate programming using Markdown and GitHub. Perhaps other folks will look at more sophisticated markup languages like RST.

Previous Exercises

See https://sourceforge.net/projects/stingrayreader/ for a seriously large literate programming effort. The HTML is also hosted at SourceForge: http://stingrayreader.sourceforge.net/index.html.

This project is awkward because -- well -- I have to do a separate FTP upload of the finished pages after a change. It's done with a script, not a simple "git push." SourceForge hosts a Git repository: https://sourceforge.net/p/stingrayreader/code/ci/master/tree/. But SourceForge doesn't use GitHub.com's UI, so it's not clear whether it supports the gh-pages feature. I assume it doesn't, but maybe it does. (I can't even log in to SourceForge with Safari... I should really stop using SourceForge and switch to GitHub.)

See https://github.com/slott56/HamCalc-2.1 for another complex, LP effort. This predates my dim understanding of the gh-pages branch, so it's got HTML (in doc/build/html), but it doesn't show it elegantly.

I'm still not sure this three-branch Literate Programming approach is sensible. My first step should probably be to rearrange the PyLit3 project into this three-branch structure.

Tuesday, June 16, 2015

A plea to avoid sys.exit() [Updated]

Let me gripe about this for a moment.


The use case for this function is limited. Very, very limited.

Every place that this appears (except for one) is going to lead to reusability issues.

Consider some obscure little function, deep within the app.

def deep_within_the_app(x, y, zed):
    try:
        something()  # doesn't matter what
    except SomeException:
        logging.exception("deep_within_the_app")
        sys.exit(2)  # this is the problem

What's so bad about that?

The function seizes control of every app that uses it by raising an unexpected exception.

We can (partially) undo this mischief by wrapping the function in a try/except which catches SystemExit.

def reusing_a_feature():
    for i in range(a_bunch):
        try:
            deep_within_the_app(i, y, zed)
        except SystemExit as e:
            print("error on {0}".format(i))

This will defeat the sys.exit(). But the cost is one of clarity. Why SystemExit? Why not some meaningful exception?

This is important: raise the meaningful exception instead of exit.

Bottom Line.

The right place for sys.exit() is inside the if __name__ == "__main__": section.
It might look something like this:

if __name__ == "__main__":
    try:
        main()
    except (KnownException, AnotherException) as ex:
        logging.exception(ex)
        sys.exit(2)

Use meaningful exceptions instead of sys.exit().

This permits reuse of everything without a mysterious SystemExit causing confusion.
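Putting the pieces together, here's a minimal sketch of the whole pattern; the exception name and the arithmetic are invented for illustration:

```python
import sys

class InvalidZedError(Exception):
    """Hypothetical application-specific exception."""
    pass

def deep_within_the_app(x, y, zed):
    if zed == 0:
        # Raise a meaningful exception -- not sys.exit().
        raise InvalidZedError("zed must be non-zero")
    return (x + y) / zed

def reusing_a_feature(pairs):
    results = []
    for x, y in pairs:
        try:
            results.append(deep_within_the_app(x, y, x - y))
        except InvalidZedError:
            # The caller decides what a failure means here.
            results.append(None)
    return results

if __name__ == "__main__":
    try:
        print(reusing_a_feature([(3, 1), (2, 2)]))  # prints [2.0, None]
    except InvalidZedError as ex:
        # The one legitimate home for sys.exit().
        sys.exit("fatal: {0}".format(ex))
```

Any caller can now catch InvalidZedError and decide for itself what a failure means; only the top-level script translates failure into a process exit code.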

On "Taste" in Software Design

Read this: http://www.paulgraham.com/taste.html.

I was originally focused on "beauty". Clearly, good design is beautiful. Isn't that obvious? Why so many words to explain the obvious?

The post seemed useless. Why write it in the first place? Why share it? Why share it now, 12 years after it was written?

Because beauty can be elusive to some people. A more complete definition of some attributes of beauty is helpful.

This is not a throw-away concept. These are fourteen essential elements that need to be used as part of every software architectural design review. Indeed, it should be part of every code review. Although code perhaps shouldn't be "daring."

When we adopt an architecture, it should fit these criteria.

This doesn't replace more pragmatic software quality assurance considerations.  See http://www.sei.cmu.edu/reports/95tr021.pdf.

I'm currently delighted with "Good design is redesign."