S.Lott-Software Architect: July 2015

Tuesday, July 28, 2015

Amazon Reviews

Step 1. Go to amazon.com and look for one (or more) of my Python books.

Step 2. Have you read it?

Yes: Thanks! Consider posting a review.
No: Hmmm.

That's all. Consider doing this for other authors, also.

Social media is its own weird economy. The currency seems to be evidence of eyeballs landing on content.

Tuesday, July 21, 2015

Well, it was surprising to me.

And it should not have been a surprise.

This is something I need to recognize as a standard confusion. And rewrite some training material to better address this.

The question comes up when SQL hackers move beyond simple queries and canned desktop tools into "Big Data" analytics. The pure SQL lifestyle (using spreadsheets, or Business Objects, or SAS) leads to an understanding of data that's biased toward working with collections in an autonomous way.

Outside the SELECT clause, everything's a group or a set or some kind of collection. Even in spreadsheet world, a lot of Big Data folks slap summary expressions on the top of a column to show a sum or a count without too much worry or concern.

But when they start wrestling with Python for loops, and the atomic elements which comprise a set (or list or dict), then there's a bit of confusion that creeps in.

An important skill is building a list (or set or dict) from atomic elements. We'll often have code that looks like this:

some_list = []
for source_object in some_source_of_objects:
    if some_filter(source_object):
        useful_object = transform(source_object)
        some_list.append(useful_object)

This is, of course, simply a list comprehension. In some case, we might have a process that breaks one of the rules of using a generator and doesn't work out perfectly cleanly as a comprehension. This is somewhat more advanced topic.

The transformation step is what seems to causes confusion. Or -- more properly -- it's the disconnect between the transformation calculations on atomic items and the group-level processing to accumulate a collection from individual items.

The use of some_list.append() and some_list[index] and some_list is something that folks can't -- trivially -- articulate. The course material isn't clarifying this for them. And (worse) leaping into list comprehensions doesn't seem to help.

These are particularly difficult to explain if the long version isn't clear.

some_list = [transform(source_object) for source_object in some_source_of_objects if some_filter(source_object)]

and

some_list = list( map(transform, filter(some_filter, some_source_of_objects)) )

I'm going to have to build some revised course material that zeroes in on the atomic vs. collection concepts. What we do with an item (singular) and what we do with a list of items (plural).

I've been working with highly experienced programmers too long. I've lost sight of the n00b questions.

The goal is to get to the small data map-reduce. We have some folks who can make big data work, but the big data Hadoop architecture isn't ideal for all problems. We have to help people distinguish between big data and small data, and switch gears when appropriate. Since Python does both very nicely, we think we can build enough education to school up business experts to also make a more informed technology choice.

Tuesday, July 14, 2015

Upgrading to Python 3

Folks who don't use Python regularly -- the folks in TechOps, for example -- are acutely aware that the Python 3 language is "different," and the upgrade should be done carefully. They've done their homework, but, they're not experts in everything.

They feel the need to introduce Python 3 slowly and cautiously to avoid the remote possibility of breakage. Currently, the Python 3 installers are really careful about avoiding any possible conflicts between Python 2 and 3; tiptoeing isn't really necessary at all.

I was stopped cold from having Python 3 installed on a shared server by someone who insisted that I enumerate which "features" of Python 3 I required. By enumerating the features, they could magically decide if I had a real need for Python 3 or could muddle along with Python 2. The question made precious little sense for many reasons: (1) many things are backported from 3 to 2, so there's almost nothing that's exclusive to Python 3; (2) both languages are Turing-Complete, so any feature in language could (eventually) be built in the other; (3) I didn't even know languages has "features." The reason they wanted a feature list was to provide a detailed "no" instead of a generic "no." Either way, the answer was "no." And there's no reason for that.

In all cases, we can install Python 3 now. We can start using it now. Right now.

Folks who actually use Python regularly -- me, for example -- are well aware that there's a path to the upgrade. A path that doesn't involve waiting around and slowly adopting Python 3 eventually (where eventually ≈ never.)

Go to your enterprise GitHub (and the older enterprise SVN and wherever else you keep code) and check out every single Python module. Add this line: from __future__ import print_function, division, unicode_literals. Fix the print statements. Just that. Touch all the code once. If there's stuff you don't want to touch, perhaps you should delete it from the enterprise GitHub at this time.
Rerun all the unit tests. This isn't as easy as it sounds. Some scripts aren't properly testable and need to be refactored so that the top-level script is made into a function and a separate doctest function (or module) is added. Or use nose. Once you have an essentially testable module, you can add doctests as needed to be sure that any Python 2 division or byte-fiddling work correctly with Python 3 semantics for the operators and literals.
Use your code in this "compatibility" mode for a while to see if anything else breaks. Some math may be wrong. Some use of bytes and Unicode may be wrong. Add any needed doctests. Fix things in Python 2 using the from __future__ as a bridge to Python 3. It's not a conversion. It's a step toward a conversion.

This is the kind of thing that can be started with an enterprise hack day. Make a list of all the projects with Python code. Create a central "All the Codes" GitHub project. Name each Python project as an issue in the "All the Codes" project. Everyone at the hack day can be assigned a project to check out, tweak for compatibility with some of the Python 3 features and test.

You don't even need to know any Python to participate in this kind of hack day. You're adding a line, and converting print statements to print() functions. You'll see a lot of Python code. You can ask questions of other hackers. At the end of the day, you'll be reasonably skilled.

Once this is done, the introduction of Python3 will not be a shock to anyone's system. The print() functions will be a thing. Exact division will be well understood. Unicode will be slightly more common.

And -- bonus -- everything will have unit tests. (Some things will be rudimentary place-holders, but that's still a considerable improvement over the prior state.)

Tuesday, July 7, 2015

Python Essentials

Get Packt's Python Essentials.

I think it covers a large number of important topics. Central to this is Python 3.4.

The book covers Python 3 with few -- if any -- backward glances. If it makes any mention of Python 2, the reference is strictly derogatory. There isn't even a mention of the old print statement, that's how forward-looking this is. The Python3 division operators are covered without the complexity of explaining the old Python 2 approach; the from __future__ import division is not mentioned once.

I've used a similar outline for training material at places with a mixed bag of Python 2 and Python 3. This leads to awkwardness because of the Python 2 quirks that have to be explained.

I prefer a clean approach. The essentials. Python 3 all the way.

S.Lott-Software Architect

Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.