Thursday, July 30, 2020

Modern Python Cookbook Journey

For the author, a book is a journey.  

Writing something new, the author describes a path the reader can follow to get from -- well -- anywhere the reader might be to the author's suggested destination. Not everyone makes the whole trip. And not everyone arrives at the hoped-for destination.

Second editions? The idea is to update the directions to reflect the new terrain.  

I'm a sailor. Here's a view of the boat.


What's important to me is the way the authorities produce revised nautical charts on a stable, regular cadence. There's no "final" chart; there's only the "current" chart, kept up-to-date by the patient hard work of armies of cartographers.

Is updating a book like updating the nautical charts? I don't think so. Charts have a variety of update cadences.  For sailors in the US, we start here: https://nauticalcharts.noaa.gov/charts/chart-updates.html. The changes can be frequent. See https://distribution.charts.noaa.gov/weekly_updates/ for the weekly chart updates. This is supplemented by the Notices to Mariners, here, too: https://msi.nga.mil/NTM. So, I think charts are much, much more complex than books.

Sailors have to integrate a lot of data.  This is no different from software developers having to keep abreast of language, library, and platform changes.

The author's journey is different from the reader's journey. A technical book isn't a memoir. 

The author may have crashed into all kinds of rocks and shoals. The author's panic, fear, and despair are not things the reader needs to know about. The reader needs to know the course to set, the waypoints, and the hazards; the estimated distances and the places to anchor that provide shelter.

For me, creating a revision is possibly as difficult as the initial writing. I don't know how other authors approach subsequent editions, but the addition of type hints meant every example had to be re-examined.  And this meant discovering problems in code that I *thought* was exemplary. 

While many code examples can simply have type hints pasted in, some Python programming practices resist having type hints trivially introduced. Instead, some thinking is required.

Generics

Python code is always generic with respect to type. Expressions like a + b will work for a surprisingly wide variety of object classes. Of course, we expect any of the numbers to work. But lists, tuples, and strings all respond to the "+" operator. This is implemented by a sophisticated check of a's __add__() and b's __radd__() methods.
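Here's a minimal sketch of that dispatch. The Money class is hypothetical, purely to illustrate how __add__() and __radd__() cooperate:

class Money:
    def __init__(self, cents: int) -> None:
        self.cents = cents
    def __add__(self, other: "Money") -> "Money":
        # Tried first, for money + x.
        return Money(self.cents + other.cents)
    def __radd__(self, other: int) -> "Money":
        # Fallback for x + money when x's __add__() returns NotImplemented.
        # int's __add__() doesn't know Money, so 0 + Money(100) lands here --
        # which is exactly what sum() does with its default start value of 0.
        return Money(other + self.cents)

print(sum([Money(100), Money(250)]).cents)  # 350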

When we write hints, it's often intended to narrow the domain of potential types. Here's some starting code.

def fact(a):
    if a == 0:
        return 1
    return a*fact(a-1)

The implied type hint is Any. This means any class of objects that defines __eq__(), __mul__(), and __sub__() will work. There are a fair number of these classes.

When we write type hints, we narrow the domain. In this case, it should be integers. Like this:

def fact(a: int) -> int:
    if a == 0:
        return 1
    return a*fact(a-1)

This tells mypy (or other, similar analytic tools) to confirm that every place the fact() function is used, the arguments will be integers. Also, the result will be an integer.

What's important is there's no run-time consequence to this. Python runs the same whether we evaluate fact(2) or fact(3.0).  The integer-based computation clearly matches the intent stated in the code. The floating-point computation is clearly at odds with the stated intent.
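A quick check makes the distinction concrete. Both lines run; only the second draws a complaint from mypy:

print(fact(2))    # 2 -- the int argument matches the stated intent
print(fact(3.0))  # 6.0 at run-time, but mypy flags the float as incompatible with the int hint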

And this brings us to the author's journey.

Shoal Water

Sometimes we have code that works. And will always work. But. The type hints are hard to express.

The most common examples?

Decorators.

Decorators can be utterly and amazingly generic. And this can make it very, very difficult to express the domain of types involved.

from functools import wraps
from typing import Callable

def make_a_log(some_function: Callable) -> Callable:
    @wraps(some_function)
    def concrete_function(*args, **kwargs):
        print(some_function, args, kwargs)
        result = some_function(*args, **kwargs)
        print(result)
        return result
    return concrete_function

This is legal, but very shady Python. The use of the Callable type hint is almost intentionally misleading. It could be anything. Indeed, because of the way Python works, it can truly be any kind of function or method. Even a lambda object can be decorated with this. 

The internal concrete_function doesn't have any type hints. This forces mypy to assume Any, and that will lead to a possibly valid application of this decorator when -- perhaps -- it wasn't really appropriate.

In the long run, this kind of misleading hinting is a bad policy.

In the short run, this code will pass every unit test you can throw at it.
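For a taste of what "better" looks like: without going all the way to a Protocol, the common TypeVar idiom narrows the hint so callers are checked against the original function's signature. A sketch, not a definitive fix:

from functools import wraps
from typing import Any, Callable, TypeVar, cast

F = TypeVar('F', bound=Callable[..., Any])

def make_a_log(some_function: F) -> F:
    @wraps(some_function)
    def concrete_function(*args: Any, **kwargs: Any) -> Any:
        print(some_function, args, kwargs)
        result = some_function(*args, **kwargs)
        print(result)
        return result
    return cast(F, concrete_function)

The decorated function now advertises the wrapped function's own type; the cast() is the one place the internal Any still leaks through.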

What does the author do?
  1. Avoid the topic? Get something published and move on? It is simpler and quicker to ignore decorators when talking about type hints. Dropping the section from the outline would have been easy.
  2. Dig deeply into how we can create Protocols to express a narrower domain of candidates for this decorator? This is work. And it's new work, since the previous edition never touched on the subject. But. Is it part of this cookbook? Or do these deeper examples belong in a separate book?
  3. Find a better example? 
Spoiler Alert: It's all three.

I start by wishing I hadn't broached the topic in the first edition. Maybe I should pretend it wasn't there and leave it out of the second edition.

Then I dig deeply into the topic, writing and over-writing until I'm no longer sure I can write about it. There's enough, and there's too much. A journey requires incremental exposition, and the side-trip into Protocols may not be the appropriate path for any but a very few readers.

After this, I may decide to throw the example out and look for something better.  What's important is having an idea of what is appropriate for the reader's journey, and what is clutter.

The final result can be better because it can be:
  • Focused on something useful.
  • Corrected so that edge cases work with the latest language, library, and mypy releases.
  • Where necessary, replaced by an alternative example that's clearer and simpler.
Unfortunately (for me), I examine everything. Every word. Every example.

Packt seems to be tolerant of my slow pace of delivery. For me, it simply takes a long time to rewrite -- essentially -- everything. I think the result is worth all the work.

Tuesday, July 28, 2020

Modern Python Cookbook 2nd ed -- Advance Copies -- DM me

This is your "why wait" invitation.

Advance copies will be available.

IF.

And this is a big "if".

You have to write a blurb. 

I'll be putting you in contact with Packt marketing folks who will get you your advance copy so you can write blurbs and reviews and -- well -- actually use the content.

It's all updated to Python 3.8. Type hints almost everywhere. F-strings and the walrus operator. Bunches of devops and data science examples. Plus a few personal examples involving sailboat navigation and management.

See me at LinkedIn https://www.linkedin.com/in/steven-lott-029835/ and I'll hook you up with Packt marketing folks.

See https://www.amazon.com/Modern-Python-Cookbook-Updated-programmer/dp/180020745X for the official Amazon Book Link. This is for ordinary "no obligation to write a review" orders.

DM me directly slott56 at gmail to be put into the marketing spreadsheet.

Tuesday, June 30, 2020

Over-Solving or Solving Problems You Don't Have

Sometimes we call them "Belt and Braces" solutions. As a former suspenders person who switched to belts, the idea of wearing both is a little like over-engineering. In the unlikely event of catastrophic failure of one system, your pants can still remain properly hoist. There's a weird, but defensible reason for that. Most over-engineering lacks a coherent reason. 

Sometimes we call them "Bells and Whistles." The solution has both bells and whistles for signaling. This is usually used in a derogatory sense of useless noisemakers, there for show only. Again, there's a really low-value and dumb, but defensible reason for this. 

While colorful, none of this is helpful for describing over-engineered software. Over-engineered software is often over-engineered for incoherent and indefensible reasons.

Over-engineering generally means trying to solve a problem that no user actually has. This leads to throwing around irrelevant features.

Concrete Example

I lived on a boat. I spent a fair amount of time fretting over navigation. 

There are two big questions:
  1. How far apart are two points, really?
  2. What's the real bearing from one point to another?
These are -- in some cases -- easy to answer.

If you have a printed, paper chart at the right scale, you can use dividers to compute a distance. It's actually a very easy task. Similarly, you can read the bearing off the chart directly. There's a trick to comparing a course to a nearby compass rose, but it's easy to learn and very accurate.

Of course, we don't want to painstakingly copy our notes from a paper chart to a spreadsheet to add them up to get total distance. And then fold in speed to get time and fuel consumption. These summary computations are a pain.

What you want is to do all of this with a computer.
  1. Plot the points using a piece of software like OpenCPN (https://opencpn.org).
  2. Extract the GPX file.
  3. Compute distances, bearings, and durations to create a route.
"So?" you ask.

So. When I did this, I researched the math and got a grip on the haversine formula for doing the spherical geometry computation of distances between points on a sphere.

It's not too bad. The formulas are big-ish. But manageable. See http://www.edwilliams.org/avform.htm#Dist for the great circle distance formula.
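Rendered as Python (the function name and hints are mine), the haversine computation looks like this:

from math import asin, cos, pi, radians, sin, sqrt

R = 60 * 180 / pi  # nautical miles per radian of arc

def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in nautical miles; arguments in degrees."""
    phi1, lam1, phi2, lam2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin((lam2 - lam1) / 2) ** 2
    return 2 * R * asin(sqrt(a))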


For airplanes and powered freighters crossing oceans, this is perfect.

For a small sailboat going from Annapolis, Maryland, to the Bahamas, this level of complexity is craziness. While accurate, it doesn't really solve the problem I have. 

I don't actually need that much accuracy. 

I need this much accuracy:

d ≈ R × √(Δlat² + (cos(lat) × Δlon)²)
And no more. This is the essential hypotenuse distance using an R-factor to convert the difference between latitudes and the distance between longitudes into pretty-close distances. For nautical miles, R is 60×180÷π. 

This is simpler, and it solves the problem I actually have. Up to about 232 miles, the answer is within 1 mile of correct. The error grows quickly: double the distance and the error seems to jump to 8 miles. A 464-mile sailing journey (at 6 knots) takes 3 days. Wind, weather, tides, and currents will introduce more error than the simplifying assumptions.

What's important is this can be put into a spreadsheet without pain. I don't need to write sophisticated Python apps to apply haversine to sequences of way-points. I can do a simpler hypotenuse computation on waypoints converted to radians.
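Here's a sketch of that spreadsheet-friendly computation, assuming the usual cosine correction on the longitude difference:

from math import cos, pi, radians, sqrt

R = 60 * 180 / pi  # nautical miles per radian

def rhumb_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Flat-plane hypotenuse distance in nautical miles; good enough for short legs."""
    phi1, lam1, phi2, lam2 = map(radians, (lat1, lon1, lat2, lon2))
    d_ns = phi2 - phi1                             # north-south leg
    d_ew = (lam2 - lam1) * cos((phi1 + phi2) / 2)  # east-west leg, shrunk by latitude
    return R * sqrt(d_ns ** 2 + d_ew ** 2)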

Is there a lesson learned?

I think there is.

The haversine is a super-general solution. It handles great-circle routes elegantly.

But it doesn't solve my actual problem. And that makes it over-engineering.

My problem is what we call rhumb-line sailing. Over short-enough distances the world may as well be flat. Over slightly longer distances, errors in the ship's compass and speedometer make a hyper-accurate great circle route moot. 

(Even with several fancy GPS-based navigation computers, a prudent mariner has paper backups. The list of waypoints, estimated times, and directions is essential when the boat's GPS receiver fails.)

I don't really need the sophistication (and the potential for bugs) with haversine. It doesn't solve a problem I actually have.

Tuesday, June 2, 2020

Overcoming Incuriosity -- Sailing Over The Horizon

I'm in regular contact with a few folks who seem remarkably incurious.

Seem.

Perhaps they're curious about something other than software. I don't know.

But I do know they're remarkably incurious about software. And are trying to write Python applications.

I know some people don't sail out of sight of their home port. I've sailed over a few horizons. It's not courage. It's curiosity. And patience. And preparation.

I find this frustrating. I refuse to write their code for them.

But any advice I give them devolves to "Do you have an example?" With the implicit "Which I can copy and paste?"

Even the few who claim they don't want examples suffer from a paralyzing level of incuriosity. They can't seem to make search work because they never read beyond the first few results of their first attempt. A lot of people seem to be able to make search work; the incurious folks seem uniquely paralyzed by it.

And it's an attribute I don't understand.

Specific example.

They read through the multiprocessing module's documentation until they got to examples with apply_async() and appear to have stopped reading. They've asked for code reviews on two separate modules, both based on apply_async().

One module was so hopelessly broken it was difficult to make the case that it could never be made to work. There's a way the results of apply_async() have to be consumed, and the code not only did not reflect this, it seemed like they had decided specifically never to consider an alternative. (Spoiler alert, it requires an explicit wait().)

The results were sometimes consumed -- by luck -- and the rest of the time, the app was quirky. It wasn't quirky. It was deplorably wrong. And the "reread the apply_async() documentation" advice fell on deaf ears. They couldn't have failed to read the page in the standard library documentation; no, it had to be Python, or Windows, or me, or something.
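For the record, here's a sketch of the missing consumption step, with a hypothetical work() function standing in for their code. get() performs the implicit wait() and returns (or re-raises) the worker's result:

from multiprocessing import Pool

def work(n: int) -> int:
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        handles = [pool.apply_async(work, (n,)) for n in range(10)]
        # Each AsyncResult must be explicitly consumed.
        values = [h.get() for h in handles]
    print(values)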

The other module was a trivial map() application. But. Since apply_async() was the incumbent, there was an amazingly elaborate implementation that amounted to rebuilding apply() or map() with globals and callbacks. This was wrapped by queue processing of Byzantine complexity. The whole mess appeared to stem from an unwillingness to read the documentation past the first example.
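The whole Byzantine structure collapses to something like this (reusing the hypothetical work() function from above):

if __name__ == "__main__":
    with Pool(4) as pool:
        values = pool.map(work, range(10))  # blocks until done; results in order
    print(values)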

What to do?

My current suggestion is to exhaustively enumerate each of the methods for putting work into the processing pool. Write an example of each and every one.

In effect: "Learn the methods by building throw-away code."

I anticipate a series of objections. "Why write throw-away code?" and this one: "That's not realistic, what do you do?"

What do I do?

I write throw-away code.

But that's no substitute for a lack of curiosity.

Tuesday, May 26, 2020

Modern Python Cookbook 2nd ed -- big milestone

Whew.

Chapter rewrites finished.

Technical reviews in process.

Things are going pretty well. Look for Packt to publish this in the next few months. Details will be posted.

Now. For LinkedIn Learning course recordings.

Tuesday, April 21, 2020

Why Python is not the programming language of the future -- a response

See https://towardsdatascience.com/why-python-is-not-the-programming-language-of-the-future-30ddc5339b66.

This is an interesting article with some important points. And. It has some points that I disagree with.

  • Speed. This is a narrow perspective. numpy and pandas are fast, dask is fast. A great many Python ecosystem packages are fast. This complaint seems to be unsupported by evidence.
  • Dynamic Scoping Rules. This actually isn't the problem; Python's scoping is lexical, not dynamic. The complaint is really about not being able to change containing scopes. First, I'm not sure changing nesting scopes is of any value at all. Second, the complaint ignores the global and nonlocal statements (see the sketch after this list). The vague "leads to a lot of confusion" seems unsupported by any evidence. 
  • Lambdas. Python's expression/statement distinction only bites inside a lambda, whose body must be a single expression. I'm not sure what the real problem is, since a lambda with statements seems like a syntactic nightmare better solved with an ordinary, named function.
  • Whitespace. Sigh. I've worked with many people who get the whitespace right but the {}'s wrong in C++. The code looks great but doesn't work. Python gets it right. The code looks great and works.
  • Mobile App Platform. See https://beeware.org
  • Runtime Errors. "coding error manifests itself at runtime" seems to be the problem. I'm not sure what this means, because lots of programming languages have run-time problems. Here's the quote: "This leads to poor performance, time consumption, and the need for a lot of tests. Like, a lot of tests." Performance? See above. Use numpy. Or Cuda. Time consumption? Not sure what this means. A lot of tests? Yes. Software requires tests. I'm not sure that a compiled language like Rust, Go, or Julia require fewer tests. Indeed, I think the testing is essentially equivalent.
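Here's a minimal sketch of the rebinding the scoping complaint overlooks: nonlocal rebinds a name in the enclosing function's scope (global does the same for module scope):

def counter():
    count = 0
    def tick() -> int:
        nonlocal count  # rebind the name in counter()'s scope
        count += 1
        return count
    return tick

t = counter()
print(t(), t(), t())  # 1 2 3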
I'm interested in ways Python could be better. 

Tuesday, April 14, 2020

The COBOL-to-SomeBetterLang Translator

Here's a popular idea.
... a COBOL-to-X translator, where X is a more-modern programming language ...
This is a noble aspiration.

In principle -- down deep -- all programming can be reduced to an idealized Turing Machine.

This means that we *should* be able to locate all the state changes in a given spaghetti-bowl of COBOL. Given the abstract state transitions, we can emit a version of that machine in any language.

Emphasis on the *should*.

There are road-blocks.

The first two are rarities. But. When confronted with these, we'll have significant problems.

  • The ALTER statement means the code can be changed at run-time. There are constraints, but still... When the code is not static, the possible domain of state changes moves outside working storage and into the procedure division itself.
  • A data structure with a RENAMES clause. This adds a layer of alternative naming, making the data states quite a bit more complex.
The next one is a huge complication: the GOTO statement. This makes state transitions extremely difficult to analyze. It's possible to untangle any GOTO of arbitrary complexity into properly tested IF and WHILE statements. 

However. The tangle of GOTO's may have been actually meaningful. It may have carried some suggestion of a business owner's intent. A GOTO-elimination algorithm may turn tangled code into opaque code. (It's also possible that it clarifies age-old bad programming.)
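The untangling itself is mechanical. Here's a minimal sketch of the classic transformation -- a state variable in a loop -- applied to a hypothetical two-paragraph tangle:

def process(record: dict) -> str:
    """Hypothetical paragraph flow: A GOes TO B or C depending on a flag."""
    state = "PARA-A"
    while state != "DONE":
        if state == "PARA-A":
            state = "PARA-B" if record.get("expedite") else "PARA-C"
        elif state == "PARA-B":
            record["handling"] = "expedited"
            state = "DONE"
        elif state == "PARA-C":
            record["handling"] = "routine"
            state = "DONE"
    return record["handling"]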

The ordinary REDEFINES clause. This was heavily used as a storage optimization for the tiny, slow file systems we had back in the olden days. It's a union of distinct types. And. It's a "free" union. We do not know how to distinguish the various types that are being redefined. It's intimately tied to processing logic in the procedure division.
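In modern terms, a REDEFINES is an untagged union. A sketch of what a rewrite confronts, with a hypothetical eight-byte field that's either a packed amount or a pair of codes:

import struct
from typing import Tuple, Union

def as_amount(raw: bytes) -> float:
    # Interpretation 1: an eight-byte floating-point amount (simplified).
    return struct.unpack(">d", raw)[0]

def as_codes(raw: bytes) -> Tuple[bytes, bytes]:
    # Interpretation 2: two four-byte codes occupying the same storage.
    return struct.unpack(">4s4s", raw)

def interpret(record_type: bytes, raw: bytes) -> Union[float, Tuple[bytes, bytes]]:
    # Nothing in the data says which interpretation applies; the
    # discriminator lives in the procedure division's logic.
    if record_type == b"AM":
        return as_amount(raw)
    return as_codes(raw)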

Just to make it even more horrifying...

File layouts evolve over time. It's entirely possible for a *working*, *valid*, *in-production* file to have content that does not match any working program's DDE. The data has flags or indicators or something that lets the app glide past the bad data. But the data is bad. It used to be good. Then something changed, and now it's almost uninterpretable. But the apps work because there are enough paths through the logic to make the row "work" without it matching any file layout in any obvious way.

I'm not sure an automated translation from COBOL is of any value. 

I think it's far better to start with file layouts, review the code, and then write new code from scratch in a modern language. This manual rewrite leads directly to small programs that -- in a modern language -- are little more than class definitions. In some cases, each legacy COBOL app would likely become a Python module.

Given snapshots of legacy files, the Python can be tested to be sure it does the same things. The processing is not nuanced, or tricky, or even particularly opaque.

The biggest problem is the knowledge captured in COBOL code tends to be disorganized. The real work is disentangling it. A language that supports ruthless refactoring will be helpful.