Bio and Publications

Tuesday, June 30, 2020

Over-Solving or Solving Problems You Don't Have

Sometimes we call them "Belt and Braces" solutions. As a former suspenders person who switched to belts, the idea of wearing both is a little like over-engineering. In the unlikely event of catastrophic failure of one system, your pants can still remain properly hoist. There's a weird, but defensible reason for that. Most over-engineering lacks a coherent reason. 

Sometimes we call them "Bells and Whistles." The solution has both bells and whistles for signaling. This is usually used in a derogatory sense of useless noisemakers, there for show only. Again, there's a really low-value and dumb, but defensible reason for this. 

While colorful, none of this is helpful for describing over-engineered software. Over-engineered software is often over-engineered for incoherent and indefensible reasons.

Over-engineering generally means trying to solve a problem that no user actually has. This leads to throwing around irrelevant features.

Concrete Example

I lived on a boat. I spent a fair amount of time fretting over navigation. 

There are two big questions: 
  1. How far apart are two points, really. 
  2. What's the real bearing from one point to another.
These are -- in some cases -- easy to answer.

If you have a printed, paper chart at the right scale, you can use dividers to compute a distance. It's actually a very easy task. Similarly, you can read the bearing off the chart directly. There's a trick to comparing a course to a nearby compass rose, but it's easy to learn and very accurate.

Of course, we don't want to painstakingly copy our notes from a paper chart to a spreadsheet to add them up to get total distance. And then fold in speed to get time and fuel consumption. These summary computations are a pain.

What you want is to do all of this with a computer.
  1. Plot the points using a piece of software like OpenCPN (https://opencpn.org).
  2. Extract the GPX file.
  3. Compute distances, bearings, and durations to create a route.
"So?" you ask.

So. When I did this, I researched the math and got a grip on the haversine formula for doing the spherical geometry computation of distances between points on a sphere.

It's not too bad. The formula are big-ish. But manageable. See http://www.edwilliams.org/avform.htm#Dist for the great circle distance formula.


For airplanes and powered freighters crossing oceans, this is perfect.

For a small sailboat going from Annapolis, Maryland, to the Bahamas, this level of complexity is craziness. While accurate, it doesn't really solve the problem I have. 

I don't actually need that much accuracy. 

I need this much accuracy.


And no more. This is the essential hypotenuse distance using an R-factor to convert the difference between latitudes and the distance between longitudes into pretty-close distances. For nautical miles, R is 60×180÷π. 

This is simpler and it solves the problem I actually have. Up to about 232 miles, the answer is within 1 mile of correct. The error grows quickly. Double the distance and the error seems to jump to 8 miles. A 464 mile sailing journey (at 6 knots) takes 3 days. Wind, weather, tides and currents will introduce more error than the simplifying assumptions.

What's important is this can be put into a spreadsheet without pain. I don't need to write sophisticated Python apps to apply haversine to sequences of way-points. I can do a simpler hypotenuse computation on waypoints converted to radians.

Is there a lesson learned?

I think there is.

There's the haversine a super-general solution. It handles great-circle routes elegantly. 

But it doesn't solve my actual problem. And that makes it over-engineering.

My problem is what we call rhumb-line sailing. Over short-enough distances the world may as well be flat. Over slightly longer distances, errors in the ship's compass and speedometer make a hyper-accurate great circle route moot. 

(Even with several fancy GPS-based navigation computers, a prudent mariner has paper backups. The list of waypoints, estimated times and directions are essential when the boat's GPS reciever fails.)

I don't really need the sophistication (and the potential for bugs) with haversine. It doesn't solve a problem I actually have.

Tuesday, June 2, 2020

Overcoming Incuriosity -- Sailing Over The Horizon

I'm in regular contact with a few folks who seem remarkably incurious.

Seem.

Perhaps they're curious about something other than software. I don't know.

But I do know they're remarkably incurious about software. And are trying to write Python applications.

I know some people don't sail out of sight of their home port. I've sailed over a few horizons. It's not courage. It's curiosity. And patience. And preparation.

I find this frustrating. I refuse to write their code for them.

But any advice I give them devolves to "Do you have an example?" With the implicit "Which I can copy and paste?"

Even the few who claim they don't want examples, suffer from a paralyzing level of incuriosity. They can't seem to make search work because they never read beyond the first few results on their first attempt. A lot of people seem to be able to make search work; and the incurious folks seem uniquely paralyzed by search.

And it's an attribute I don't understand.

Specific example.

They read through the multiprocessing module until they got to examples with apply_async() and appear to have stopped reading.  They've asked for code reviews on two separate module. Both based on apply_async().

One module was so hopelessly broken it was difficult to make the case that it could never be made to work. There's a way the results of apply_async() have to be consumed, and the code not only did not reflect this, it seemed like they had decided specifically never to consider an alternative. (Spoiler alert, it requires an explicit wait().)

The results were sometimes consumed -- by luck -- and the rest of the time, the app was quirky. It wasn't quirky. It was deplorably wrong. And "reread the apply_async()" advice fell on deaf ears. They couldn't have failed to read the page in the standard library documentation, no, it had to be Python or Windows or me or something.

The other module was a trivial map() application. But. Since apply_async() has an incumbency, there was an amazingly elaborate implementation that amounted to rebuilding apply() or map() with globals and callbacks. This was wrapped by queue processing of Byzantine complexity. The whole mess appeared to stem from an unwillingness to read the documentation past the first example.

What to do?

My current suggestion is to exhaustively enumerate each of the methods for putting work into the processing pool. Write an example of each and every one.

In effect: "Learn the methods by building throw-away code."

I anticipate a series of objections. "Why write throw-away code?" and this one: "That's not realistic, what do you do?"

What do I do?

I write throw-away code.

But that's no substitute for a lack of curiosity.