Wednesday, June 16, 2021

SQL linting — this sounds cool

 https://www.pythonpodcast.com/sqlfluff-sql-linter-episode-318/

Tuesday, June 15, 2021

Architectural Boundaries: Which Package/Module/Class Owns That Responsibility?

The SOLID design principles beat the design boundary issue to death. Here are the principles in my preferred order. (See https://www.linkedin.com/learning/learning-s-o-l-i-d-programming-principles)

  1. Interface Segregation -- minimize the boundaries. Do this first.
  2. Liskov Substitution -- keep the boundaries consistent. Do this for hierarchies.
  3. Open/Closed -- keep the boundaries stable and allow subclasses. 
  4. Dependency [Inversion] Injection -- keep the implementation separate from the design.
  5. Single Responsibility -- This is essentially a summary of the above four principles.

The point here is that these principles are pleasantly poetic, but there are those edgy cases where an interface can go either way.

Specifically, here's an Edgy Case that can go either way.

We're reading GPX (GPS Exchange) data. See https://www.topografix.com/GPX/1/1/

Associated with this is what's known as the Lowrance USR file format. A lot of devices include the same (or similar) underlying software, and can exchange waypoint and route information in USR format.

We have this as part of the underlying model.

  • The underlying Angle as an abstraction. This has two subclasses:
    • Latitude. An angle with "N" and "S" for its sign, conventionally shown as a two-digit number of degrees: 25°42.925′N
    • Longitude. An angle with "E" and "W" for its sign, conventionally shown as a three-digit number of degrees: 080°13.617′W
  • A Point (or LatLon) is a two-tuple, tuple[Lat, Lon].
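A minimal sketch of the conventional display formats in the list above, assuming signed decimal degrees as input. The helper names `to_dm`, `format_lat`, and `format_lon` are hypothetical. Rounding once, in integer thousandths of a minute, avoids ever printing a "60.000" minutes value.

```python
def to_dm(value: float, width: int, positive: str, negative: str) -> str:
    """Format signed decimal degrees as degrees and decimal minutes plus a hemisphere letter."""
    hemisphere = positive if value >= 0 else negative
    # Round once, in integer thousandths of a minute, so minutes never show as 60.000
    total = round(abs(value) * 60 * 1000)
    degrees, thousandths = divmod(total, 60 * 1000)
    return f"{degrees:0{width}d}°{thousandths / 1000:06.3f}′{hemisphere}"


def format_lat(deg: float) -> str:
    return to_dm(deg, 2, "N", "S")  # two-digit degrees


def format_lon(deg: float) -> str:
    return to_dm(deg, 3, "E", "W")  # three-digit degrees


print(format_lat(25.7154147))    # 25°42.925′N
print(format_lon(-80.22695124))  # 080°13.617′W
```

The sample inputs are the GPX attribute values that appear later in this post; they reproduce the two display strings shown in the list.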

A waypoint includes name, description, a time-of-last-update (TOLU), and display symbol to be used. It may also include a GUID to track name changes and assure uniqueness in spite of repeated names.
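A minimal sketch of that record as a dataclass. The field names are hypothetical, and the position is kept as a plain (lat, lon) tuple because its representation is exactly what's at issue. Defaulting the GUID to a fresh `uuid4` keeps repeated names distinct.

```python
import datetime
import uuid
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class Waypoint:
    name: str
    description: str
    position: Tuple[float, float]  # (lat, lon); how to represent these is the open question
    tolu: datetime.datetime        # time of last update
    symbol: str                    # display symbol name
    # A fresh GUID per waypoint keeps repeated names distinct and survives renames
    guid: uuid.UUID = field(default_factory=uuid.uuid4)


a = Waypoint("Anchorage", "", (25.7154147, -80.22695124), datetime.datetime(2021, 6, 15), "anchor")
b = Waypoint("Anchorage", "", (25.7154147, -80.22695124), datetime.datetime(2021, 6, 15), "anchor")
print(a.name == b.name, a.guid == b.guid)  # True False
```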

So far, so good. Nothing too edgy there. "Where's the problem?" you ask.

The problem is representation.

In GPX files, latitude and longitude are float values in degrees. You'll see this: <wpt lon="-80.22695124" lat="25.7154147">...</wpt>.

To do any useful computation, they need to be in radians. Or converted to a geocode that supports proximity comparisons, like OLC.
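Getting from the GPX attributes to radians is short. Here's a sketch using the standard library's ElementTree on the `<wpt>` example above; the function name `wpt_radians` is hypothetical. In a complete GPX 1.1 file the elements carry the `http://www.topografix.com/GPX/1/1` namespace, so you'd iterate over `{http://www.topografix.com/GPX/1/1}wpt` rather than a bare `wpt`.

```python
import math
import xml.etree.ElementTree as ET
from typing import Tuple

snippet = '<wpt lon="-80.22695124" lat="25.7154147"/>'


def wpt_radians(element: ET.Element) -> Tuple[float, float]:
    """Return (lat, lon) in radians from a GPX <wpt> element's degree attributes."""
    lat = math.radians(float(element.attrib["lat"]))
    lon = math.radians(float(element.attrib["lon"]))
    return lat, lon


lat, lon = wpt_radians(ET.fromstring(snippet))
```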

And. If you work with a CSV export from a tool like OpenCPN, then you get strings. These can be any combination of degrees and minutes, or degrees, minutes, and seconds. And, depending on the software, there may be either ° or º for the degrees. Can't tell them apart? One is U+00B0, the DEGREE SIGN. The other is U+00BA, the MASCULINE ORDINAL INDICATOR. Plus, of course, everyone uses apostrophe (') and quote (") where they should have used prime (′) and double prime (″). These are easy regular expression problems to solve.
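Here's one way the regular expression might look, as a sketch that accepts either degree character, either minute mark, and either second mark. The names `ANGLE_PATTERN` and `parse_angle` are hypothetical, and it only handles a trailing hemisphere letter, not a leading minus sign.

```python
import re

DEGREE = "°º"  # U+00B0 DEGREE SIGN and U+00BA MASCULINE ORDINAL INDICATOR
MINUTE = "'′"  # apostrophe and U+2032 PRIME
SECOND = '"″'  # quote and U+2033 DOUBLE PRIME
NUMBER = r"(\d+(?:\.\d+)?)"

ANGLE_PATTERN = re.compile(
    rf"{NUMBER}\s*[{DEGREE}]\s*"       # degrees
    rf"(?:{NUMBER}\s*[{MINUTE}]\s*)?"  # optional minutes
    rf"(?:{NUMBER}\s*[{SECOND}]\s*)?"  # optional seconds
    r"([NSEW])"                        # hemisphere
)


def parse_angle(text: str) -> float:
    """Return signed decimal degrees from D°M.M′H or D°M′S.S″H style text."""
    match = ANGLE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized angle: {text!r}")
    deg, minutes, seconds, hemisphere = match.groups()
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # South and West are negative
    return -value if hemisphere in "SW" else value
```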

This leads to a class like the following:

from typing import Optional
class Angle(float):
    @classmethod
    def fromdegrees(cls, deg: float, hemisphere: Optional[str] = None) -> "Angle": ...
    @classmethod
    def fromstring(cls, value: str) -> "Angle": ...

This Angle class converts numbers or strings into useful values: radians internally, formatted in degrees externally. (And yes, this gets a warning from Python 3.9 that we can't usefully extend float like this.)
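A minimal sketch of how `fromdegrees()` might store radians in the float subclass, assuming `hemisphere` is one of "N", "S", "E", "W", or omitted. `fromstring()` is left out; the `degrees` property and `__str__` are illustrative additions. The last line shows one practical limit of extending float: arithmetic hands back a plain float, not an Angle.

```python
import math
from typing import Optional


class Angle(float):
    """A float whose value is in radians; degrees are only for input and display."""

    @classmethod
    def fromdegrees(cls, deg: float, hemisphere: Optional[str] = None) -> "Angle":
        # "S" and "W" make the angle negative; "N", "E", or None leave it as given
        sign = -1 if hemisphere in ("S", "W") else 1
        return cls(math.radians(sign * deg))

    @property
    def degrees(self) -> float:
        return math.degrees(self)

    def __str__(self) -> str:
        return f"{self.degrees:.6f}°"


lat = Angle.fromdegrees(25.7154147, "N")
lon = Angle.fromdegrees(80.22695124, "W")
print(lat, lon)
print(type(lat + lon))  # <class 'float'>: arithmetic drops the subclass
```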

The problem is USR files. 

USR files use millimeter mercator numbers for latitude and longitude. These are distances from the equator or the prime meridian. Because they're in millimeters, an integer will do nicely. A little computation is done to extract degrees (or radians) from these values.

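import math
SEMIMINOR_B = 6_356_752.3142  # WGS 84 semi-minor axis

def mm_to_lon(mm_lon: int) -> float:
    return round(math.degrees(mm_lon / SEMIMINOR_B), 8)
def mm_to_lat(mm_lat: int) -> float:
    return round(math.degrees(2 * math.atan(math.exp(mm_lat / SEMIMINOR_B)) - math.pi / 2), 8)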

These aren't too bad. But.
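A USR writer would need the inverse conversions. Solving the two formulas above for the millimeter values gives the sketch below. The function names are hypothetical, and the forward conversions are repeated (without the rounding to 8 places) so the block runs on its own. A round trip should agree to within about 1e-5 degrees, roughly the resolution of one integer unit.

```python
import math

SEMIMINOR_B = 6_356_752.3142


def lon_to_mm(lon_deg: float) -> int:
    # inverse of lon = degrees(mm_lon / B)
    return round(math.radians(lon_deg) * SEMIMINOR_B)


def lat_to_mm(lat_deg: float) -> int:
    # inverse of lat = 2*atan(exp(mm_lat / B)) - pi/2, i.e. B * ln(tan(pi/4 + lat/2))
    return round(SEMIMINOR_B * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)))


def mm_to_lon(mm_lon: int) -> float:
    return math.degrees(mm_lon / SEMIMINOR_B)


def mm_to_lat(mm_lat: int) -> float:
    return math.degrees(2 * math.atan(math.exp(mm_lat / SEMIMINOR_B)) - math.pi / 2)
```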

Here's the question.

Where does this belong? Is it part of the class? Is it separate?

Where does Millimeter Mercator representation belong?

This raises a secondary question: Where does ANY representation belong?

Do we separate the essential object (an angle in radians, a float) from all representation questions? If so, how do we properly bind value and representation at run time? 

Is our app full of complex mixins to bind the float with representation choices? class Latitude(float, DMS, MM, etc.): pass. This seems potentially annoyingly complex: we have to make sure names don't collide when defining all these aspects separately.

I think the representation for latitudes and longitudes *is* the essential problem here. The math (e.g., computing the loxodromic distance between points) is trivially separated from all of these representation concerns.
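To back up the claim that the math separates cleanly from representation, here's a sketch of a loxodrome (rhumb line) distance that only ever sees a (lat, lon) tuple of radians. It assumes a spherical Earth with one nautical mile per minute of arc; the function and constant names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (lat, lon) in radians

NM_PER_RADIAN = 60 * 180 / math.pi  # assumes a sphere: one nautical mile per minute of arc


def rhumb_distance(p1: Point, p2: Point) -> float:
    """Loxodrome (rhumb line) distance in nautical miles between two (lat, lon) radian points."""
    lat1, lon1 = p1
    lat2, lon2 = p2
    d_lat = lat2 - lat1
    # Difference in Mercator "stretched" latitude
    d_psi = math.log(math.tan(math.pi / 4 + lat2 / 2) / math.tan(math.pi / 4 + lat1 / 2))
    # q is the ratio d_lat / d_psi; on an east-west course it degenerates to cos(lat)
    q = d_lat / d_psi if abs(d_psi) > 1e-12 else math.cos(lat1)
    # Take the shorter way around the antimeridian
    d_lon = (lon2 - lon1 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_lat, q * d_lon) * NM_PER_RADIAN
```

Whether the tuple came from GPX floats, CSV strings, or USR integers is invisible here.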

If we buy into the centrality of representation issues, then, we're down to the following argument.

Resolution: millimeter mercator belongs in the Angle class.

Affirmative: it's yet another representation of an angle's value. 

Negative: it's not used outside USR files and belongs in the USR file parser module.

Affirmative Rebuttal: None of the other representations in Angle are tied specifically to a file format.

Negative Rebuttal: That's because the other formats (float, string) are intermixed in CSV files and text displays, making them "widely used." Float is used consistently in GPX, but that's a pleasant exception that relies on a widely-used encoding.

Okay. We seem to have conflicting goals here. Some representation is a generic thing that crosses file formats and some representation is localized to a specific file format and not reused.

The SOLID design principles don't help choose between these designs. Instead, they provide post-hoc justification for the design we chose.

We can exploit the SOLID principles in a variety of ways. Some examples:

  • We could claim that LatitudeMM is a subclass of Latitude with the MM conversions mixed in. Open/Closed. Liskov Substitution. 
  • We could claim that Latitude has several load/dump strategies available, including Load from MM. Open/Closed. Dependency is Injected at run-time.
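The second option, load strategies injected at run time, might look like this minimal sketch. The registry name `loaders`, the format keys, and `_usr_mm_to_radians` are all hypothetical; the point is that the USR-specific math can live with the USR parser while Latitude only provides the hook.

```python
import math
from typing import Any, Callable, ClassVar, Dict

SEMIMINOR_B = 6_356_752.3142


class Latitude(float):
    """Radians internally; format-specific loaders are injected from outside."""

    # Hypothetical registry: format name -> function returning radians
    loaders: ClassVar[Dict[str, Callable[[Any], float]]] = {
        "degrees": lambda deg: math.radians(float(deg)),
    }

    @classmethod
    def load(cls, fmt: str, raw: Any) -> "Latitude":
        return cls(cls.loaders[fmt](raw))


# The USR parser module would own this code and register it at import time.
def _usr_mm_to_radians(mm_lat: int) -> float:
    return 2 * math.atan(math.exp(mm_lat / SEMIMINOR_B)) - math.pi / 2


Latitude.loaders["usr_mm"] = _usr_mm_to_radians
```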

Sigh.

Prior Art

Methods like __str__() and __repr__() are generally considered part of the essential class. That means the most common string representations need to be provided. Similarly, parsing a string is the job of the float class's constructor: float("25.7") builds an instance.

So. Some representations are part of the class. Clearly, however, not all representations are part of the class. Representation codecs like pickle, struct, or ctypes are kept separate.

I'm going to make the case that there's a very, very fine line between unique and non-unique-but-not-widely-used aspects of a class of objects. And, in this specific case, the millimeter mercator should be kept separate.

I'm going to rely on other representations like PlusCode (also called OLC) as yet another obscure representation and insist these aren't essential to the class. Indeed, I'm going to suggest that proximity-friendly geocoding is clearly separate because it's a hack to replace complex distance computations with substring comparisons. 

Tuesday, June 1, 2021

Real Math (symbolic math, like mathematicians do) and a spreadsheet-like feedback loop

See https://slott56.github.io/replacing-a-spreadsheet/. This document is really exciting (to me).

This is still shaky -- I'm still learning -- but it's a very cool combination of the Python components sympy and Jupyter Lab. As a bonus, Jupyter{Book} appeals to me as a writer. There's an aspect of literate programming in this that is also very appealing.

The core is this.

  • I have a problem that involves complex math. Well, it's complex to me. It involves integrals, so there's a lot of space for confusion.
  • This is applied math, and I want to plug in numbers and get answers. 

In effect, I want a spreadsheet.

I don't want rows-and-columns. I do want cells, though; that's a nice organizing principle.

I don't want the goofy little formulas in a spreadsheet. I want real Python code.

I want the spreadsheet-like feature of computations that depend on inputs and are re-run when the inputs change. This has been the core value proposition for spreadsheets since the days of VisiCalc. It's a great UX in general. We just need to get past the rows-and-columns.
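The "cells that depend on inputs" idea doesn't need rows and columns. Here's a minimal sketch with hypothetical names (`Sheet`, `define`) where each formula is an ordinary Python function of named cells. It recomputes on every read rather than tracking dependencies and pushing updates the way a real spreadsheet does. A sympy expression turned into a function with `sympy.lambdify` could sit in such a cell, though that isn't shown.

```python
import math
from typing import Any, Callable, Dict, Tuple


class Sheet:
    """Named cells: inputs hold values; formulas are plain Python functions of other cells."""

    def __init__(self) -> None:
        self.inputs: Dict[str, Any] = {}
        self.formulas: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}

    def set(self, name: str, value: Any) -> None:
        self.inputs[name] = value

    def define(self, name: str, func: Callable[..., Any], *depends_on: str) -> None:
        self.formulas[name] = (func, depends_on)

    def __getitem__(self, name: str) -> Any:
        if name in self.inputs:
            return self.inputs[name]
        func, depends_on = self.formulas[name]
        # Recompute from the current inputs every time the cell is read
        return func(*(self[dep] for dep in depends_on))


sheet = Sheet()
sheet.set("radius", 2.0)
sheet.define("area", lambda r: math.pi * r ** 2, "radius")
print(sheet["area"])  # about 12.566
sheet.set("radius", 3.0)
print(sheet["area"])  # about 28.274
```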

The problem with most spreadsheet apps is the limited capability for more serious math. 

Which is why the sympy + Jupyter Lab was a blinding revelation to me.

Tuesday, May 25, 2021

Python's Protocol Annotation vs. Duck Typing

Let's talk about profound confusion.

I got an email with a subject of this, "Python's Protocol Reduces Reliance on Duck Typing". The resulting conversation led to this nugget: "... my current project could use protocols in Python, and thus I didn't need to rely on duck typing and instead could use that as my type."

I'm unclear on what "reliance" really meant here.

Python depends (heavily) on duck typing. Because type annotations are optional, this cannot change. It's unlikely to ever change.

Here's the bottom line: Duck Typing Won't Go Away.

Indeed, there's more: Duck Typing Isn't Bad.

Python doesn't "rely" on the type annotations. They're a bonus feature to make sure you aren't lying about the types and how they're used.

Protocols are how duck typing works. When we leverage duck typing among classes, we're implicitly relying on the classes all supporting a common protocol. Numbers, for example, implement a ton of methods; this collection of common methods (e.g., __add__(), etc.) defines a protocol.
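The standard library already names many of these implicit protocols. A small sketch using `typing.SupportsFloat`, which only asks for `__float__()`: the argument types are unrelated classes, yet they all work, because the protocol just names the duck typing that was already happening. The function name `total` is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction
from typing import Iterable, SupportsFloat


def total(values: Iterable[SupportsFloat]) -> float:
    # Only __float__ is required; no common base class is involved
    return sum(float(v) for v in values)


print(total([1, 2.5, Fraction(1, 2), Decimal("0.25")]))  # 4.25
```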

With mypy, we can create our own distinct protocols as named types.

I don't get the "reducing reliance" business when protocols make duck typing work. And. Sadly. I couldn't figure out where the confusion arose.  

Follow-up

I asked for clarification and got nothing useful in response. The person sending the email seemed to be working from a summary of another conversation, or something. I couldn't figure it out.

I can only guess that they used to have something like this.

from typing import Union

class Something:
    def useful_method(self, x: str) -> int:
        ...  # whatever

class CloselyRelated:
    def useful_method(self, x: str) -> int:
        ...  # another polymorphic thing

Polymorphic = Union[Something, CloselyRelated]

# many classes and functions relying on Polymorphic

And they've realized that there may be a better way.

But. I haven't really got much to go on.

The better approach often involves something like this:

from typing import Protocol

class Polymorphic(Protocol):
    def useful_method(self, x: str) -> int:
        ...

We can define a protocol to help locate the essential features of a parameter or a result type.
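A runnable sketch of the protocol version, with hypothetical method bodies. Neither class mentions `Polymorphic`; having a matching `useful_method` is enough, so adding a third class later means no edit to a Union. `runtime_checkable` allows `isinstance()` checks, though those only confirm the method exists, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Polymorphic(Protocol):
    def useful_method(self, x: str) -> int: ...


# Neither class mentions Polymorphic; matching the method is enough.
class Something:
    def useful_method(self, x: str) -> int:
        return len(x)  # hypothetical body


class CloselyRelated:
    def useful_method(self, x: str) -> int:
        return x.count("a")  # hypothetical body


def use(obj: Polymorphic, text: str) -> int:
    return obj.useful_method(text)


print(use(Something(), "banana"), use(CloselyRelated(), "banana"))  # 6 3
```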

But. I don't really know what was going on.

And I couldn't figure out why the word "Reliance" was used.

Tuesday, April 20, 2021

OpenMarine and Signal-K

 I heard about these less than a week ago.

This is very interesting. Very interesting.

I have a partially complete IoT anchor alarm.

The idea of leveraging the boat's other devices through a Signal-K interface is appealing.

The problem is, I don't want the power draw.

I think I want to continue on the path of a small, thrifty, stand-alone device that can be used when the rest of the boat's systems are mostly shut down.

However. Seeing these projects causes me to rethink my use of Arduino and C++. While the Arduino is the thriftiest possible device -- the power consumption is negligible -- I think that a small upgrade to a Python-based device might make the software a tiny bit simpler. 

An Arduino Uno, specifically, is just barely capable of the UX I was hoping to build. The two-line LCD with a "mark" push-button, an "anchor circle" knob, and a "display page" button is right at the limit; I'm using analog inputs instead of digital for the buttons.

A Raspberry Pi can support more sophisticated displays, at some cost in power consumption. An e-ink display might be a better choice than the two-line LCD because -- well -- anchoring details change slowly. Once you've drifted too far (or have a consistent COG away from the marked point with a steadily growing distance) then the alarm sounds and the display is more-or-less irrelevant. You're going to get up and eyeball the situation to see what's going on. Whether or not the display updates doesn't matter much.

We haven't drifted very often, so I don't have too much data.

  1. Once we slid to a new position. It was a stormy, blowy day. We eased out more scope. And we never moved again. We stood on deck, taking visual bearings. An e-ink display of the details wasn't what we depended on.
  2. We used to rely on an iPhone app to monitor our position. (We've switched to SafeAnchor; we used to use an older app that's no longer available.) We were moving slowly, but steadily. It was during a hurricane; we weren't surprised. We started the engine, raised the anchor, and motored to a new place to reset. Again, we weren't using the display on the phone; we were looking at Pungo Creek.
  3. And once we were not on the boat when she moved. That would have been awkward for our neighbors. So. We'd need to have a "reset the anchor alarm" switch in the cockpit. This would mark a new position. Fatty Goodlander's advice is to leave a big sign with a string showing them where it is.

A full 1.0 knot of speed is 1.7 feet per second. We often have 50 feet of line out, so the boat can swing through a circle 100' across; any movement under 100' is likely ordinary boat motion. At 1.7 feet per second, that means about 30 seconds (50') until we're suspicious of a problem, and a full minute (100') before we must sound the alarm. (In the middle, a constant COG and increasing distance is a leading indicator of trouble; alarm chirps might be helpful.)
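The arithmetic and the "constant COG, increasing distance" leading indicator can be sketched in a few lines. The function names and the 15° bearing tolerance are hypothetical choices, and the inputs are assumed to be recent samples of distance from the marked point and bearing away from it.

```python
from typing import Sequence

FT_PER_S_PER_KNOT = 6076.12 / 3600  # one nautical mile is about 6076.12 feet


def seconds_to_cover(feet: float, knots: float) -> float:
    """How long a steady drift at `knots` takes to cover `feet`."""
    return feet / (knots * FT_PER_S_PER_KNOT)


def trending_away(distances: Sequence[float], bearings: Sequence[float], tolerance_deg: float = 15.0) -> bool:
    """True if distance from the mark grows every sample while the bearing holds steady."""
    growing = all(later > earlier for earlier, later in zip(distances, distances[1:]))
    # Compare bearings on a circle so 359° and 1° count as 2° apart
    spread = max(abs((b - bearings[0] + 180) % 360 - 180) for b in bearings)
    return growing and spread <= tolerance_deg


print(round(seconds_to_cover(50, 1.0)))   # 30
print(round(seconds_to_cover(100, 1.0)))  # 59
```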

As an intermediate data-gathering format, the Signal-K data stream is appealing. It steps away from the NMEA GPS talker messages. It's heavy going for an Arduino Uno. But. Might work out well on something a little bigger.
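As an illustration of why a JSON stream is easier to consume than raw talker sentences, here's a sketch that pulls a position out of a Signal-K delta message. The message layout (`updates`, `values`, `path`, `value`) follows my understanding of the Signal K delta format and should be checked against the specification. The sample message is hypothetical and reuses the coordinates from the June 15 GPX example; the function name `position` is also hypothetical.

```python
import json
from typing import Optional, Tuple

# A hypothetical delta message
delta_text = """
{"context": "vessels.self",
 "updates": [{"timestamp": "2021-04-20T12:00:00Z",
              "values": [{"path": "navigation.position",
                          "value": {"latitude": 25.7154147, "longitude": -80.22695124}}]}]}
"""


def position(delta: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in degrees from the first navigation.position value."""
    for update in delta.get("updates", []):
        for item in update.get("values", []):
            if item.get("path") == "navigation.position":
                value = item["value"]
                return value["latitude"], value["longitude"]
    return None


print(position(json.loads(delta_text)))  # (25.7154147, -80.22695124)
```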

Tuesday, April 6, 2021

A 5-point framework for Python performance management

See A 5-Point Framework For Python Performance Management.

It seems straightforward to me. Have goals. Measure your ability to meet them.
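"Have goals, measure against them" can start as small as this sketch built on the standard `timeit` module. The function name, the workload, and the 1 ms budget are hypothetical; taking the best of several repeats is the usual way to reduce noise from other processes.

```python
import timeit
from typing import Callable, Tuple


def meets_goal(func: Callable[[], object], budget_seconds: float, number: int = 1_000) -> Tuple[float, bool]:
    """Time func and compare the best per-call time with a stated budget."""
    best = min(timeit.repeat(func, number=number, repeat=5))
    per_call = best / number
    return per_call, per_call <= budget_seconds


per_call, ok = meets_goal(lambda: sorted(range(1_000)), budget_seconds=0.001)
print(f"{per_call * 1e6:.1f} µs per call; goal met: {ok}")
```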

I don't see too many teams doing this, though. 

I could be wrong, but I think performance is left to arguments and complaints, not solid engineering.

Tuesday, March 9, 2021

Recommended Books

I get asked about good books for beginners. Here's an example:

"What Python books do you recommend for novices so they can learn from beginner to advanced?"

For me, this is nearly impossible to answer.

"Beginner" is often undefined. I have to turn this around and ask what you already know about -- well -- everything. Computing. Programming. Languages. etc. etc.

"Advanced" similarly is undefined. Most folks have areas they're interested in. Machine Learning. IoT. Security. Cloud Engineering. Graphics. Games. Sound. etc. et yet even more ceteras.

And -- even more fascinating to me -- where are you on that journey? What have you done so far?

I am (overly) sensitive to being a Personal Search Concierge, PSC™.

I know people who (actually) cannot make Google work. Seriously. Utterly unable to use it. I believe that they are incapable of reframing their question with synonyms, but instead insist on typing a single thing into the search bar, and if the first promoted response in the list of advertisements doesn't literally answer their question, they email me.

This leads me to a stammering stupidity when asked about Python books.

Yes, I'm an author. Yes, I read other books. But no, I don't think I can answer your question.

One possible non-answer: https://realpython.com/best-python-books/. Start here.

What does "advanced" mean?

Most of the Python experts I know are experts at applying Python to a problem domain. In rare cases, the problem domain is Python itself, but even then, the focus often narrows to a specific package in the standard library, or an aspect of the run-time.

In the process of solving problems with Python, most people tend to learn a fair amount of the language. I work with folks who are fabulous problem-solvers but who'll sometimes be surprised by a Python feature that's outside their already broad experience. 

What's central here is that they're applying Python to something. The thing that seems to distinguish novices from experts is the pursuit of a solution to a problem, and learning Python as part of solving the problem.

It's essential, then, to have a problem about which one is passionate. Given a problem, and passion to solve that problem, expertise will grow.

So that's my other possible non-answer: find a problem you're passionate about and apply Python to solving it.

And yes, that's not a book. Books can help with understanding the problem or working out a solution in Python. Rarely does one book do both.

A good friend of mine gained his Python expertise by arranging the metadata in thousands of photographs on his computer. Apple's Photos app has gone through numerous changes, and his photo library had become a jumble of obsolete folders no longer supported by the current app. So he mastered Python, Apple's scripting tools, Photos, and Mac OS X to arrange his photos.

There are many Civic Tech organizations like Code for America where you can confront large, complex problems, and build tech skills while helping solve a real-world problem.

Another possible non-answer: https://www.govwebworks.com/2018/12/03/investigating-the-civic-tech-movement/

Everyone's journey is unique.