
Tuesday, June 27, 2017

OOP and FP -- Objects vs. Functional -- Avoiding reductionist thinking

Real quote (lightly edited to remove tangential nonsense):
Recently, I watched a video and it stated that OO is about nouns and Functional programming is about the verbs. Also, ... Aspect-Oriented Programming with the e Verification Language  by David Robinson 
It would be nice to have a blog post which summarized the various mindsets associated with the various paradigms.
Sigh.

I find the word "mindset" to be challenging.

Yes. All Turing-complete programming languages have a kind of fundamental equivalence: at bottom, they all compute the same class of functions over numbers. This, however, seems reductionist.

["All languages were one language to him. All languages were 'woddly'." Paraphrased from James Thurber's "The Great Quillow", a must-read.]

So. Beyond the core Turing-completeness features of a language, the rest is reduced to a difference in "mindset"? The only difference is how we pronounce "Woddly"?

"Mindset" feels reductionist. It replaces a useful summary of language features with a dismissive "mindset" categorization of languages. In a way, this seems to result from bracketing technology choices as "religious wars," where the passion for a particular choice outweighs the actual relevance; i.e., "All languages have problems, so use Java."

In my day job, I work with three kinds of Python problems:
  • Data Science
  • API Services
  • DevOps/TechOps Automation
In many cases, one person can have all three problems. These aren't groups of people. These are problem domains.

I think the first step is to replace "mindset" with "problem domain". It's a start, but I'm not sure it's that simple.

When someone has a data science problem, they often solve it with purely functional features of Python. Generally they do this via numpy, but I've been providing examples of generator expressions and comprehensions in my Code Dojo webinars. Generator expressions are an elegant, functional approach to working with stateless data objects.

In Python 3, the following kind of code doesn't create gigantic intermediate data structures. The functional transformations are applied to each item generated by the "source".

from collections import Counter

x = map(transform, source)        # lazy: transforms each item as it's requested
y = filter(selector_rule, x)      # lazy: passes along only the selected items
z = Counter(y)                    # consumes the stream and tallies the results

I prefer to suggest that a fair amount of data analysis involves little or no mutation of state. Functional features of a language seem to work well with immutable data.

There is state change, but it's at a macro level. For example, persisting captured data is a large-scale state change that's often implemented at the OS level, not the language level.

When someone's building an API, on the other hand, they're often working with objects that have mutable state. Some elements of an API will involve state change, and objects model state change elegantly. RESTful APIs can deserialize objects from storage, make changes, and serialize the modified object again.

[This summary of RESTful services is also reductionist, and therefore, possibly unhelpful.]
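
A minimal sketch of that deserialize-modify-serialize round trip, with a hypothetical Account record standing in for whatever the API actually stores:

import json

class Account:
    def __init__(self, id, balance):
        self.id = id
        self.balance = balance

def apply_deposit(raw, amount):
    account = Account(**json.loads(raw))   # deserialize from storage
    account.balance += amount              # mutate the object's state
    return json.dumps(vars(account))       # serialize the modified object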

When there's mutability, then objects might be more appropriate than functions.

I'm reluctant to call this "mindset." It may not be "problem domain." It seems to be a model that involves mutable or immutable state.

When someone's automating their processing, they're wrestling with OS features, and OS management of state change. They might be installing stuff, or building Docker images, or gluing together items in a CI/CD pipeline, setting the SSL keys, or figuring out how to capture Behave output as part of Gherkin acceptance testing. Lots of interesting stuff that isn't the essential problem at hand, but is part of building a useful, automated solution to the problem.

The state in these problems is maintained by the OS. Application code may -- or may not -- try to model that state.

When doing Blue/Green deployments, for example, the blueness and greenness isn't part of the server farm, it's part of an internal model of how the servers are being used. This seems to be stateful; object-oriented programming might be helpful. When the information can be gleaned from asset management tools, then perhaps a functional processing stream is more important for gathering, deciding, and taking action.

I'm leaning toward the second viewpoint, and suggesting that some of the OO DevOps programming might be better looked at as functional map-filter-reduce processing. Something like

action_to_take = some_decision_pipeline(current_state, new_deployment)

This reflects the questions of state change. It may not be the right abstraction though, because carrying out the action is, itself, a difficult problem that involves determining the state of the server farm, and then applying some change to one or more servers.

We often think of server state change as imperative in nature. It feels like object-oriented programming: there are steps, and an object models those steps. I'm not sure that's right. I think there's a repeated "determine next action" cycle here. Sometimes it involves waiting for an action to finish. Yes, it sounds like polling the server farm. I'm not sure that's wrong. How else do you know a server has crashed except by polling it?
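
Here's a minimal sketch of that cycle; decide, apply, and get_state are hypothetical callables standing in for whatever the server farm actually needs:

import time

def converge(decide, apply, get_state, target, poll_interval=30.0):
    """Repeat the "determine next action" cycle until nothing is left to do."""
    while True:
        current = get_state()        # poll the farm for its actual state
        action = decide(current, target)
        if action is None:           # converged; no further action required
            return
        apply(action)                # may take a while; we re-poll afterwards
        time.sleep(poll_interval)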

I think we've moved a long way from "mindset."

I think it's about fitting language features to a problem in a way that creates the right abstraction to capture (and eventually) solve the problem.

I haven't mentioned Aspect-Oriented Programming because it seems to cut across the functional/object state management boundary. It's a distinctive approach to organizing reusable functionality. I don't mean to dismiss it as uninteresting. I mean to set it aside as orthogonal to the "mutable state" consideration that seems to be one of the central differences between OOP and FP.

In response to the request: "No. I won't map mindset to paradigm."

Tuesday, June 20, 2017

NMEA Data Acquisition -- An IoT Exercise with Python

Here's the code: https://github.com/slott56/NMEA-Tools. This is Python code to do some Internet of Things (IoT) stuff. Oddly, even when things are connected by a point-to-point serial interface, it's still often called IoT. Even though there's no "Internetworking."

Some IoT projects have a common arc: exploration, modeling, filtering, and persistence. This is followed by the rework to revise the data models and expand the user stories. And then there's the rework conundrum. Stick with me to see just how hard rework can be.

What's this about? First some background. Then I'll show some code.

Part of the back story is here: http://www.itmaybeahack.com/TeamRedCruising/travel-2017-2018/that-leaky-hatch--chartplot.html

In the Internet of Things Boaty (IoT-B) there are devices called chart-plotters. They include GPS receivers, displays, and controls. And algorithms. Most important is the merging of GPS coordinates and an active display. You see where your boat is.

Folks with GPS units in cars and on their phones have an idea of the core feature set of a chart plotter. But the value of a chart plotter on a boat is orders of magnitude above the value in a car.

At sea, the hugeness and importance of the chartplotter is magnified. The surface of a large body of water is (almost) trackless. Unless you're really familiar with it, it's just water, generally opaque. The depths can vary dramatically. A shoal too shallow for your boat can be more-or-less invisible and just ahead. Bang. You're aground (or worse, holed).

A chart -- and knowledge of your position on that chart -- is a very big deal. Once you sail out of sight of land, the chart plotter becomes a life-or-death necessity. While I can find the North American continent using only a compass, I'm not sure I could find the entrance to Chesapeake Bay without knowing my latitude. (Yes, I have a sextant. Would I trust my life to my sextant skills?)

Modern equipment uses modern hardware and protocols. N2K (NMEA 2000), for example, is a powered CAN-bus network that uses a simplified backbone with drops for the various devices. Because all the devices are peers on the backbone, interconnection is simplified. See http://www.digitalboater.com for some background.

The Interface Issue

The particularly gnarly problem with chart plotters is the lack of an easy-to-live-with interface.

They're designed to be really super robust, turn-it-on-and-it-works products. Similar to a toaster, in many respects. Plug and play. No configuration required.

This is a two-edged sword. No configuration required bleeds into no configuration possible.

The Standard Horizon CP300i uses NT cards. Here's a reader device. Note the "No Longer Available" admonition. All of my important data is saved to the NT card. But. The card is useless except for removable media backup in case the unit dies.

What's left? The NMEA-0183 interface wiring.

NMEA Serial EIA-422

The good news is that the NMEA wiring is carefully documented in the CP300i owner's manual. There are products like this NMEA-USB Adaptor. A few wire interconnections and we can -- at least in principle -- listen to this device.

The NMEA standard was defined to allow numerous kinds of devices to work together. When it was adopted (in 1983), the idea was that a device would be a "talker" and other devices would be "listeners." The intent was to have a lot of point-to-point conversations: one talker, many listeners.

A digital depth meter or wind meter, for example, could talk all day, pushing out message traffic with depth or wind information. A display would be a listener and display the current depth or wind state.

A centralized multiplexer could collect from multiple talkers and then stream the interleaved messages as a single talker. Here's an example. This would allow many sensors to be combined onto a single wire. A number of display devices could listen to traffic on the wire, pick out messages that made sense to them, and display the details.

Ideally, if every talker was polite about their time budget, hardly anything would get lost.

In the concrete case of the CP300i, there are five ports, usable in various combinations. There are some restrictions that seem to indicate some hardware sharing among the ports. The product literature describes a number of use cases for different kinds of interconnections, including a computer connection.

Since NMEA's EIA-422 signaling is close enough to RS-232, some old computer serial ports could be wired up directly. My boat originally had an ancient Garmin GPS and an ancient Windows laptop using an ancient DB-9 serial connector. I saved the data by copying files off the hard drive and threw the hardware away.

A modern Macintosh, however, only handles USB. Not direct EIA-422 serial connections. An adaptor is required.

What we will have, then, is a CP300i in talker mode, and a MacBook Pro (Retina, 13-inch, Late 2013) as listener.

Drivers and Infrastructure

This is not my first foray into the IoT-B world. I have a BU-353 GPS antenna. This can be used directly by the GPSNavX application on the Macintosh. On the right-ish side of the BU-353 page are Downloads. There's a USB driver listed there. And a GPS Utility to show position, satellites, and the NMEA data stream.

Step 1. Install this USB driver.

Step 2. Install the Mac OS X GPS Utility. I know the USB interface works because I can see the BU-353 device using this utility.

Step 3. Confirm with GPSNavX. Yes. The chart shows the little boat triangle about where I expect to be.

Yay! Phase I of the IoT-B is complete. We have a USB interface. And we can see an NMEA-0183 GPS antenna. It's transmitting in standard 4800 baud mode. This is the biggest hurdle in many projects: getting stuff to talk.

In the project Background section on GitHub, there's a wiring diagram for the USB-to-NMEA interface.

Also, the Installation section says install pyserial. https://pypi.python.org/pypi/pyserial. This is essential to let Python apps interact with the USB driver.

Data Exploration

Start here: NMEA Reference Manual. This covers the bases for the essential message traffic nicely. The full NMEA standard has lots of message types. We only care about a few of them. We can safely ignore the others.

As noted in the project documentation, there's a relatively simple message structure. The messages arrive more-or-less constantly. This leads to an elegant Pythonic design: an Iterator.

We can define a class which implements the iterator protocol (__iter__() and __next__()) that will consume lines from the serial interface and emit the messages which are complete and have a proper checksum. Since the fields of a message are comma-delimited, we might as well split each sentence into its fields, too.

It's handy to combine this with the context manager protocol (__enter__() and __exit__()) to create a class that can be used like this.

    with Scanner(device) as GPS:
        for sentence_fields in GPS:
            print(sentence_fields) 

This is handy for watching the messages fly past. The fields are kind of compressed. It's a light-weight compression, more like a lack of useful punctuation than proper compression.
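
Here's a minimal sketch of one way such a Scanner could work, assuming pyserial and the standard "$...*hh" sentence framing; the details are a guess, not the project's actual code:

import serial  # pyserial

class Scanner:
    """Iterate over complete, checksum-valid sentences as lists of byte fields."""
    def __init__(self, device, baud=4800):
        self.device = device
        self.baud = baud
    def __enter__(self):
        self.port = serial.Serial(self.device, self.baud)
        return self
    def __exit__(self, *exc):
        self.port.close()
    def __iter__(self):
        return self
    def __next__(self):
        while True:
            line = self.port.readline().strip()
            if not line.startswith(b"$") or b"*" not in line:
                continue  # fragmentary or garbled; skip it
            body, _, checksum = line[1:].rpartition(b"*")
            computed = 0
            for byte in body:
                computed ^= byte  # NMEA checksum: XOR of the body bytes
            try:
                valid = computed == int(checksum, 16)
            except ValueError:
                valid = False
            if valid:
                return body.split(b",")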

Consequently, we'll need to derive fields from the raw sequences of bytes. This initial exploration leads straight to the next phase of the project.

Modeling

We can define a data model for these sentences using a Sentence class hierarchy. We can use a simple Factory function to emit Sentence objects of the appropriate subclass given a sequence of fields in bytes. Each subclass can derive data from the message.

The atomic fields seem to be of seven different types.
  • Text. This is a simple decode using ASCII encoding.
  • Latitude. The values are in degrees and float minutes.
  • Longitude. Similar to latitude.
  • UTC date. Year, month, and day as a triple.
  • UTC time. Hour, minute, float seconds as a triple.
  • float. 
  • int.
Because fields are optional, we can't naively use the built-in float() and int() functions to convert bytes to numbers. We'll have to have a version that works politely with zero-length strings and creates None objects.
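
Something like these helpers (the names are mine, not the project's):

def nfloat(value):
    """float() that maps an empty field to None instead of raising ValueError."""
    return float(value) if value else None

def nint(value):
    """int() with the same polite treatment of empty fields."""
    return int(value) if value else None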

We can define a simple field definition tuple, Field = namedtuple('Field', ['title', 'name', 'conversion']). This slightly simplifies definition of a class.

We can define a class with a simple list of field conversion rules.

class GPWPL(Sentence):
    fields = [
        Field('Latitude', 'lat_src', lat),
        Field('N/S Indicator', 'lat_h', text),
        Field('Longitude', 'lon_src', lon),
        Field('E/W Indicator', 'lon_h', text),
        Field("Name", "name", text),        
    ]

The superclass __init__() uses the sequence of field definitions to apply conversion functions (lat(), lon(), text()) to the bytes, populating a bunch of attributes. We can then use s.lat_src to see the original latitude 2-tuple from the message. A property can deduce the actual latitude from the s.lat_src and s.lat_h fields.
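
For example, a property along these lines, assuming lat_src is a (degrees, minutes) pair and lat_h is "N" or "S":

@property
def latitude(self):
    """Signed decimal degrees from the degrees/minutes pair and hemisphere."""
    deg, minutes = self.lat_src
    sign = -1 if self.lat_h == "S" else +1
    return sign * (deg + minutes / 60)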

For each field, apply the function to the value, and set this as an attribute.

        for field, arg in zip(self.fields, args[1:]):
            try:
                setattr(self, field.name, field.conversion(arg))
            except ValueError as e:
                self.log.error(f"{e} {field.title} {field.name} {field.conversion} {arg}")

This sets attributes with useful values derived from the bytes provided in the arguments.

The factory leverages a cool name-to-class mapping built by introspection.

    sentence_class_map = {
        class_.__name__.encode('ascii'): class_
        for class_ in Sentence.__subclasses__()
    }
    class_ = self.sentence_class_map.get(args[0])

This lets us map a sentence header (b"GPRTE") to a class (GPRTE) simply. The get() method can use an UnknownSentence subclass as a default.
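
Pulling that together, the factory might look like this sketch, with UnknownSentence as the fallback subclass mentioned above:

def sentence_factory(args):
    """Map the sentence header (e.g., b"GPRTE") to a Sentence subclass."""
    class_ = sentence_class_map.get(args[0], UnknownSentence)
    return class_(*args)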

Modeling Alternatives

As we move forward, we'll want to change this model. We could use a cooler class definition style, something like this. We could then iterate over the keys in the class __dict__ to set the attribute values.

class GPXXX(Sentence):
    lat_src = Latitude(1)
    lat_h = Text(2)
    lon_src = Longitude(3)
    lon_h = Text(4)
    name = Text(5)

The field numbers are provided to be sure the right bunch of bytes are decoded.

Or maybe even something like this:

class GPXXX(Sentence):
    latitude = Latitude(1, 2)
    longitude = Longitude(3, 4)
    name = Text(5)

This would combine source fields to create the useful value. It would be pretty slick. But it requires being *sure* of what a sentence's content is. When exploring, this isn't the way to start. The simplistic list of field definitions comes right off web sites without too much intermediate translation that can lead to confusion.

The idea is to borrow the format from the SiRF reference and start with Name, Example, Unit, and Description in each Field definition. That can help provide super-clear documentation when exploring. The http://aprs.gids.nl/nmea/ information has similar tables with examples. Some of the http://freenmea.net/docs examples only have names.

The most exhaustive seems to be http://www.catb.org/gpsd/NMEA.html. This, also, only has field names and position numbers. The conversions are usually pretty obvious.

Filtering

A talker -- well -- talks. More or less constantly. There are delays to allow time to listen and time for multiplexers to merge in other talker streams.

There's a cycle of messages that a device will emit. Once you've started decoding the sentences, the loop is obvious.

For an application where you're gathering real-time track or performance data, of course, you'll want to capture the background loop. It's a lot of data. At about 80 bytes times 8 background messages on a 2-second cycle, you'll see 320 bytes per second, 19K per minute, 1.1M per hour, 27.6M per day. You can record everything for 38 days and stay under a Gb.

The upper bound for 4800 baud is 480 bytes per second: 41M per day, or 25 days to record a Gb of raw data.

For my application, however, I want to capture the data that's not part of the background loop.

It works like this.
  1. I start the laptop collecting data.
  2. I reach over to the chartplotter and push a bunch of buttons to get to a waypoint transfer or a route transfer.
  3. The laptop shows the data arriving. The chartplotter claims it's done sending.
  4. I stop collecting data. In the stream of data are my waypoints or routes. Yay!
A reject filter is an easy thing: essentially it's filter(lambda s: s._name not in reject_set, source). A simple set of names to reject is the required configuration for this filter.
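
As a concrete sketch, the configuration is just the set of background-loop sentence names for the particular device; the names here are examples, not the CP300i's actual list:

reject_set = {b"GPGGA", b"GPGSA", b"GPGSV", b"GPRMC", b"GPVTG", b"GPGLL"}
interesting = filter(lambda s: s._name not in reject_set, source)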

Persistence

How do we save these messages?

We have several choices.
  1. Stream of Bytes. The protocol uses \r\n as line endings. We could (in principle) cat /dev/cu.usbserial-A6009TFG >capture.nmea. Pragmatically, that doesn't always work, because the 4800 baud setting is hard to establish; see the capture sketch after this list. But the core idea of "simply save the bytes" works.
  2. Stream of Serialized Objects. 
    1. We can use YAML to cough out the objects. If the derived attributes were all properties, it would have worked out really well. If, however, we leverage __init__() to set attributes, this becomes awkward.
    2. We can work around the derived-value problem by using JSON with our own Encoder to exclude the derived fields. This is a bit more complex than it needs to be, but it permits exploration. (See the encoder sketch below.)
  3. GPX, KML, or CSV. Possible, but. These seem to be a separate problem.
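
For the stream-of-bytes option, pyserial handles the 4800 baud setting that cat struggles with; a minimal capture sketch:

import serial  # pyserial

port = serial.Serial("/dev/cu.usbserial-A6009TFG", 4800)
with open("capture.nmea", "wb") as target:
    while True:
        target.write(port.readline())  # raw bytes, \r\n line endings intact
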
When transforming data, it's essential to avoid "point-to-point" transformation among formats. It's crucial to have a canonical representation and individual converters. In this case: NMEA to canonical, persist the canonical, and canonical to GPX (or KML, or CSV).
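
For the JSON route, a custom encoder that persists only the canonical source fields might look like this sketch (assuming each Sentence keeps its raw args, as in the rework below):

import json

class SentenceEncoder(json.JSONEncoder):
    """Persist the source fields only; derived attributes can be recomputed."""
    def default(self, obj):
        if isinstance(obj, Sentence):
            return {
                "_class": obj.__class__.__name__,
                "_args": [arg.decode("ascii") for arg in obj.args],
            }
        return super().default(obj)

Usage would be json.dumps(sentence, cls=SentenceEncoder).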

Rework

Yes. There's a problem here.  Actually there are several problems.
  1. I got the data I wanted. So, fixing the design flaws isn't essential anymore. I may, but... I should have used descriptors.
  2. In the long run, I really need a three-way synchronization process between computer, new chart plotter and legacy chart plotter. 
Let's start with the first design issue: lazy data processing.

The core Field/Sentence design should have looked like this:

class Field:
    def __init__(self, position, function, description):
        self.position = position
        self.function = function
        self.description = description
    def __get__(self, instance, class_):
        # Lazily convert the raw byte field on each attribute access.
        transform = self.function
        return transform(instance.args[self.position])

class Sentence:
    f0 = Field(0, int, "Item Zero")
    f1 = Field(1, float, "Item One")
    def __init__(self, *args):
        self.args = args

This makes all of the properties into lazy computations. It simplifies persistence because the only real attribute value is the tuple of arguments captured from the device.

>>> s = Sentence(b'1', b'2.3')
>>> s.f0
1
>>> s.f1
2.3

That would have been a nicer design because serialization would have been trivial. Repeated access to the fields might have become costly. We have a tradeoff issue here that depends on the ultimate use case. For early IoT efforts, flexibility is central, and the computation costs don't matter. At some point, there may be a focus on performance, where extra code to save time has merit.

Synchronization is much more difficult. I need to pick a canonical representation. Everything gets converted to the canonical form. Differences are identified. Then updates are created: either GPX files for the devices that can import them, or NMEA traffic for the device that's updated over the wire.

Conclusion

This IoT project followed a common arc: Explore the data, define a model, figure out how to filter out noise, figure out how to persist the data. Once we have some data, we realize the errors we made in our model.

A huge problem is the pressure to ship an MVP (Minimum Viable Product). It takes a few days to build this. It's shippable.

Now, we need to rework it. In this case, throw most of the first release away. Who has the stomach for this? It's essential, but it's also very difficult.

A lot of good ideas from this blog post are not in the code. And this is the way a lot of commercial software happens: MVP and move forward with no time for rework.

Friday, June 16, 2017

Python Resources

https://opensource.com/tags/python

Red Hat was an early adopter, putting Python-based tools into their distros.

Sadly, they've also lagged behind. They haven't gotten much beyond Python 2.6 or maybe 2.7.

RHEL is really good. Except for the Python problem.

Tuesday, June 13, 2017

Another "Problems With Python" List

This: https://darkf.github.io/posts/problems-i-have-with-python.html

Summary.
  1. Slow.
  2. Threads.
  3. Legacy Python 2 Code.
  4. Some Inconsistencies.
  5. Functional Programming.
  6. Class Definitions Don't Have Enough Features.
  7. Switch Statement.

Responses.
  1. Find the 20% that needs a speedup and write that in C. Most of the time that's already available in numpy. The rest of the time you may have found something useful.
  2. Generally, most folks make a hash of threaded programming. A focus on process-level parallelism is simpler and essentially guarantees success by avoiding shared data structures. (See the sketch after this list.)
  3. Maybe stop using the legacy projects?
  4. Yup.
  5. Proper functional programming would require an optimizer, something that doesn't fit well with easy-to-debug Python. It would also require adding some features to cope with the optimization of functional code (e.g., monads). It seems to be a net loss. And there's http://coconut-lang.org.
  6. Consider a metaclass that provides the missing features?
  7. I can't figure out why 'elif' is considered hard to use. The more complex matching rules are pretty easy to implement, but I guess this falls into the "awful hacks" category.
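
For point 2, here's a minimal sketch of the process-level alternative using only the standard library; work() is a stand-in for a real CPU-bound task:

from concurrent.futures import ProcessPoolExecutor

def work(item):
    return item * item  # stand-in for a CPU-bound computation

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Each item is handled in a separate process; no shared data structures.
        results = list(pool.map(work, range(100)))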

What causes me to write this is the lack of concrete "do this instead" for most of the points. It sounds too much like empty complaining.

I hope for some follow-on from this on Twitter. But I'm not optimistic. It's too easy to complain and too hard to solve.

Tuesday, June 6, 2017

An Epic Fail Example

What's the most Epic Fail I've ever seen?

I was a traveling consultant for almost 35 years. I saw a lot. I did learn from epic fail scenarios. But. I haven't really spent a lot of time thinking about the lessons learned there. I never have a glib answer to this question. Mostly because the stories are incomplete: I came in during an awful mess and left while it was still an awful mess. No arc. No third act. No punchline.

These aren't really stories as much as they're vignettes -- just sad fragments of some larger tragedy. Consequently, they don't leap to the front of mind quickly.

One example is a smallish company that had built some pretty cool software in MS-Access. They had created something that was narrowly focused on a business problem and they were clever, so it worked. And worked well.

They leveraged this success, solving another major business problem. In MS-Access. Clever. Focused on real users' real needs. It doesn't get any better than that.

Well, of course, it does get better than that.

They replicated their success seven times. Seven interlocking MS-Access databases. They had subsumed essentially all of the company's information and data processing. Really. It's not that hard to do. Companies buy General Ledger software that doesn't really do very much. You can write a perfectly serviceable ledger application yourself. (Many people ask "why bother?")

When I talked with them, they had finally been swamped by the inevitable scalability problem. They had done all the hackarounds they could do. Their network of MS-Access servers and interlocked cluster of databases had reached its limit of growth.

Unsurprisingly.

Questions I did not ask at the time: Who let this happen? Who closed their eyes to the scalability problem and let this go forward? Who avoided the idea of contingency planning? How do you back this up and restore it to a consistent state?

They were in a world of trouble. I told them what they had to do and never saw them again. End of vignette.

(In case you want to know... I told them to get a real server, install SQL-Server, and migrate each individual MS-Access table to that central SQL-Server database, replacing the Access table with an ODBC connection to the central DB. This would take months. Once every single database was expunged from MS-Access, they could start to look at a web-based front-end to replace the Access front-end.)

There are others. I'll have to ransack my brain to see if I've got other examples.