Tuesday, October 17, 2017

Why I like Functional Composition

After spending years developing a level of mastery over Object Oriented Design Patterns, I'm having a lot of fun understanding Functional Design Patterns.

The OO Design Patterns are helpful because they're concrete expressions of the S. O. L. I. D. design principles. Much of the "Gang of Four" book demonstrates the Interface Segregation, Dependency Injection, and Liskov Substitution Principles nicely. They point the way for implementing the Open/Closed and the Single Responsibility Principles.

For Functional Programming, there are some equivalent ideas, with distinct implementation techniques. The basic I, D, L, and S principles apply, but have a different look in a functional programming context. The Open/Closed principle takes on a radically different look, because it turns into an exercise in Functional Composition.

I'm building an Arduino device that collects GPS data. (The context for this device is the subject of many posts coming in the future.)

GPS devices generally follow the NMEA 0183 protocol, and transmit their data as sentences with various kinds of formats. In particular, the GPRMC and GPVTG sentences contain speed over ground   (SOG) data.

I've been collecting data in my apartment. And it's odd-looking. I've also collected data on my boat, and it doesn't seem to look quite so odd. Here's the analysis I used to make a more concrete conclusion.

def sog_study(source_path = Path("gps_data_gsa.csv")):
    with source_path.open() as source_file:
        rdr = csv.DictReader(source_file)
        sog_seq = list(map(float, filter(None, (row['SOG'] for row in rdr))))
        print("max {}\tMean {}\tStdev {}".format(
            max(sog_seq), statistics.mean(sog_seq), statistics.stdev(sog_seq)))

This is a small example of functional composition to build a sequence of SOG reports for analysis.

This code opens a CSV file with data extracted from the Arduino. There was some reformatting and normalizing done in a separate process: this resulted in a file in a format suitable for the processing shown above.

The compositional part of this is the list(map(float, filter(None, generator))) processing.

The (row['SOG'] for row in rdr) generator can iterate over all values from the SOG column. The filter(None, generator) will drop all None objects from the results, assuring that irrelevant sentences are ignored.

Given an iterable that can produce SOG values, the map(float, iterable) will convert the input strings into useful numbers. The surrounding list() creates a concrete list object to support summary statistics computations.

I'm really delighted with this kind of short, focused functional programming.

"But wait," you say. "How is that anything like the SOLID OO design?"

Remember to drop the OO notions. This is functional composition, not object composition.

ISP: The built-in functions all have well-segregated interfaces. Each one does a small, isolated job.

LSP: The concept of an iterable supports the Liskov Substitution Principle: it's easy to insert additional or different processing as long as we define functions that accept iterables as an argument and yield their values or return an iterable result.

For example.

def sog_gen(csv_reader):
    for row in csv_reader:
        yield row['SOG']

We've expanded the generator expression, (row['SOG'] for row in rdr), into a function. We can now use sog_gen(rdr) instead of the generator expression. The interfaces are the same, and the two expressions enjoy Liskov Substitution.

To be really precise, annotation with type hints can clarify this.  Something like sog_gen(rdr: Iterable[Dict[str, str]]) -> Iterable[str] would clarify this.

DIP: If we want to break this down into separate assignment statements, we can see how a different function can easily be injected into the processing pipeline. We could define a higher-order function that accepted functions like sog_gen, float, statistics.mean, etc., and then created the composite expression.

OCP: Each of the component functions is closed to modification but open to extension. We might want to do something like this: map_float = lambda source: map(float, source). The map_float() function extends map() to include a float operation. We might even want to write something like this.  map_float = lambda xform, source: map(xform, map(float, source)). This would look more like map(), with a float operation provided automatically.

SRP: Each of the built-in functions does one thing. The overall composition builds a complex operation from simple pieces.

The composite operation has two features which are likely to change: the column name and the transformation function. Perhaps we might rename the column from 'SOG' to 'sog'; perhaps we might use decimal() instead of float(). There are a number of less-likely changes. There might be a more complex filter rule, or perhaps a more complex transformation before computing the statistical summary.  These changes would lead to a different composition of the similar underlying pieces.

Tuesday, October 10, 2017

Python Exercises


This seems very cool. These look like some pretty cool problems. It includes debugging and unit testing, so there's a lot of core skills covered by these exercises.

Thursday, September 28, 2017

Learning to Code

I know folks who struggle with the core concepts of writing software.

Some of them are IT professionals. With jobs. They can't really code. It seems like they don't understand it.

Maybe a gentler introduction to programming will help?

I have my doubts. The folks who seem to struggle the hardest are really fighting against their own assumptions. They seem to make stuff up and then seek confirmation in everything they do. The idea of a falsifiable experiment seems to be utterly unknown to them. Also, because they're driven by their assumptions, the idea of exhaustively enumerating alternatives isn't something they do well, either.

For example, if you try to explain python's use of " or ' for string literals -- a syntax not used by a language like SQL -- they will argue that Python is "wrong" based on their knowledge of SQL. Somehow they wind up with a laser-like focus on mapping Python to SQL. They'll argue that apostrophe's are standard, and they'll always use those. Problem solved, right?

Or is it problem ignored? Or problem refused?

And. Why the laser-like focus on mapping among programming languages? It seems that they're missing the core concept of abstract semantics mapped to specific syntax.

Tuesday, September 26, 2017

Learning About Data Science.

I work with data scientists. I am not a scientist.

This kind of thing on scikit learn is helpful for understanding what they're trying to do and how I can help.

Tuesday, September 19, 2017

Three Unsolvable Problems in Computing

The three unsolvable problems in computing:

  • Naming
  • Distributed Cache Coherence
  • Off-By-One Errors

Let's talk about naming.

The project team decided to call the server component "FlaskAPI".


It serves information about two kinds of resources: images and running instances of images. (Yes, it's a kind of kubernetes/dockyard lite that gives us a lot of control over servers with multiple containers.)

The feature set is growing rapidly. The legacy name needs to change. As we move forward, we'll be adding more microservices. Unless they have a name that reflects the resource(s) being managed, this is rapidly going to become utterly untenable.

Indeed, the name chosen may already be untenable: the name doesn't reflect the resource, it reflects an implementation choice that is true of all the microservices. (It's a wonder they didn't call it "PythonFlaskAPI".)

See https://blogs.mulesoft.com/dev/api-dev/best-practices-for-building-apis/ for some general guidelines on API design.

These guidelines don't seem to address naming in any depth. There are a few blog posts on this, but there seem to be two extremes.

  • Details/details/details. Long paths: class-of-service/service/version-of-service/resources/resource-id kind of paths. Yes. I get it. The initial portion of the path can then route the request for us. But it requires a front-end request broker or orchestration layer to farm out the work. I'm not enamored of the version information in the path because the path isn't an ontology of the entities; it becomes something more and reveals implementation details. The orchestration is pushed down the client. Yuck.
  • Resources/resource. I kind of like this. The versioning information can be in the Content-Type header: application/json+vnd.yournamehere.vx+json.  I like this because the paths don't change. Only the vx in the header. But how does the client select the latest version of the service if it doesn't go in the path? Ugh. Problem not solved.
I'm not a fan of an orchestration layer. But there's this: https://medium.com/capital-one-developers/microservices-when-to-react-vs-orchestrate-c6b18308a14c  tl;dr: Orchestration is essentially unavoidable.

There are articles on choreography. https://specify.io/concepts/microservices the idea is that an event queue is used to choreograph among microservices. This flips orchestration around a little bit by having a more peer-to-peer relationship among services. It replaces complex orchestration with a message queue, reducing the complexity of the code.

On the one hand, orchestration is simple. The orchestrator uses the resource class and content-type version information to find the right server. It's not a lot of code.

On the other hand, orchestration is overhead. Each request passes through two services to get something done. The pace of change is slow. HATEOAS suggests that a "configuration" or "service discovery" service (with etags to support caching and warning of out-of-date cache) might be a better choice. Clients can make a configuration request, and if cache is still valid, it can then make the real working request.

The client-side overhead is a burden that is -- perhaps -- a bad idea. It has the potential to make  the clients very complex. It can work if we're going to provide a sophisticated client library. It can't work if we're expecting developers to make RESTful API requests to get useful results. Who wants to make the extra meta-request all the time?

Tuesday, September 12, 2017

The No Code Approach to Software and Why It Might Be Bad

Start here: https://www.forbes.com/sites/jasonbloomberg/2017/07/20/the-low-codeno-code-movement-more-disruptive-than-you-realize/#98cfc4a722a3

I'm not impressed. I have been not impressed for 40 years and many previous incarnations of this idea of replacing code with UX.

Of course, I'm biased. I create code. Tools that remove the need to create code reflect a threat.

Not really, but my comments can be seen that way.

Here's why no code is bad.

Software Captures Knowledge

If we're going to represent knowledge in the form of software, then, we need to have some transparency so that we can see the entire stack of abstractions. Yes, it's turtles all the way down, but some of those abstractions are important, and other abstractions can be taken as "well known" and "details don't matter."

The C libraries that support the CPython implementation, for example, is where the turtles cease to matter (for many people.) Many of us have built a degree of trust and don't need to know how the libraries are implemented or how the hardware works, or what a transistor is, or what electricity is, or why electrons even have a mass or how mass is imparted by the Higgs boson.

A clever UI that removes (or reduces) code makes the abstractions opaque. We can't see past the UI. The software is no longer capturing useful knowledge. Instead, the software is some kind of interpreter, working on a data structure that represents the state of the UI buttons.

Instead of software describing the problem and the problem's state changes, the software is describing a user experience and those state changes.

I need the data structure, the current values as selected by the user, and the software to understand the captured knowledge. 

Perhaps the depiction of the UI will help. 

Perhaps it won't. 

In general, a picture of the UI is useless. It can't answer the question "Why click that?" We can't (and aren't expected) to provide essay answers on a UI. We're expected to click and move on.

If we are forced to provide a essay answers, then the UI could come closer to capturing knowledge. Imagine having a "Reason:" text box next to every clickable button.

We all know what the essay answers will look like. They'll look like bad comments in code. And bad commit comments in Git. And bad documentation.

Some Option: ☑️ Reason: Required
Other Option: ☐ Reason: Not sure if its needed

The problem with fancy UI's and low-code/no-code software is low-information/no-information software. Maintenance becomes difficult, perhaps impossible, because it's difficult understand what's going on.

Tuesday, September 5, 2017

Seven Performance Tips

Packt (@PacktPub)
Want to improve your #Python performance? We've got 7 great tips for you: bit.ly/28YiGeE via @ggzes #CodingTips pic.twitter.com/cGhoGyTSS9

I have one thing to add: Learn to use the profiler and timeit. They will eliminate and hand-wringing over what might be better or worse. The policy is this: Code, Measure, and Choose.