Tuesday, August 9, 2022

Tragedy Averted

I almost made a terrible blunder.

See https://github.com/slott56/py-web-tool for some background. This is a "Literate Programming" tool. I started fooling around with this kind of thing back in '05 (maybe even earlier.) This is not the blunder. The whole idea of literate programming is not very popular. I'm a fan of Jupyter{Book} as the state of the art in sophisticated literate programming, if you're interested in it.

In my case, I started this project so long ago, I used docutils. This was long before Sphinx arrived on the scene. I never updated my little project to use Sphinx. The point was to have a kind of pure literate programming tool that could work with a variety of markup languages, including (but not limited to) RST.

Recently, I learned about PlantUML. The idea of a text description of a diagram is appealing. I don't really need to draw it; I just need to specify what's in it and let graphviz do the rest. This tool is very, very cool. You can capture ideas quickly. You can refine and expand on ideas until you reach a point where code makes more sense than a picture of code. 

For some things, you can gather data and draw a picture of things *as they are*. This is particularly valuable for cloud-based infrastructure where a few queries leads to PlantUML source that is depicted very nicely.

Which leads to the idea of Literate Programming including UML diagrams. 

Doesn't sound too difficult. I can create an extension to docutils to introduce a UML directive. The resulting RST would look like this:

..  uml::

    left to right direction
    skinparam actorStyle awesome

    actor "Developer" as Dev
    rectangle PyWeb {
        usecase "Tangle Source" as UC_Tangle
        usecase "Weave Document" as UC_Weave
    }
    rectangle IDE {
        usecase "Create WEB" as UC_Create
        usecase "Run Tests" as UC_Test
    }
    Dev --> UC_Tangle
    Dev --> UC_Weave
    Dev --> UC_Create
    Dev --> UC_Test

    UC_Test --> UC_Tangle

This could be handy to have the diagrams as part of the documentation that tangles the working the code. One source for all of it. 

I started down the path of researching docutils extensions. Got pretty far. Far enough that I had an empty repository and everything. I was about ready to start creating spike solutions.

Then.

[music cue] *duh duh duuuuuuh*

I found that Sphinx already has an extension for PlantUML. I almost started reading the code to see how it worked.

Then I realized how dumb that was. It already works. Why read the code? Why not install it?

I had a choice to make.

  1. Continue building my own docutils plug-in.
  2. Switch to Sphinx.

Some complications:

  • My Literate Programming tool produces RST that *may* not be compatible with Sphinx.
  • It's yet another dependency in a tool that started out with zero dependencies. I've added pytest and tox. What next? 

What to do?

I have to say that Git is amazing. I can make a branch for the spike. If it works, pull request. If it doesn't work, delete the branch. This continues to be game-changing to me. I'm old. I remember when we had to back up the whole project directory tree before making this kind of change.

It worked. My tool's RST (with one exception) worked perfectly with Sphinx. The one exception was an obscure directive, .. class:: name, used to provide an HTML class name for the following block. This always should have been the docutils .. container:: name directive. With this fix, we're good to go.

I'm happy I avoided the trap of reimplementing something. Instead of that, I upgraded from "bare" docutils with my own CSS to Sphinx with it's sophisticated templates and HTML Themes.

Tuesday, August 2, 2022

Books! Books! Books!

First, there's 

Pivot to Python

A Guide for professionals and skilled beginners

https://books.apple.com/us/book/pivot-to-python/id1586977675 

I've recently updated this to fix some cosmetic problems with title pages, the table of contents and stuff like that. The content hasn't changed. Yet. It's still an introduction to Python for folks who already know how to program, they want to pivot to programming in Python. Quickly.

But wait, there's more. 

Unlearning SQL

When your only tool is a hammer, every problem looks like a nail

https://books.apple.com/us/book/unlearning-sql/id6443164060

This is all new. It's written for folks who know Python, and are struggling with the architectural balance between writing bulk processing in SQL or writing it in Python. For too many developers, SQL is effectively the only tool they can use. With a variety of tools, it becomes easier to solve a wider variety of problems.

Tuesday, July 26, 2022

Bashing the Bash -- The shell is awful and what you can do about it

A presentation I did recently.

https://github.com/slott56/bashing-the-bash

Folks were polite and didn't have too many questions. I guess they fundamentally agreed: the shell is awful, we can use it for a few things.

Safe Shell Scripts Stay Simple: Set the environment, Start the application.

The Seven S's of shell scripting.

Many many thanks to Code & Supply for hosting me.

Tuesday, July 19, 2022

I've got a great Proof-of-Concept. How do I go forward with it?

This is the best part about Python -- you can build something quickly. And it really works.

But. 

What are the next steps?

While there are a *lot* of possibilities, I'm focused on an "enterprise work group" application that involves a clever web service/RESTful API built in Flask. Maybe with NLP.

Let me catalog a bunch of things you might want to think about to "productionize" your great idea. Here's a short list to get started.

  • File System Organization
  • Virtual Environments
  • Unit Testing
  • Integration Testing
  • Acceptance Testing
  • Static Analysis
  • Tool Chain
  • Documentation
Let's dive into each one of these. Then we'll look at Flask deployments.

File System Organization

When you're gotten something to work, the directory in which it works is sometimes not organized ideally. There are a lot of ways to do this, but what seems to work well is a structure like the following.

- Some parent directory. Often in Git
  - src -- your code is here
  - tests -- your tests are here
  - docs -- your documentation will be here
  requirements.txt -- the list of packages to install. Exact, pinned version numbers
  requirements-dev.txt -- the list of packages used for maintenance and development
  environment.yml -- another list of packages in conda format
  pyproject.toml -- this has your tox setup in it
  Makefile -- sometimes helpful

Note that a lot of packages you see have a setup.py.  This is **only** needed if you're going open source your code. For enterprise projects, this is not the first thing you will focus on. Ignore it, for now.

Virtual Environments

When you're developing in Python you may not even worry about virtual environments. You have Python. It works. You downloaded NLP and Flask. You put things together and they work.

The trick here is the Python ecosystem is vast, and you have (without really observing it closely) likely downloaded a lot of projects. Projects that depend on projects. 

You can't trust your current environment to be reliable or repeatable. You'll need to use a virtual environment manager of some kind.

Python's built-in virtual environment manager venv is readily available and works nicely. See https://docs.python.org/3/library/venv.html  It's my second choice. 

My first choice is conda. Start with minicondahttps://docs.conda.io/en/latest/miniconda.html. Use this to assemble your environment and retest your application to be sure you've got everything.

You'll be creating (and destroying) virtual environments until you get it right. They're cheap. They don't impact your code in any way. Feel free to make mistakes.

When it works, build conda's environment.yml file and the requirements.txt files. This will rebuild the environment.  You'll use them with tox for testing.

If you don't use conda, you'll omit the environment.yml.  Nothing else will change.

Unit Testing

Of course, you'll need automated unit tests. You'll want 100% code coverage. You *really* want 100% logic path coverage, but that's aspirational. 100% code coverage is a lot of work and uncovers enough problems that the extra testing for all logic paths seems unhelpful.

You have two built-in unit testing toolsets: doctest and unittest. I like doctest. https://docs.python.org/3/library/doctest.html

You'll want to get pytest and the pytest-cov add-on package. https://docs.pytest.org/en/6.2.x/contents.html  https://pytest-cov.readthedocs.io/en/latest/.  

Your test modules go in the tests directory. You know you've done it right when you can use the pytest command at the command line and it finds (and runs) all your tests. 

This is part of your requirements-dev.txt file.

Integration Testing

This is unit testing without so many mocks. I recommend using pytest for this, also. The difference is that your "fixtures" will be much more complex. Files. Databases. Flask Clients. Certificates. Maybe starting multiple services. All kinds of things that have a complex setup and perhaps a complex teardown, also.

See https://docs.pytest.org/en/6.2.x/fixture.html#yield-fixtures-recommended for good ways to handle this more complex setup and teardown.

Acceptance Testing

Depending on the community of users, it may be necessary to provide automated acceptance tests. For this, I recommend behave. https://behave.readthedocs.io/en/stable/ You're can write the test cases in the Gherkin language. This language is open-ended, and many stakeholders can contribute to the test cases. It's not easy to get consensus sometimes, and a more formal Gherkin test case lets people debate, come to an agreement, and prioritize the features and scenarios they need to see.

This is part of your requirements-dev.txt file.

Static Analysis

This is an extra layer of checking to be sure best practices are being followed. There are a variety of tools for this. You *always* want to process your code through blackhttps://black.readthedocs.io/en/stable/ 

Some folks love isort for putting the imports into a canonical order.  https://pycqa.github.io/isort/

Flake8 should be used to be sure there's no obviously bad programming practices. https://flake8.pycqa.org/en/latest/

I'm a huge fan of type hints. I consider mypy to be essential. https://mypy.readthedocs.io/en/stable/  I prefer "--strict" mode, but that can be a high bar. 

Tool Chain

You can try to manage this with make. But don't.

Download tox, instead.  https://tox.wiki/en/latest/index.html  

The point of tox is to combine virtual environment setup with testing in that virtual environment. You can -- without too much pain -- define multiple virtual environments. You can then test the various releases of the various packages your project depends on in various combinations. This is how to manage a clean upgrade. 

1. Figure out the new versions.

2. Setup tox to test existing and new.

3. Run tox.

I often set the tox commands to run black first, then unit testing, then static analysis, ending with mypy --strict.

When the code is reformatted by black, it's technically a build failure. (You should have run black manually before running tox.) When tox works cleanly, you're ready to commit and push and pull request and merge.

Documentation

Not an after-thought.

For human documents, use Sphinx. https://www.sphinx-doc.org/en/master/ 

Put docstrings in every package, every module, every class, every method, and every function. Summarize *what* and *why*. (Don't explain *how*: people can read your code.) 

Use the autodoc feature to create the API reference documentation from the code. Start with this.

Later, you can write a README, and some explanations, and installation instructions, and all the things other people expect to see.

For a RESTful API, be sure to write an OpenAPI specification and be sure to test against that spec. https://www.openapis.org. While a lot of the examples are complicated, you can easily use a small subset to describe your documents, the validation rules, and the transactions. You can add the security details later. They're part of your web server, but they don't need an extensive OpenAPI documentation at the beginning.

Flask Deployments

Some folks like to define a flask application that can be installed in the Python virtual environment. This means the components are on the default sys.path without any "extra" effort. (It's a fair amount of effort to begin with. I'm not sure it's worth it.)

When you run a flask app, you'll be using some kind of engine. NGINX, uWSGI, GUnicorn, etc. (GUnicorn is very nice. https://gunicorn.org). 

See https://flask.palletsprojects.com/en/2.0.x/deploying/wsgi-standalone/.

In all cases, these engines will "wrap" your Flask application. You'll want to make your application visible by setting the PYTHONPATH environment variable, naming your src directory. Do not run from your project's directory.

You will have the engine running in some distinct /opt/the_app or /Users/the_app or /usr/home/the_app or some such directory, unrelated to where the code lives. You'll use GUnicorns command-line options to locate your app, wherever it lives on the filesystem. GUnicorn will use PYTHONPATH to find your app. Since web servers often run as nobody, you'll need to make sure your code base is readable. But. Not. Writable.

Tuesday, July 12, 2022

The Enterprise COBOL Conundrum

Enterprise COBOL is both a liability and an asset. There's tangible value hidden in the code.

See https://github.com/slott56/looking-at-cobol  

I've tweaked the presentation a little. 

The essential ingredients in coping with COBOL are these:

  • Use something like Stingray Reader to parse COBOL DDE's and process the data in the native format.
  • Analyze the Job Control Language (JCL) to work out the directed acyclic graph (DAG) that leads to file and database updates. These "master" files and databases are the data artifacts that matter most. This is the value-creating processing. There aren't many of these files. 
  • Create a process to clone those files, and write Python data access modules to process the data. This is a two-way process. You'll be shipping files from your Z/OS world to another server running Python. In some cases, files will need to come back to Z/OS to permit legacy processing to continue. 
  • Work backwards through the DAG to understand the COBOL apps that update the master files. These can be rewritten as Python apps that consume transactions and update master files/databases. Transfer transaction files out of Z/OS to a server doing the Python processing. Either update a shared database or send updated master files back to Z/OS if there's further processing that needs an updated master. 
  • Continue working backwards through the DAG, replacing COBOL with Python until you've found source files for the transactions. Expect to find transaction validation programs as well as transaction analytics or reporting. The validations are useful; the analytics and reporting can be replaced with simpler, more modern tools.
  • When there's no more legacy processing that depends on a given master file or database, then the Z/OS can be formally decommissioned. Have a party.

This is relatively low risk work. It's high value. The COBOL code encodes enterprise knowledge. Preserving this knowledge in a more modern language is a value-maintaining exercise. Indeed, the improved clarity may be a value-creating exercise.

Tuesday, July 5, 2022

Revised Understanding --> Revised Data Structures --> Revised Type Hints

My literate programming tool, pyWeb, has moved to version 3.1 -- supporting modern Python.

Next up, version 3.2. This is a massive reworking of the data structures involved. The rework lets me use Jinja2 for templates. There's a lot of fiddliness to getting the end-of-line spacing right. Jinja has the following:

{% for construct in container -%}
{{construct}}
{%- endfor %} 

The easy-to-overlook hyphens suppress spacing, allowing the construct to be spread onto multiple lines without introducing extra newlines into the output. This makes it a little easier to debug the templates.

It now works. But. Until I get past strict type checks, there's no reason for calling it done.

Found 94 errors in 1 file (checked 3 source files)

The bulk of the remaining problems seem to be new methods where I forgot to include a type hint. The more pernicious problems are places where I have inconsistent hints and Liskov substitution problems. The worst a places where I had a last-minute change change and switched from str to int and did not actually follow-through and make required changes.

The biggest issue?

When building an AST, it's common to have a union of a wide variety of types. This union often has a discriminator value to separate NamedChunk from OutputChunk. This is "type narrowing" and there are a variety of approaches. I think my best choice is a TypeGuard declaration. This is new to me, so I've got to do some learning before I can properly define the required type guard function(s). (See https://mypy.readthedocs.io/en/stable/type_narrowing.html#user-defined-type-guards)

I'm looking forward (eagerly) to finishing the cleanup. 

The problem is that I'm -- also -- working on the updates to Functional Python Programming. The PyWeb project is a way to relax my brain from editing the book. 

Which means the pyWeb updates have to wait for Chapter 4 and 5 edits. (Sigh.)

Tuesday, June 28, 2022

Massive Rework of Data Structures

As noted in My Shifting Understanding and A Terrible Design Mistake, I had a design that focused on serialization instead of proper modeling of the objects in question.

Specifically, I didn't start with a suitable abstract syntax tree (AST) structure. I started with an algorithmic view of "weaving" and "tangling" to transform a WEB of definitions into documentation and code. The weaving and tangling are two of the three distinct serializations of a common AST. 

The third serialization is the common source format that underpins the WEB of definitions. Here's an example that contains a number of definitions and a tangled output file.

Fast Exponentiation
===================

A classic divide-and-conquer algorithm.

@d fast exp @{
def fast_exp(n: int, p: int) -> int:
    match p:
        case 0: 
            return 1
        case _ if p % 2 == 0:
            t = fast_exp(n, p // 2)
            return t * t
        case _ if p % 1 == 0:
            return n * fast_exp(n, p - 1)
@| fast_exp
@}

With a test case.

@d test case @{
>>> fast_exp(2, 30)
1073741824
@}

@o example.py @{
@< fast exp @>

__test__ = {
    "test 1": '''
@< test case @>
    '''
}
@| __test__
@}

Use ``python -m doctest`` to test.

Macros
------

@m

Names
-----

@u

This example uses RST as the markup language for the woven document. A tool can turn this simplified document into complete RST with appropriate wrappers around the code blocks. The tool can also weave the example.py file from the source document.

The author can focus on exposition, explaining the algorithm. The reader gets the key points without the clutter of programming language overheads and complications.

The compiler gets a tangled source.

The key point is to have a tool that's (mostly) agnostic with respect to programming language and markup language. Being fully agnostic isn't possible, of course. The @d name @{code@} constructs are transformed into markup blocks of some sophistication. The @<name@> becomes a hyperlink, with suitable markup. Similarly, the cross reference-generating commands, @m and @u, generate a fair amount of markup content. 

I now have Jinja templates to do this in RST. I'll also have to provide LaTeX and HTML. Further, I need to provide generic LaTeX along with LaTeX I can use with PacktPub's LaTeX publishing pipeline. But let's not look too far down the road. First things first.

TL;DR

Here's today's progress measurement.

==================== 67 failed, 13 passed, 1 error in 1.53s ====================

This comforts me a great deal. Some elements of the original structure still work. There are two kinds of failures: new test fixtures that require TestCase.setUp() methods, and tests for features that are no longer part of the design.

In order to get the refactoring to a place where it would even run, I had to incorporate some legacy methods that -- it appears -- will eventually become dead code. It's not totally dead, yet, because I'm still mid-way through the refactoring. 

But. I'm no longer beating back and forth trying to see if I've got a better design. I'm now on the downwind broad reach of finding and fixing the 67 test cases that are broken.