Tuesday, August 9, 2022

Tragedy Averted

I almost made a terrible blunder.

See https://github.com/slott56/py-web-tool for some background. This is a "Literate Programming" tool. I started fooling around with this kind of thing back in '05 (maybe even earlier.) This is not the blunder. The whole idea of literate programming is not very popular. I'm a fan of Jupyter{Book} as the state of the art in sophisticated literate programming, if you're interested in it.

In my case, I started this project so long ago, I used docutils. This was long before Sphinx arrived on the scene. I never updated my little project to use Sphinx. The point was to have a kind of pure literate programming tool that could work with a variety of markup languages, including (but not limited to) RST.

Recently, I learned about PlantUML. The idea of a text description of a diagram is appealing. I don't really need to draw it; I just need to specify what's in it and let graphviz do the rest. This tool is very, very cool. You can capture ideas quickly. You can refine and expand on ideas until you reach a point where code makes more sense than a picture of code. 

For some things, you can gather data and draw a picture of things *as they are*. This is particularly valuable for cloud-based infrastructure where a few queries lead to PlantUML source that renders very nicely.

Which leads to the idea of Literate Programming including UML diagrams. 

Doesn't sound too difficult. I can create an extension to docutils to introduce a UML directive. The resulting RST would look like this:

..  uml::

    left to right direction
    skinparam actorStyle awesome

    actor "Developer" as Dev
    rectangle PyWeb {
        usecase "Tangle Source" as UC_Tangle
        usecase "Weave Document" as UC_Weave
    }
    rectangle IDE {
        usecase "Create WEB" as UC_Create
        usecase "Run Tests" as UC_Test
    }
    Dev --> UC_Tangle
    Dev --> UC_Weave
    Dev --> UC_Create
    Dev --> UC_Test

    UC_Test --> UC_Tangle

It could be handy to have the diagrams as part of the documentation that also tangles the working code. One source for all of it. 

I started down the path of researching docutils extensions. Got pretty far. Far enough that I had an empty repository and everything. I was about ready to start creating spike solutions.

Then.

[music cue] *duh duh duuuuuuh*

I found that Sphinx already has an extension for PlantUML. I almost started reading the code to see how it worked.

Then I realized how dumb that was. It already works. Why read the code? Why not install it?

I had a choice to make.

  1. Continue building my own docutils plug-in.
  2. Switch to Sphinx.

Some complications:

  • My Literate Programming tool produces RST that *may* not be compatible with Sphinx.
  • It's yet another dependency in a tool that started out with zero dependencies. I've added pytest and tox. What next? 

What to do?

I have to say that Git is amazing. I can make a branch for the spike. If it works, pull request. If it doesn't work, delete the branch. This continues to be game-changing to me. I'm old. I remember when we had to back up the whole project directory tree before making this kind of change.

It worked. My tool's RST (with one exception) worked perfectly with Sphinx. The one exception was an obscure directive, .. class:: name, used to provide an HTML class name for the following block. This always should have been the docutils .. container:: name directive. With this fix, we're good to go.

I'm happy I avoided the trap of reimplementing something. Instead, I upgraded from "bare" docutils with my own CSS to Sphinx with its sophisticated templates and HTML themes.

Tuesday, August 2, 2022

Books! Books! Books!

First, there's 

Pivot to Python

A Guide for professionals and skilled beginners

https://books.apple.com/us/book/pivot-to-python/id1586977675 

I've recently updated this to fix some cosmetic problems with title pages, the table of contents, and stuff like that. The content hasn't changed. Yet. It's still an introduction to Python for folks who already know how to program and want to pivot to programming in Python. Quickly.

But wait, there's more. 

Unlearning SQL

When your only tool is a hammer, every problem looks like a nail

https://books.apple.com/us/book/unlearning-sql/id6443164060

This is all new. It's written for folks who know Python, and are struggling with the architectural balance between writing bulk processing in SQL or writing it in Python. For too many developers, SQL is effectively the only tool they can use. With a variety of tools, it becomes easier to solve a wider variety of problems.

Tuesday, July 26, 2022

Bashing the Bash -- The shell is awful and what you can do about it

A presentation I did recently.

https://github.com/slott56/bashing-the-bash

Folks were polite and didn't have too many questions. I guess they fundamentally agreed: the shell is awful, and we can use it for only a few things.

Safe Shell Scripts Stay Simple: Set the environment, Start the application.

The Seven S's of shell scripting.

Many many thanks to Code & Supply for hosting me.

Tuesday, July 19, 2022

I've got a great Proof-of-Concept. How do I go forward with it?

This is the best part about Python -- you can build something quickly. And it really works.

But. 

What are the next steps?

While there are a *lot* of possibilities, I'm focused on an "enterprise work group" application that involves a clever web service/RESTful API built in Flask. Maybe with NLP.

Let me catalog a bunch of things you might want to think about to "productionize" your great idea. Here's a short list to get started.

  • File System Organization
  • Virtual Environments
  • Unit Testing
  • Integration Testing
  • Acceptance Testing
  • Static Analysis
  • Tool Chain
  • Documentation

Let's dive into each one of these. Then we'll look at Flask deployments.

File System Organization

When you've gotten something to work, the directory in which it works is sometimes not organized ideally. There are a lot of ways to do this, but what seems to work well is a structure like the following.

- Some parent directory. Often in Git
  - src -- your code is here
  - tests -- your tests are here
  - docs -- your documentation will be here
  - requirements.txt -- the list of packages to install. Exact, pinned version numbers
  - requirements-dev.txt -- the list of packages used for maintenance and development
  - environment.yml -- another list of packages in conda format
  - pyproject.toml -- this has your tox setup in it
  - Makefile -- sometimes helpful

Note that a lot of packages you see have a setup.py.  This is **only** needed if you're going to open-source your code. For enterprise projects, this is not the first thing you will focus on. Ignore it, for now.

Virtual Environments

When you're developing in Python you may not even worry about virtual environments. You have Python. It works. You downloaded NLP and Flask. You put things together and they work.

The trick here is the Python ecosystem is vast, and you have (without really observing it closely) likely downloaded a lot of projects. Projects that depend on projects. 

You can't trust your current environment to be reliable or repeatable. You'll need to use a virtual environment manager of some kind.

Python's built-in virtual environment manager venv is readily available and works nicely. See https://docs.python.org/3/library/venv.html  It's my second choice. 

My first choice is conda. Start with miniconda: https://docs.conda.io/en/latest/miniconda.html. Use this to assemble your environment and retest your application to be sure you've got everything.

You'll be creating (and destroying) virtual environments until you get it right. They're cheap. They don't impact your code in any way. Feel free to make mistakes.

When it works, build conda's environment.yml file and the requirements.txt files. This will rebuild the environment.  You'll use them with tox for testing.

If you don't use conda, you'll omit the environment.yml.  Nothing else will change.

Unit Testing

Of course, you'll need automated unit tests. You'll want 100% code coverage. You *really* want 100% logic path coverage, but that's aspirational. 100% code coverage is a lot of work and uncovers enough problems that the extra testing for all logic paths seems unhelpful.

You have two built-in unit testing toolsets: doctest and unittest. I like doctest. https://docs.python.org/3/library/doctest.html
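
Here's a minimal sketch of the doctest style; the function and its values are invented for illustration:

def discount(price: float, rate: float = 0.10) -> float:
    """
    Apply a simple discount rate to a price.

    >>> discount(100.0)
    90.0
    >>> discount(80.0, rate=0.25)
    60.0
    """
    return price * (1 - rate)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # Runs the examples embedded in the docstrings.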

You'll want to get pytest and the pytest-cov add-on package. https://docs.pytest.org/en/6.2.x/contents.html  https://pytest-cov.readthedocs.io/en/latest/.  

Your test modules go in the tests directory. You know you've done it right when you can use the pytest command at the command line and it finds (and runs) all your tests. 

This is part of your requirements-dev.txt file.

Integration Testing

This is unit testing without so many mocks. I recommend using pytest for this, also. The difference is that your "fixtures" will be much more complex. Files. Databases. Flask Clients. Certificates. Maybe starting multiple services. All kinds of things that have a complex setup and perhaps a complex teardown, also.

See https://docs.pytest.org/en/6.2.x/fixture.html#yield-fixtures-recommended for good ways to handle this more complex setup and teardown.
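
A minimal sketch of the yield-fixture pattern (the file name and contents are invented for illustration):

from collections.abc import Iterator
from pathlib import Path

import pytest

@pytest.fixture
def sample_file(tmp_path: Path) -> Iterator[Path]:
    # Setup: build a working file for the test to consume.
    target = tmp_path / "sample.csv"
    target.write_text("id,name\n1,example\n")
    yield target
    # Teardown: runs after the test, even if the test failed.
    target.unlink(missing_ok=True)

def test_reads_sample(sample_file: Path) -> None:
    assert "example" in sample_file.read_text()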

Acceptance Testing

Depending on the community of users, it may be necessary to provide automated acceptance tests. For this, I recommend behave. https://behave.readthedocs.io/en/stable/ You can write the test cases in the Gherkin language. This language is open-ended, and many stakeholders can contribute to the test cases. It's not easy to get consensus sometimes, and a more formal Gherkin test case lets people debate, come to an agreement, and prioritize the features and scenarios they need to see.
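
A sketch of how Gherkin steps turn into Python (the feature wording and step names here are invented):

# features/steps/validation_steps.py -- hypothetical step definitions for a scenario like:
#   Given a batch of 3 records
#   When the validation runs
#   Then 3 records are accepted
from behave import given, when, then

@given("a batch of {count:d} records")
def step_given_batch(context, count):
    context.batch = [{"id": n} for n in range(count)]

@when("the validation runs")
def step_when_validate(context):
    context.accepted = [record for record in context.batch if "id" in record]

@then("{count:d} records are accepted")
def step_then_accepted(context, count):
    assert len(context.accepted) == count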

This is part of your requirements-dev.txt file.

Static Analysis

This is an extra layer of checking to be sure best practices are being followed. There are a variety of tools for this. You *always* want to process your code through black: https://black.readthedocs.io/en/stable/ 

Some folks love isort for putting the imports into a canonical order.  https://pycqa.github.io/isort/

Flake8 should be used to be sure there are no obviously bad programming practices. https://flake8.pycqa.org/en/latest/

I'm a huge fan of type hints. I consider mypy to be essential. https://mypy.readthedocs.io/en/stable/  I prefer "--strict" mode, but that can be a high bar. 

Tool Chain

You can try to manage this with make. But don't.

Download tox, instead.  https://tox.wiki/en/latest/index.html  

The point of tox is to combine virtual environment setup with testing in that virtual environment. You can -- without too much pain -- define multiple virtual environments. You can then test the various releases of the various packages your project depends on in various combinations. This is how to manage a clean upgrade. 

1. Figure out the new versions.

2. Setup tox to test existing and new.

3. Run tox.

I often set the tox commands to run black first, then unit testing, then static analysis, ending with mypy --strict.
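
A rough sketch of a tox configuration along those lines (the environment name and package lists are illustrative, not a prescription; the same settings can live in tox.ini or be embedded in pyproject.toml):

[tox]
envlist = py310

[testenv]
deps =
    -rrequirements.txt
    -rrequirements-dev.txt
commands =
    black src tests
    pytest --cov=src tests
    flake8 src tests
    mypy --strict src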

When the code is reformatted by black, it's technically a build failure. (You should have run black manually before running tox.) When tox works cleanly, you're ready to commit and push and pull request and merge.

Documentation

Not an after-thought.

For human documents, use Sphinx. https://www.sphinx-doc.org/en/master/ 

Put docstrings in every package, every module, every class, every method, and every function. Summarize *what* and *why*. (Don't explain *how*: people can read your code.) 
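
A small sketch of the style I mean (the function is invented for illustration):

def clean_address(raw: str) -> str:
    """
    Normalize a mailing address for downstream matching.

    The matching rules are case-insensitive, so this upper-cases the text
    and collapses whitespace. That's the *what* and the *why*; the *how*
    is the one line of code below.
    """
    return " ".join(raw.split()).upper()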

Use the autodoc feature to create the API reference documentation from the code. Start with this.

Later, you can write a README, and some explanations, and installation instructions, and all the things other people expect to see.

For a RESTful API, be sure to write an OpenAPI specification and be sure to test against that spec. https://www.openapis.org. While a lot of the examples are complicated, you can easily use a small subset to describe your documents, the validation rules, and the transactions. You can add the security details later. They're part of your web server, but they don't need extensive OpenAPI documentation at the beginning.

Flask Deployments

Some folks like to define a flask application that can be installed in the Python virtual environment. This means the components are on the default sys.path without any "extra" effort. (It's a fair amount of effort to begin with. I'm not sure it's worth it.)

When you run a flask app, you'll be using some kind of engine. NGINX, uWSGI, Gunicorn, etc. (Gunicorn is very nice. https://gunicorn.org). 

See https://flask.palletsprojects.com/en/2.0.x/deploying/wsgi-standalone/.

In all cases, these engines will "wrap" your Flask application. You'll want to make your application visible by setting the PYTHONPATH environment variable, naming your src directory. Do not run from your project's directory.

You will have the engine running in some distinct /opt/the_app or /Users/the_app or /usr/home/the_app or some such directory, unrelated to where the code lives. You'll use Gunicorn's command-line options to locate your app, wherever it lives on the filesystem. Gunicorn will use PYTHONPATH to find your app. Since web servers often run as nobody, you'll need to make sure your code base is readable. But. Not. Writable.
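
As a sketch of what the engine ends up importing (the module and app names are hypothetical), a command like gunicorn the_app.wsgi:app can then locate this module from any working directory, as long as src is on PYTHONPATH:

# src/the_app/wsgi.py -- a hypothetical module the engine loads.
from flask import Flask

app = Flask(__name__)

@app.route("/health")
def health() -> dict[str, str]:
    # A trivial endpoint so the deployment can be smoke-tested.
    return {"status": "ok"}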

Tuesday, July 12, 2022

The Enterprise COBOL Conundrum

Enterprise COBOL is both a liability and an asset. There's tangible value hidden in the code.

See https://github.com/slott56/looking-at-cobol  

I've tweaked the presentation a little. 

The essential ingredients in coping with COBOL are these:

  • Use something like Stingray Reader to parse COBOL DDE's and process the data in the native format.
  • Analyze the Job Control Language (JCL) to work out the directed acyclic graph (DAG) that leads to file and database updates. These "master" files and databases are the data artifacts that matter most. This is the value-creating processing. There aren't many of these files. 
  • Create a process to clone those files, and write Python data access modules to process the data. This is a two-way process. You'll be shipping files from your Z/OS world to another server running Python. In some cases, files will need to come back to Z/OS to permit legacy processing to continue. 
  • Work backwards through the DAG to understand the COBOL apps that update the master files. These can be rewritten as Python apps that consume transactions and update master files/databases. Transfer transaction files out of Z/OS to a server doing the Python processing. Either update a shared database or send updated master files back to Z/OS if there's further processing that needs an updated master. 
  • Continue working backwards through the DAG, replacing COBOL with Python until you've found source files for the transactions. Expect to find transaction validation programs as well as transaction analytics or reporting. The validations are useful; the analytics and reporting can be replaced with simpler, more modern tools.
  • When there's no more legacy processing that depends on a given master file or database, then the Z/OS can be formally decommissioned. Have a party.

This is relatively low risk work. It's high value. The COBOL code encodes enterprise knowledge. Preserving this knowledge in a more modern language is a value-maintaining exercise. Indeed, the improved clarity may be a value-creating exercise.

Tuesday, July 5, 2022

Revised Understanding --> Revised Data Structures --> Revised Type Hints

My literate programming tool, pyWeb, has moved to version 3.1 -- supporting modern Python.

Next up, version 3.2. This is a massive reworking of the data structures involved. The rework lets me use Jinja2 for templates. There's a lot of fiddliness to getting the end-of-line spacing right. Jinja has the following:

{% for construct in container -%}
{{construct}}
{%- endfor %} 

The easy-to-overlook hyphens suppress spacing, allowing the construct to be spread onto multiple lines without introducing extra newlines into the output. This makes it a little easier to debug the templates.

It now works. But. Until I get past strict type checks, there's no reason for calling it done.

Found 94 errors in 1 file (checked 3 source files)

The bulk of the remaining problems seem to be new methods where I forgot to include a type hint. The more pernicious problems are places where I have inconsistent hints and Liskov substitution problems. The worst are places where I had a last-minute change and switched from str to int and did not actually follow through and make the required changes.

The biggest issue?

When building an AST, it's common to have a union of a wide variety of types. This union often has a discriminator value to separate NamedChunk from OutputChunk. This is "type narrowing" and there are a variety of approaches. I think my best choice is a TypeGuard declaration. This is new to me, so I've got to do some learning before I can properly define the required type guard function(s). (See https://mypy.readthedocs.io/en/stable/type_narrowing.html#user-defined-type-guards)
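
Here's roughly the shape I have in mind -- a sketch with simplified stand-ins for the real Chunk classes, not the final code:

from typing import TypeGuard

class Chunk:
    #: Discriminator value; subclasses override this.
    kind: str = "text"

class NamedChunk(Chunk):
    kind = "named"
    def __init__(self, name: str) -> None:
        self.name = name

class OutputChunk(Chunk):
    kind = "output"
    def __init__(self, file_name: str) -> None:
        self.file_name = file_name

def is_named_chunk(chunk: Chunk) -> TypeGuard[NamedChunk]:
    """A user-defined type guard: mypy narrows chunk to NamedChunk when this returns True."""
    return chunk.kind == "named"

def weave_reference(chunk: Chunk) -> str:
    if is_named_chunk(chunk):
        # In this branch, mypy treats chunk as a NamedChunk, so .name is fine.
        return f"-> {chunk.name}"
    return ""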

I'm looking forward (eagerly) to finishing the cleanup. 

The problem is that I'm -- also -- working on the updates to Functional Python Programming. The PyWeb project is a way to relax my brain from editing the book. 

Which means the pyWeb updates have to wait for Chapter 4 and 5 edits. (Sigh.)

Tuesday, June 28, 2022

Massive Rework of Data Structures

As noted in My Shifting Understanding and A Terrible Design Mistake, I had a design that focused on serialization instead of proper modeling of the objects in question.

Specifically, I didn't start with a suitable abstract syntax tree (AST) structure. I started with an algorithmic view of "weaving" and "tangling" to transform a WEB of definitions into documentation and code. The weaving and tangling are two of the three distinct serializations of a common AST. 

The third serialization is the common source format that underpins the WEB of definitions. Here's an example that contains a number of definitions and a tangled output file.

Fast Exponentiation
===================

A classic divide-and-conquer algorithm.

@d fast exp @{
def fast_exp(n: int, p: int) -> int:
    match p:
        case 0: 
            return 1
        case _ if p % 2 == 0:
            t = fast_exp(n, p // 2)
            return t * t
        case _ if p % 2 == 1:
            return n * fast_exp(n, p - 1)
@| fast_exp
@}

With a test case.

@d test case @{
>>> fast_exp(2, 30)
1073741824
@}

@o example.py @{
@< fast exp @>

__test__ = {
    "test 1": '''
@< test case @>
    '''
}
@| __test__
@}

Use ``python -m doctest`` to test.

Macros
------

@m

Names
-----

@u

This example uses RST as the markup language for the woven document. A tool can turn this simplified document into complete RST with appropriate wrappers around the code blocks. The tool can also weave the example.py file from the source document.

The author can focus on exposition, explaining the algorithm. The reader gets the key points without the clutter of programming language overheads and complications.

The compiler gets a tangled source.

The key point is to have a tool that's (mostly) agnostic with respect to programming language and markup language. Being fully agnostic isn't possible, of course. The @d name @{code@} constructs are transformed into markup blocks of some sophistication. The @<name@> becomes a hyperlink, with suitable markup. Similarly, the cross reference-generating commands, @m and @u, generate a fair amount of markup content. 

I now have Jinja templates to do this in RST. I'll also have to provide LaTeX and HTML. Further, I need to provide generic LaTeX along with LaTeX I can use with PacktPub's LaTeX publishing pipeline. But let's not look too far down the road. First things first.

TL;DR

Here's today's progress measurement.

==================== 67 failed, 13 passed, 1 error in 1.53s ====================

This comforts me a great deal. Some elements of the original structure still work. There are two kinds of failures: new test fixtures that require TestCase.setUp() methods, and tests for features that are no longer part of the design.

In order to get the refactoring to a place where it would even run, I had to incorporate some legacy methods that -- it appears -- will eventually become dead code. It's not totally dead, yet, because I'm still mid-way through the refactoring. 

But. I'm no longer beating back and forth trying to see if I've got a better design. I'm now on the downwind broad reach of finding and fixing the 67 test cases that are broken. 

Tuesday, June 21, 2022

My Shifting Understanding and A Terrible Design Mistake

I've been fascinated by Literate Programming forever. 

I have two utterly divergent takes on this.

See https://github.com/slott56/PyLit-3 for one.

See https://github.com/slott56/py-web-tool for another.

And yet, I've still done a really bad design job. Before we get to the design, a little bit of back story.

Back Story

Why two separate literate programming projects? Because it's not clear what's best. It's a field without too many boundaries and a lot of questions about the value produced.

PyLit I found, forked, and upgraded to Python 3. I didn't design it. It's far more clever than something I'd design.

Py-Web-Tool is something I wrote based on using a whole bunch of tools that follow along behind the original WEB tools. Nothing to do with web servers or web.py.

The Problem Domain

The design problem is, in retrospect, pretty obvious. I set it out here as a cautionary tale.

I'm looking at the markup languages for doing literate programming. The idea is to have named blocks of code in your document, presented in an order that makes sense to your reader. A tool will "weave" a document from your source. It will also "tangle" source code by rearranging the code snippets from presentation order into compiler-friendly order.

This means you can present your core algorithm first, even though it's buried in the middle of some module in the middle of your package. 

The presentation order is *not* tied to the order needed by your language's toolchain.

For languages like C this is huge freedom. For Python, it's not such a gigantic win.

The source material is a "web" of code and information about the code. A web file may look like this:

Important insight.

@d core feature you need to know about first @{
    def somecode() -> None:
        pass
@}

And see how this fits into a larger context?

@d something more expansive @{
def this() -> None:
    pass
    
def that() -> None:
    pass
    
@<core feature you need to know about first@>
@}

See how that works?

This is easy to write and (relatively) easy to read. The @<core feature you need to know about first@> becomes a hyperlink in the published documentation. So you can flip between the sections. It's physically expanded inline to tangle the code, but you don't often need to look at the tangled code.

The Design Question

The essential Literate Programming tool is a compiler with two outputs:

  • The "woven" document with markup and such
  • The "tangled" code files which are code, largely untouched, but reordered.

We've got four related problems.

  1. Parsing the input
  2. An AST we can process
  3. Emitting tangled output from the AST
  4. Emitting woven output from the AST

Or, we can look at it as three classic problems: deserialization, AST representation, and serialization. Additionally, we have two distinct serialization alternatives.

What did I do?

I tackled serialization first. Came up with a cool bunch of classes and methods to serialize the two kinds of documents.

Then I wrote the deserialization (or parsing) of the source WEB file. This is pretty easy, since the markup is designed to be as trivial as possible. 

The representation is little more than glue between the two.

What a mistake.

A Wrong Answer

Focusing on serialization was an epic mistake.

I want to try using Jinja2 for the markup templates instead of string.Template.

However. 

My AST was such a bad hack job it was essentially impossible to use it. It was a quagmire of inconsistent ad-hoc methods to solve a specific serialization issue.

As I started down the Jinja road, I found a need to be able to build an AST without the overhead of parsing.

Which caused me to realize that the AST was -- while structurally sensible -- far from the simple ideal.

What's the ideal?

The Right Answer

This ideal AST is something that lets me build test fixtures like this:

example = Web(
   chunks=[
       TextChunk("\n"),
       NamedCodeChunk(name="core feature you need to know about first", lines=["def somecode() -> None: ...", "pass"]),
       TextChunk("\nAnd see how this fits into a larger context?\n"),
       NamedCodeChunk(name="something more expansive", lines=[etc. etc.])
   ]
)

Here's my test for usability: I can build the AST "manually" without a parser. 

The parser can build one, also, but I can build it as a sensible, readable, first-class Python object.

This has pointed me to a better design for the overall constructs of the WEB source document. Bonus. It's helping me define Jinja templates that can render this as a sensible woven document.

Tangling does not need Jinja. It's simpler. And -- by convention -- the tangled code does not have anything injected into it. The woven code is in a markup language (Markdown, RST, HTML, LaTeX, ASCII DOC, whatever) and some markup is required to create hyperlinks and code sections. Jinja is super helpful here. 

TL;DR

The essence of the problem is rarely serialization or deserialization.  It's the internal representation.


Tuesday, June 14, 2022

A LaTeX Thing I Did -- And A ToDo:

When writing about code in LaTeX, the essential strategy is to use an environment to format the code so it stands out from surrounding text. There are a few of these environments available as LaTeX add-on packages. The three popular ones are verbatim, listings, and minted.

These are nice for making code readable and distinct from the surrounding text.

A common way to talk about the code is to use inline verbatim \verb|code| sections. I prefer inline \lstinline|code|, but, my editor prefers \verb. (I have trouble getting all the moving parts of minted installed properly, so I use listings.)

Also. And more important. 

There's the \lstinputlisting[language=Python, firstline=2, lastline=12]{some_module.py} command. This lets an author incorporate examples from working, tested modules. Minted doesn't seem to have this, but it might work with an \input command. Don't know. Haven't tried.

Let's talk about workflow.

Workflow

The idea behind these tools is you have code and after that, you write about the code. I call this code first.

Doing this means you can include code snippets from a file.

Which is okay, but, there's another point of view: you have a document that contains the code. This is closer to the Literate Programming POV. I call this document first. I've got all the code in the document you're reading, I've just broken it up and spread it around in an order to serve my purpose as a writer, not serve the limitations of a parser or compiler.

There is a development environment -- WEB -- to create code that can be run through the Weave and Tangle tools to create working code and usable documentation. This is appealing in many ways. 

For now, I'm settling for the following workflow:

  1. Write the document with code samples. Use \lstlisting environment with explicit unique labels for each snippet. The idea is to focus on the documentation with explanations.
  2. Write a Jinja template that references the code samples. This is a lot of {{extract['lst:listing_1']}} kind of references. There's a bit more that can go in here, we'll return to the templates in a moment.
  3. Run a tool to extract all the \lstlisting environments to a dictionary with the label as the key and the block of text as the value. This serializes nicely as a JSON (or TOML or YAML) file. It can even be pickled, but I prefer to be able to look at the file to see what's in it. (There's a sketch of this extraction step right after this list.)
  4. The tool to populate the template is a kind of trivial thing to build a Jinja environment, load up the template, fill in the code samples, and write the result.
  5. I can then use tox (and doctest and pytest and mypy) to test the resulting module to be sure it works.
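
Here's a rough sketch of the extraction step from item 3 above. The regexes and file names are illustrative; a real LaTeX-aware parser would be more robust:

import json
import re
from pathlib import Path

# Matches \begin{lstlisting}[...]...\end{lstlisting}, capturing the options and the body.
LISTING = re.compile(
    r"\\begin\{lstlisting\}\[(?P<options>[^]]*)\]\n(?P<body>.*?)\\end\{lstlisting\}",
    re.DOTALL,
)
LABEL = re.compile(r"label=\{?(?P<label>[^,}\]]+)\}?")

def extract_listings(source: Path) -> dict[str, str]:
    """Map each listing's label (e.g., 'lst:listing_1') to its block of text."""
    extract: dict[str, str] = {}
    for match in LISTING.finditer(source.read_text()):
        label = LABEL.search(match.group("options"))
        if label:
            extract[label.group("label")] = match.group("body")
    return extract

if __name__ == "__main__":
    # Hypothetical file names; one pair per chapter.
    listings = extract_listings(Path("ch01.tex"))
    Path("ch01_extract.json").write_text(json.dumps(listings, indent=2))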

This tangles code from a source document. There's no weave step, since the source is already designed for publication. This does require me to make changes to the LaTeX document I'm writing and run a make test command to extract, tangle, and test. This is not a huge burden. Indeed, it's easy to implement in PyCharm, because the latest release of PyCharm understands Makefiles and tox. Since each chapter is a distinct environment, I can use tox -e ch01 to limit the testing to only the chapter I'm working on.

I like this because it lets me focus on explanation, not implementation details. It helps me make sure that all the code in the book is fully tested. 

The Templates

The template files for an example module have these three kinds of code blocks:

  1. Ordinary Listings. These fall into two subclasses.
    1. Complete function or class definitions.
    2. Lines of code taken out of context.
  2. REPL Examples. 

These have three different testing requirements. We'll start with the "complete function or class definitions."  For these, the template might look like the following

{{extract['lst:listing_1']}}

def test_listing_1() -> None:
    assert listing_1(42)
    assert not listing_1(None)

This has both the reference to the code in the text of the book and a test case for the code.

For lines of code out of context, we have to be more careful. We might have this.

def some_example(arg: int) -> bool:
    {{extract['lst:listing_2']}}

def test_listing_2() -> None:
    assert listing_2(42)
    assert not listing_2(None)

This is similar to a complete definition, but it has a fiddly indentation that needs to be properly managed, also. Jinja's generally good about not inserting spaces. The template, however, is full of what could appear to be syntax errors, so the code editor could have a conniption with all those {} blocks of code. They happen to be valid Python set literals, so, they're tolerated. PyCharm's type checking hates them.

The REPL examples look like this.

REPL_listing_3 = """
{{extract['lst:listing_3']}}
"""

I collect these into a __test__ variable to make them easy for doctest to find. The extra fussiness of a __test__ variable isn't needed, but it provides a handy audit for me to make sure everything has a home.

The following line of code is in most (not all) templates.

__test__ = {
    name: value
    for name, value in globals().items() 
    if name.startswith("REPL")
}

This will locate all of the global variables with names starting with REPL and put them in the __test__ mapping. The REPL names then become the test case names, making any test failures easier to spot.

My Goal

I do have some Literate Programming tools that I might be able to leverage to make myself a Weaver that produces useful LaTeX my publisher can work with. I should do this because it would be slightly simpler. The problem is my Web/Weave/Tangle tooling has a bunch of dumb assumptions about the weave and tangle outputs; a problem I really need to fix.

See py-web-tool.

The idea here is to mimic other WEB-based tooling. These are the two primary applications:

  • Weave. This makes documentation in a fairly transparent way from the source. There are a bunch of substitutions required to fill in HTML or LaTeX or Markdown or RST around the generic source. Right now, this is pretty inept and almost impossible to configure.
  • Tangle. This makes code from the source. The point here is the final source file is not necessarily built in any obvious order. It's a tangle of things from the documentation, put into the order required by parser or compiler or build system or whatever.

The weaving requires a better way to provide the various templates that fill in missing bits. Markdown, for example, works well with fenced blocks. RST uses a code directive that leads to an extra level of indentation that needs to be carefully excised. Further, most markup languages have a mountain of cruft that goes around the content. This is unpleasantly complex, and very much subject to odd little changes that don't track against the content, but are part of the evolution of the markup language.

My going-in assumption on tangling was the document contained all the code. All of it. Without question or exception. For C/C++ this means all the fiddly little pre-processor directives that add no semantic clarity yet must be in the code file. This means the preprocessor nonsense had to be relegated to an appendix of "yet more code that just has to be there."

After writing a tangler to pull code from a book into a variety of contexts, I'm thinking I need to have a tangler that works with a template engine. I think there would be the following two use cases:

  • No-Template Case. The WEB source is complete. This works well for a lot of languages that don't have the kind of cruft that C/C++ has. It generally means a WEB source document will contain definition(s) for the final code file(s) as a bunch of references to the previously-explained bits. For C/C++, this final presentation can include the fiddly bits of preprocessor cruft.
  • Template Case. A template is used with the source to create the tangled output. This is what I have now for pulling book content into a context where it is testable. For the most part, the template files are quite small because the book includes test cases in the form of REPL blocks. This presents a bit of a problem because it breaks the "all in one place" principle of a WEB project. I have a WEB source file with the visible content plus one or more templates with invisible content.

What I like about this is an attempt to reduce some of the cruftiness of the various tools. 

I think my py-web-tool might be expanded to handle my expanded understanding of literate programming. 

I have a book to finish, first, though. Then I can look at improving my workflow. (And yes, this is backwards from a properly Agile approach.)

Tuesday, April 12, 2022

Pelican and Static Web Content

In Static Site Blues I was wringing my hands over ways to convert a ton of content from two different proprietary tools (the very old iWeb, and the merely old Sandvox) into something I could work with.

After a bit of fiddling around, I'm delighted with Pelican.

First, of course, I had to extract all the iWeb and Sandvox content. This was emphatically not fun. While both used XML, they used it in subtly different ways. Apple's frameworks serialize internal state as XML in a way that preserves a lot of semantic details. It also preserves endless irrelevant details.

I wound up with a Markdown data structure definition, plus a higher-level "content model" with sites, pages, blogs, blog entries and images. Plus the iWeb extractor and the Sandvox extractor. It's a lot of code, much of which lacks solid unit test cases. It worked -- once -- and I was tolerant of the results.

I also wound up writing tools to walk the resulting tree of Markdown files doing some post-extraction cleanup. There's a lot of cleanup that should be done.

But.

I can now add to the blog with the state of my voyaging. I've been able to keep Team Red Cruising up to date.

Eventually (i.e., when the boat is laid up for Hurricane Season) I may make an effort to clean up the older content and make it more consistent. In particular, I need to add some annotations around anchorages to make it possible to locate all of the legs of all of the journeys. Since the HTML is what most people can see, that means a class identifier for lat-lon pairs. 

As it is, the blog entries are *mostly* markdown. Getting images and blockquotes even close to readable requires dropping to HTML to make direct use of the bootstrap CSS. This also requires some comprehensive cleanup to properly use the Bootstrap classes. (I think I may have introduced some misspelled CSS classes into the HTML that aren't doing anything.)

For now, however, it works. I'm still tweaking small things that require republishing *all* the HTML. 

Tuesday, March 1, 2022

Static Site Blues

I have a very large, static site with 10+ years of stuff about my boat. Most of it is pretty boring. http://www.itmaybeahack.com/TeamRedCruising/

I started with iWeb. It was very -- well -- 2000-ish look and feel. Too many pastels and lines and borders.

In 2012, I switched to Sandvox. I lived on a boat back then. I don't have reliable internet. Using blogger.com, for example, required a sincere commitment to bandwidth. I moved ashore in 2014 and returned to the boat in 2020.

Sandvox's creator seems to be out-of-business.

What's next?

Give up on these fancy editors and switch to a static site generator. Write markdown. Run the tool. Upload when in a coffee shop with Wi-Fi. 

What site generator?

See https://www.fullstackpython.com/static-site-generator.html for some suggestions.

There are five parts to this effort.

  1. Extract the goodness from iWeb and Sandvox. I knew this would be real work. iWeb's site has too much javascript to be easy-to-parse. I have to navigate the underlying XML database. Sandvox is much easier to deal with: their published site is clean, static HTML with useful classes and ids in their tags.
  2. Reformat the source material into Markdown. I've grudgingly grown to accept Markdown, even though RST is clearly superior. Some tools work with RST and I may pandoc the entire thing over to RST from Markdown. For now, though, the content seems to be captured.
  3. Fixup internal links and cross references. This is a godawful problem. Media links -- in particular -- seem to be a nightmare. Since iWeb resolves things via Javascript, the HTML is opaque.  Fortunately, the database's internal cross-references aren't horrible. Maybe this was exacerbated by a poor choice of generators. 
  4. Convert to HTML for a local server. Validate.
  5. Convert to HTML for the target server. Upload to a staging server and validate again. This requires a coffee shop. Not doing this with my phone's data plan.

Steps 1 and 2 aren't too bad. I've extracted serviceable markdown from the iWeb database and the published Sandvox site. The material parallels the Site/Blog/Page structure of the originals. The markdown seems to be mostly error-free. (Some images have the caption in the wrong place, ![caption](link) isn't as memorable as I'd like.) 

Step 3, the internal links and cross-references, has been a difficult problem, it turns out. I can, mostly, associate media with postings. I can also find all the cross-references among postings and fix those up. The question that arises is how to reference media from a blog post?

Mynt

I started with mynt. And had to bail. It's clever and very simple. Too simple for blog posts that have a lot of associated media assets.

The issue is what to write in the markdown to refer to the images that go with a specific blog post. I resorted to a master _Media directory. Which means each posting has ![caption](../../../../_Media/image.png) in it.  This is semi-manageable. But exasperating in bulk. 

What scrambled my brain is the way a mynt posting becomes a directory, with an index.html. Clearly, the media could be adjacent to the index.html. But. I can't figure out how to get mynt's generator to put the media into each post's published directory. It seems like each post should not be a markdown file. 

Also, I can trivially change the base URL when generating, but I can't change the domain. When I publish, I want to swap domains *only*, leaving the base URL alone. I tried. It's too much fooling around.

Pelican

Next up. Pelican. We'll see if I can get my media and blog posts neatly organized. This http://chdoig.github.io/create-pelican-blog.html seems encouraging. I think I should have started here first. Lektor is another possibility.

Since my legacy sites have RSS feeds, it may be sensible to turn Pelican loose on the RSS and (perhaps) skip steps 1, 2, and 3, entirely.


Tuesday, February 15, 2022

LaTeX Mysteries and an algorithmicx thing I learned.

I've been an on-and-off user of LaTeX since the very, very beginning. Back in the dark days when the one laser printer that could render the images was in a closely-guarded secret location to prevent everyone from using it and exhausting the (expensive) toner cartridges.

A consequence of this is I think the various algorithm environments are a ton of fun. Pseudo-code with math embedded in it. It's marvelous. It's a pain in the neck with this clunky blogging package, so I can't easily show off the coolness. But. You can go to https://www.overleaf.com/learn/latex/Algorithms to see some examples.

None of which have try/except blocks. Not a thing.

Why not? I suspect it's because "algorithmic" meant "Algol-60" for years. The language didn't have exceptions and so, the presentation of algorithms continues to this day without exceptions. 

What can one do?

This.

\algblock{Try}{EndTry}
\algcblock[Try]{Try}{Except}{EndTry}
\algcblockdefx{Try}{Except}{EndTry}
   [1][Exception]{\textbf{except} \texttt{#1}}

\algrenewtext{Try}{\textbf{try}}

This will extend the notation to add \Try, \Except, and \EndTry commands. I think I've done it all more-or-less correctly. I'm vague on where the \algnotext{EndTry} goes, but it seems to be needed in each \Try block to silence the \EndTry.
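
For what it's worth, here's a sketch of how I'd expect the new commands to read inside an algorithmic block (leaving the \algnotext placement question open):

\begin{algorithmic}[1]
\Try
    \State $v \gets parse(text)$
\Except[ValueError]
    \State $v \gets 0$
\EndTry
\end{algorithmic}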

As far as I know, I'm the only person who seems to care. There seems to be little about this anywhere online. I'm guessing it's because the basics work perfectly, and no one wants this kind of weird add-on.

Tuesday, February 8, 2022

Desktop Notifications and EPIC DESIGN FAIL

I was asked to review code that -- well -- was evil.

Not like "shabby" or "non-pythonic". Nothing so simple as that.

We'll get to the evil in a moment. First, we have to suffer two horrible indignities.

1. Busy Waiting

2. Undefined Post-Conditions.

We'll beat all three issues to death separately, starting with busy waiting.

Busy Waiting

The Busy Waiting is a sleep-loop. If you're not familiar, it's this:

while something has not happened yet AND we haven't timed out:
    time.sleep(2)
    

Which is often a dumb design. Busy waiting is polling. It's a lot of pointless activity while waiting for something else.

There are dozens of message-passing and event-passing frameworks. Any of those is better than this.

Folks complain "Why install ZMQ when I could instead write a busy-waiting loop?"

Why indeed?

For me, the primary reason is to avoid polling at fixed intervals, and instead wait for the notification. 

The asyncio module, confusing as it is, is better than polling. Because it dispatches events properly.
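
A small sketch of the difference, using asyncio.Event instead of a sleep loop (the names are invented):

import asyncio

async def wait_for_completion(done: asyncio.Event, timeout: float) -> bool:
    """Wait for the event without polling; the timeout outcome is explicit."""
    try:
        await asyncio.wait_for(done.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False

async def main() -> None:
    done = asyncio.Event()
    # Some other task calls done.set() when the thing actually happens.
    asyncio.get_running_loop().call_later(1.0, done.set)
    print("completed" if await wait_for_completion(done, timeout=5.0) else "timed out")

asyncio.run(main())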

This is minor compared with the undefined post-conditions.

Undefined Post-Conditions

With this crap design, there are two events. There's a race between them. One will win. The other will be silently lost forever.

If "something has not happened" is false, the thing has happened. Yay. The while statement ends.

If "something has not happened" is true and the timeout occurs, then Boo. The while statement ends.

Note that there are two unrelated post-conditions: the thing has happened OR the timeout occurred. Is it possible for both to happen? (hint: yes.)

Ideally, the timeout and the thing happening are well-separated in time.

Heh.

Otherwise, they're coincident, and it's a coin-toss as to which one will lead to completion of the while statement. 

The code I was asked to review made no provision for this unhappy coincidence. 

Which leads us to the pure evil.

Pure Evil

What's pure evil about this is the very clear statement that there are not enough desktop notification apps, and there's a need for another.

I asked for justification. Got a stony silence.

They might claim "It's only a little script that runs in the Terminal Window," which is garbage. There are already lots and lots of desktop apps looking for asynchronous notification of events.

Email is one of them.

Do we really need another email-like message queue?  

(Hint: "My email is a lot of junk I ignore" is a personal problem, not a software product description. Consider learning how to create filters before writing yet another desktop app.)

Some enterprises use Slack for notifications. 

What makes it even worse (I said it was pure evil) was a hint about the context. They were doing batch data prep for some kind of analytics/Machine Learning thing. 

They were writing this as if Luigi and related Workflow managers didn't exist.

Did they not know? If they were going to invent their own, they were off to a really bad start. Really bad.

Tuesday, January 25, 2022

No one wins at Code Golf vs. This is more noise than signal

Looking at code. Came to a 20-line block of code that did exactly this.

sorted(Path.cwd().glob("some_pattern[1-9]*.*"), reverse=True)

Twenty lines. Seriously. 

To be fair, 8 of the 20 lines were comments. 3 were blank. Which leaves 9 lines of code to perform the task of a one-liner.

I often say "no one wins at code golf" as a way to talk people out of trying to minimize Python code into vanishingly small black holes where no information about the code's design escapes.

However. Blowing a line of code into 9 lines seems to be just as bad. 

I'll spare you the 9 lines. I will say this, though, the author was blissfully ignorant that Path objects are comparable. So. There were needless conversions. And. Even after commenting on this, they seemed to somehow feel (without evidence of any kind) that Path objects were incomparable.

This is not the first time I've seen folks who like assembler-style code. There is at most one state-change or attribute reference on each line of code. The code has a very voluble verticality (VVV™).

This seems as wrong as code golf.  Neither style provides meaningful code. 

How can we measure "meaningful"?

Of the 8 lines of comments, the English summary -- the phrase "reverse alphabetic order" -- is only a few words. Therefore, the matching code can be an equally terse few symbols. I think code can parallel natural language.

Tuesday, January 18, 2022

How to Test a Random Number Generator

Nowadays, we don't have the same compelling reasons to test a random number generator. The intervening decades have seen a lot of fruitful research. Good algorithms.

Looking back to my 1968 self, however, I still feel a need to work out the solution to an old problem. See The Old Days -- ca. 1968 for some background on this.

What could I have done on that ancient NCE Fortran -- with four digit integers -- to create random numbers? Step 1 was to stop using the middle-squared generator. It doesn't work.

Step 2 is to find a Linear Congruential Generator that works. LCG's have a (relatively) simple form:

\[X_{n+1} = (X_n \times a + c) \bmod m\]

In this case, the modulo value, m, is 10,000. What's left is step 3: find a and c parameters.

To find suitable parameters, we need a battery of empirical tests. Most of them are extensions to the following class:

from collections import Counter
from typing import Hashable
from functools import cache

class Chi2Test:
    """The base class for empirical PRNG tests based on the Chi-2 testing."""
    
    #: The actual distribution, created by ``test()``.
    actual_fq : dict[Hashable, int]
    
    #: The expected distribution, created by ``__init__()``.
    expected_fq: dict[Hashable, int]
    
    #: The lower and upper bound on acceptable chi-squared values.
    expected_chi_2_range: tuple[float, float]
    
    def __init__(self):
        """
        A subclass will override this to call ``super().__init__()`` and then
        create the expected distribution.
        """
        self._chi2 = None
    
    def test(self):
        """
        A subclass will override this to call ``super().test()`` and then
        create an actual distribution, usually with a distinct seed value.
        """
        self._chi2 = None
        
    @property
    def chi2(self) -> float:
        """Return chi-squared metric between actual and expected observations."""
        if self._chi2 is None:
            a_e = (
                (self.actual_fq[k], self.expected_fq[k]) 
                for k in self.expected_fq 
                if self.expected_fq[k] > 0
            )
            v = sum((a-e)**2/e for a, e in a_e)
            self._chi2 = v
        return self._chi2

    @property
    def pass_test(self) -> bool:
        return self.expected_chi_2_range[0] <= self.chi2 <= self.expected_chi_2_range[1]

This defines the essence of a chi-squared test. There's another test that isn't based on chi-squared: the serial correlation test, where a correlation coefficient is computed between adjacent pairs of samples. We'll ignore this special case for now. Instead, we'll focus on the battery of chi-squared tests. 

Linear Congruential Pseudo-Random Number Generator

We'll also need an LC PRNG that's constrained to 4 decimal digits.

It looks like this:

class LCM4:
    """Constrained by the NCE Fortran 4-digit integer type."""
    def __init__(self, a: int, c: int) -> None:
        self.a = a
        self.c = c
    def seed(self, v: int) -> None:
        self.v = v
    def random(self) -> int:
        self.v = (self.a*self.v % 10_000 + self.c) % 10_000
        return self.v

This mirrors the old NCE Fortran on the IBM 1620 computer. 4 decimal digits. No more. 

We can use this to generate a pile of samples that can be evaluated. I'm a fan of using generators because they're so efficient. The use of a set to create a list seems weird, but it's very fast.

def lcg_samples(rng: LCM4, seed: int, n_samples: int = N_SAMPLES) -> list[int]:
    """
    Generate a bunch of sample values. A repeat implies a cycle, and we'll stop early.

    >>> lcg_samples(LCM4(1621, 3), 1234)[:12]
    [317, 3860, 7063, 9126, 3249, 6632, 475, 9978, 4341, 6764, 4447, 8590]

    """
    rng.seed(seed)
    def until_dup(f: Callable[..., Hashable], n_samples: int) -> Iterator[Hashable]:
        seen: set[Hashable] = set()
        while (v := f()) not in seen and len(seen) < n_samples:
            seen.add(v)
            yield v
    return list(until_dup(rng.random, n_samples))

This function builds a list of values for us. We can then subject the set of samples to a battery of tests. We'll look at one test as an example for the others. They're each devilishly clever, and require a little bit of coding smarts to get them to work correctly and quickly.

Frequency Test

Here's one of the tests in the battery of chi-squared tests. This is the frequency test that examines values to see if they have the right number of occurrences. We pick a domain, d, and parcel numbers out into this domain. We use \(\frac{d \times X_{n}}{10,000}\) because this tends to leverage the left-most digits which are somewhat more random than the right-most digits.

class FQTest(Chi2Test):
    expected_chi_2_range = (7.261, 25.00)

    def __init__(self, d: int = 16, size_samples: int = 6_400) -> None:
        super().__init__()
        #: Size of the domain
        self.d = d
        #: Number of samples expected
        self.size_samples = size_samples
        #: Frequency for Chi-squared comparison
        self.expected_fq = {e: int(self.size_samples/self.d) for e in range(self.d)}
    
    def test(self, sequence: list[int]) -> None:
        super().test()
        self.actual_fq = Counter(int(self.d*s/10_000) for s in sequence)

We can apply this test to some samples, compare with the expectation, and save the chi-squared value. This lets us look at LCM parameters to see if the generator creates suitably random values.

The essential test protocol is this:

samples = lcg_samples(LCM4(1621, 3), seed=1234)
fqt = FQTest()
fqt.test(samples)
fqt.chi2

The test creates some samples, applies the frequency test. The next step is to examine the chi-squared value to see if it's in the allowable range, \(7.261 \leq \chi^2 \leq 25\).

The search space

Superficially, it seems like there could be 10,000 choices of a and 10,000 choices of c parameter values for this PRNG. That's 100 million combinations. It takes a bit of processing to look at all of those. 

Looking more deeply, the values of c are often small prime numbers. 1 or 11 or some such. That really cuts down on the search. The values of a have a number of other constraints with respect to the modulo value. Because 10,000 has factors of 4 and 5, this suggests values like \(20k + 1\) will work. Sensible combinations are defined by the following domain:

combinations = [
    (a, c)
    for c in (1, 3, 7, 11,)
    for a in range(21, 10_000, 20)
]

This is 2,000 distinct combinations, something we can compute on our laptop. 

The problem we have trying to evaluate these is that each combination's testing is compute-intensive. This means we want to use as many cores of our machine as we have available. We don't want to process each combination serially on a single core. A thread pool isn't going to help much because Python's GIL keeps CPU-bound threads from running in parallel across those cores.

Because separate processes can each run on their own core, we need a process pool.

Here's how to spread the work among the cores:

    from concurrent.futures import ProcessPoolExecutor, as_completed
    from rich.progress import Progress

    combinations = [
        (a, c)
        for c in (1, 3, 7, 11)
        for a in range(21, 10_000, 20)
    ]

    with Progress() as progress:
        setup_task = progress.add_task("setup ...", total=len(combinations))
        finish_task = progress.add_task("finish...", total=len(combinations))

        with ProcessPoolExecutor(max_workers=8) as pool:
            futures = [
                pool.submit(evaluate, (a, c))
                for a, c in progress.track(combinations, task_id=setup_task, total=len(combinations))
            ]
            results = [
                f.result()
                for f in progress.track(as_completed(futures), task_id=finish_task, total=len(combinations))
            ]

This will occupy *all* the cores of the computer executing the `evaluate()` function. This function applies the battery of tests to each combination of a and c. We can then check the results for combinations where the chi-squared results for each test are in the acceptable ranges for the test.
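
The evaluate() function isn't shown here. A rough sketch -- using only the frequency test and the seed values described later in this post, where the full battery does more -- might look like this:

def evaluate(params: tuple[int, int]) -> tuple[int, int, float, float]:
    """Run the frequency test for several seeds; report the chi-squared min and max."""
    a, c = params
    chi2_values = []
    for seed in range(1, 256, 11):
        samples = lcg_samples(LCM4(a, c), seed=seed, n_samples=3_200)
        fqt = FQTest(size_samples=3_200)
        fqt.test(samples)
        chi2_values.append(fqt.chi2)
    return a, c, min(chi2_values), max(chi2_values)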

It's fun.

TL;DR

Using a=1621 and c=3 can generate acceptable random numbers using only 4 decimal digits.

Here's some output using only a subset of the tests.

(rngtest2) % python lcmfinder.py
setup ... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
finish... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2361  1  11.46  14.22  46.64  63.76   2.30  11.33   2.16   2.16 
 981  3  10.28  15.24  52.56  66.32   2.28  11.08  10.47  10.47 
1221  3  10.19  14.12  48.72  62.08   3.03  10.08   2.59   2.59 
1621  3  11.70  14.91  47.12  69.52   2.23   9.69   0.86   0.86 

The output shows the a and c values followed by the minimum and maximum chi-squared values for each test. The chi-squared values are in pairs for the frequency test, serial pairs test, gap test, and poker test. 

Each test uses about two dozen seed values to generate piles of 3,200 samples and subject each pile of samples to a battery of tests. The seed values, BTW, are range(1, 256, 11); kind of arbitrary. Once I find the short list of candidates, I can test with more seeds. There are only 10,000 seed values, so, this can be done in finite time.

For example, a=1621, c=3, had chi-squared values between 11.70 and 14.91 for the frequency test. Well within the 7.261 to 25.0 range required. The remaining numbers show that it passed the other tests, also.

For completeness, I intend to implement the remaining half-dozen or so tests. Then I need to make sure the sphinx-produced documentation looks good. I've done this before. http://slott.itmaybeahack.com/_static/rngtest/rngdoc.html It's kind of an obsession, I think.

Looking back to my 1968 self, this would have been better than the middle-squared nonsense that caused me to struggle with bad games that behaved badly.

Tuesday, January 11, 2022

The Old Days -- ca. 2000 -- Empirical Tests of Random Numbers (Python and Chi-Square Testing)

See The Old Days -- ca. 1974 Random Numbers Before Python for some background.

We'll get to Python after reminiscing about the olden days. I want to provide some back story on why sympy has had a huge impact on ordinary hacks like myself.

What we're talking about is how we struggled with randomness before

  1. /dev/random
  2. The Mersenne Twister Pseudo-Random Number Generator (PRNG)

Pre-1997, we performed empirical tests of PRNG's to find one that was random enough for our application. Maybe we were doing random samples of data to compare statistical measures. Maybe we were writing a game. What was important was a way to create a sequence of values that passed a battery of statistical tests.

See https://link.springer.com/chapter/10.1007%2F978-1-4612-1690-2_7 for the kind of material we salivated over. 

While there are an infinite number of bad algorithms, some math reveals that the Linear Congruential Generator (LCG) is simple and effective. Each new number is based on the previous number: \(X_{n+1} = (X_n \times a + c) \bmod m\). There's a multiply and an add, modulo some big number. The actual samples are often a subset of the bits in \(X_{n}\). 
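
As a minimal sketch (my code, not any particular historical generator; the constants are the small values that appear elsewhere in these posts, used here purely as placeholders), the recurrence is just a couple of lines of Python:

from itertools import islice

def lcg(seed: int, a: int, c: int, m: int):
    """Yield X[n+1] = (X[n] * a + c) mod m, forever."""
    x = seed
    while True:
        x = (x * a + c) % m
        yield x

# Ten values from one particular (not necessarily good) set of constants.
print(list(islice(lcg(seed=1, a=1621, c=3, m=10_000), 10)))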

After the Mersenne Twister became widely used, we essentially stopped looking at alternative random number algorithms. Before then -- well -- things weren't so good.

Here are some classics that I tested.

  • The ACM Collected Algorithms (CALGO) number 294 is a random-number generator. This is so obsolete, I have trouble finding links to it. It was a 28-bit generator.
  • The ACM Collected Algorithms (CALGO) number 266 has code still available. See toms/266
  • The Cheney-Kincaid generator is available. See random.f plus dependencies.

These formed a kind of benchmark I used when looking at Python's built-in Mersenne Twister.

Nowadays, you can find a great list of LCG PRNG's at https://en.wikipedia.org/wiki/Linear_congruential_generator

Python Empirical Testing

One of the early questions I had was whether or not the random module in Python stacked up against these older RNG's that I was a little more familiar with.

So, I wrote a big, fancy random number testing tool in Python. 

When? Around 2000. I started this in the Python 1.6 and 2.1 era. I have files showing results from Python 2.3 (#2, Jul 30 2003). This is about when I stopped fooling around with this and moved on to trusting that Python really did work and was -- perhaps -- the best approach to working with randomly-sampled data for statistical work. 

The OO design for the test classes was Lavish Over The Top (LOTT™) OO:

  • Too Many Methods
  • Too Many Superclasses
  • No Duck Typing

We won't look at that code. It's regrettable and stems from trying to make Python into C++.

What I do want to look at is the essential Chi-Squared test methodology. This is some cool stuff.

Comparing Expected and Actual

The chi-squared metric is a way to compare actual and expected distributions. You can read about it on your own time. It's a way to establish whether data is random or whether there's something else going on that's not random, e.g., a trend or a bias.
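
As a reminder of what the statistic actually is, here's a tiny example of mine (not from the old test suite): sum the squared differences between observed and expected counts, scaled by the expected counts.

def chi_squared(observed: list[int], expected: list[float]) -> float:
    """Sum of (observed - expected)**2 / expected over all the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 160 samples spread over 16 equally likely bins: we expect 10 per bin.
observed = [12, 9, 11, 8, 10, 13, 7, 10, 9, 12, 11, 10, 8, 9, 11, 10]
print(chi_squared(observed, [10.0] * 16))  # 4.0 for this made-up data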

The empirical tests for PRNG's that Knuth defines all come with chi-squared values that bracket acceptable levels of randomness. For the purposes of writing a working set of tests the magic chi-squared values supplied by Knuth are fine. Magical. But fine. Really. Trust them.

If you make modifications, you'd use your statistics textbook. You'd open to the back, where it had a Chi-Squared table. That table gave you chi-squared values for a given degree of freedom and a given probability of being random.

Or, you could look for the NIST handbook online. It has a section on chi-squared testing. See https://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm. Same drill. Degrees of freedom and probability map to a chi-squared threshold.

But.

Where do these magical Chi-Squared values come from? This gets interesting in a useless-sidebar kind of way.

Chi-Squared Values

There's a really, really terse summary of chi-squared numbers here: https://www.danielsoper.com/statcalc/formulas.aspx?id=11. This is all you need to know. It may be too terse to help you learn about it, but it's a handy reference.

We need to evaluate two functions: partial gamma and gamma. These are defined as integrals. And they're nasty levels of complexity. Nasty.

This kind of nasty:

\[\gamma (s,z)=\int _{0}^{z}t^{s-1} e^{-t} dt\].

\[\Gamma (z)=\int _{0}^{\infty} t^{z-1} e^{-t} dt\].

These are not easy things to evaluate. Back to the ACM Collected Algorithms (CALGO) to find ways to evaluate these integrals. There are algorithms in CALGO 435 and 654, expressed as Fortran, for evaluating these. This ain't all, of course; we also need Stirling Numbers and Bernoulli Numbers. So there's a lot going on here.

A lot of this can be transliterated from Fortran. The resulting code is frankly quite ugly, and requires extensive test cases. Fortran with GOTO's requires some cleverness to unwind the conceptual for/while/if constructs.

OR.

Enter Sympy

In the 20+ years since I implemented my empirical PRNG tests "the hard way," sympy has come of age.

Check this out

from sympy import Sum, rf
from sympy.abc import k, s, z
from sympy.functions import exp
from sympy import oo
Sum(z**s * exp(-z) * z**k / rf(s, k+1), (k, 0, oo)).simplify()

I could use this in Jupyter Lab to display a computation for the partial gamma function.

\[z^{s}e^{-z}\sum _{k=0}^{\infty }{\dfrac {z^{k}}{s^{\overline {k+1}}}}\]

This requires a fancy Rising Factorial computation, the \(s^{\overline {k+1}}\) term. This is available in sympy as the rf(s, k+1) expression.

It turns out that sympy offers lowergamma() and gamma() as first-class functions. I don't even need to work through the closed-form simplifications.

I could do this...

# Reusing the imports above, plus:
from sympy import integrate
from sympy.abc import t

def gammap(s: float, z: float) -> float:
    """Lower (partial) gamma via the series with the rising factorial."""
    return (z**s * exp(-z) * Sum(z**k / rf(s, k+1), (k, 0, oo))).evalf()

def gamma(z: float) -> float:
    """Gamma via its integral definition."""
    return integrate(t**(z-1) * exp(-t), (t, 0, oo)).doit()

It works well, and it provides elegant documentation. But I don't need to do even that. I can write this instead:

from sympy import gamma, lowergamma

def chi2P(chi2: float, degF: int) -> float:
    return lowergamma(degF/2, chi2/2) / gamma(degF/2)

This is used to compute the probability of seeing a chi-squared value. 

For the frequency test, as an example, we partition the random numbers into 16 bins. This gives us 15 degrees of freedom. We want chi-squared values between 7.2578125 and 25.0.
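
Here's a quick check of that bracket using chi2P() (repeating the definition so the snippet stands alone; the float() conversion is mine, forcing a plain number out of sympy):

from sympy import gamma, lowergamma

def chi2P(chi2: float, degF: int) -> float:
    return float(lowergamma(degF / 2, chi2 / 2) / gamma(degF / 2))

print(chi2P(7.2578125, 15))  # about 0.05 -- the low edge of acceptably random
print(chi2P(25.0, 15))       # about 0.95 -- the high edge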

Or.

Given a chi-squared value of 6.0, the computed probability is about 0.02, which is suspiciously low: less than the 0.05 level we've decided signifies mostly random. The data is "too random"; that is to say, it's too close to the ideal distribution to be trusted.

The established practice was to look up a chi-squared value because you couldn't easily compute the probability of that value. With sympy, we can compute the probability. It's slow, so we have to optimize this carefully and not compute probabilities more frequently than necessary.

We can, for example, compute chi-squared values for a number of seeds, take the max and min of these and compute the probability of those two boundary values. This will bracket the probability that the pseudo random number generator is producing suitably random numbers.
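
For instance, here's a sketch of that bracketing idea applied to Python's own random module, which is the sort of check the original tool performed. The chi2_frequency() helper is my stand-in for one of the empirical tests, and chi2P() is the function defined above.

import random

def chi2_frequency(samples: list[float], bins: int = 16) -> float:
    """Chi-squared statistic for how evenly the samples fill the bins."""
    expected = len(samples) / bins
    counts = [0] * bins
    for v in samples:
        counts[min(int(v * bins), bins - 1)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# One chi-squared value per seed; convert only the two extremes to probabilities.
chi2_values = []
for seed in range(1, 256, 11):
    random.seed(seed)
    samples = [random.random() for _ in range(3_200)]
    chi2_values.append(chi2_frequency(samples))

print(chi2P(min(chi2_values), 15), chi2P(max(chi2_values), 15))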

This also applies to any process we're measuring with results that might vary randomly or might indicate a consistent problem that requires evaluation.

Using sympy eliminates the complexity of understanding these beautifully hand-crafted antique algorithms. It acts as a kind of super-compiler. From Math to an intermediate AST to a concrete implementation.

Tuesday, January 4, 2022

The Old Days -- ca. 1974 -- Random Numbers before Python

See "The Old Old Days -- ca. 1968" for my first exposure to an actual computer. Nothing about Python there. But. It's what motivated me to get started learning to code -- I was fascinated by games that involved randomization. Games with cards or dice.

After filling in a little background, I will get to the Python part of this. First, however, I want to compare the olden days with what we have now.

From 1969 to 1974 I had access to the high school's IBM 1620. This means programming in IBM's SPS assembler, or using the NCE Load-and-Go Fortran compiler. See https://www.cs.utexas.edu/users/EWD/transcriptions/EWD00xx/EWD37.html for a scathing review of the problems with this machine.

See http://www.bitsavers.org/pdf/ibm/1620/GC20-1603-10_1620_Catalog_of_Programs_Jan71.pdf. Page 36 has the catalog entry for the NCE Load-and-Go Fortran compiler.

That's a quick overview of my earliest programming language. What's essential here is that the NCE Fortran used 4-digit integers.

I'll repeat that for those skimming, and wondering what the Python connection is.

Four. Digit. Integers.

That's four decimal digits. Decimal digits required at least 4 hardware bits. IBM 1620 digits also had flags and signs, so there were maybe 6 bits per digit. That's 24 bits of hardware used to represent just under 14 bits of useful information.
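
The arithmetic, as a tiny sanity check (the 6-bits-per-digit figure is the rough estimate above):

import math

bits_per_digit = 6                    # 4 BCD bits plus flag and sign, roughly
hardware_bits = 4 * bits_per_digit    # 24 bits for a four-digit integer
useful_bits = math.log2(10 ** 4)      # about 13.3 bits of information
print(hardware_bits, round(useful_bits, 2))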

My interest is in simulation and randomness. So. I have this question of how to create random sequences of numbers limited to 4-digit integers.

PRNG Algorithms

There are a number of classic Pseudo-Random Number Generator (PRNG) algorithms from the early days before Mersenne Twister took over in 1997.

We used to be super-careful to emphasize the letter P in PRNG because the numbers weren't really random. They just behaved randomish. This is contrasted with real randomness, also known as entropy. For example, the /dev/random device driver has a fair amount of entropy. I think it's comparable to a person throwing dice across a table. I think it's as random as a noise-generating diode with a sample-and-hold circuit to pluck out random values from the noise.

Pre-Mersenne-Twister -- pre-1997 -- we worried a lot about random number generation. See Knuth, Donald E. The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Addison-Wesley, 1969. Section 3.3.2. covers empirical testing of random number generators. Section 3.3.1. covers the Chi-squared test for fit between actual and expected frequency distributions.

Back in the olden days, it was stylish to perform an empirical test (or ten) to confirm we really had "good" random numbers. The built-in libraries that came with your compiler or OS could not be trusted without evidence.

One of the classic (bad) PRNG's is the "Middle-Squared" method. See https://en.wikipedia.org/wiki/Middle-square_method. I learned about this in the 70's. And used it in the old NCE Fortran. 

With Four. Digit. Integers.

Did I mention that the Fortran compiler used four decimal digits for integers? That means plucking the middle two digits out of a four-digit number. How random can that be?

Not very. Since the state is only two digits, the longest possible sequence is 100 numbers, and that's only if, by some miracle, you found a seed with the right properties.

Nowadays I can, in Python, do a quick middle-squared analysis for all 100 seed values.

This kind of thing.

def csqr4(value: int) -> list[int]:
    """The 4 decimal digit center-squared PRNG: follow one seed until it repeats."""
    sequence = []
    while value not in sequence:
        sequence.append(value)
        # Square the value, then pluck out the middle two digits.
        value = (((value**2) // 10) % 100)
    return sequence

You can run this and see that all of my early attempts at games and simulations were doomed; a quick survey is sketched below. The seed values of 76, 42, and 69 provided kind-of-long sequences of almost random-seeming numbers. Otherwise, pfft, this was junk computer science: 50% of the seeds provide 5 or fewer numbers before repeating.
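
Here's that survey over every possible two-digit seed, using the csqr4() function above (the wrapper is mine):

# Sequence length for every possible two-digit starting value.
lengths = {seed: len(csqr4(seed)) for seed in range(100)}
best = sorted(lengths, key=lengths.get, reverse=True)[:5]
print("longest runs:", [(seed, lengths[seed]) for seed in best])
print("seeds yielding 5 or fewer values:", sum(1 for n in lengths.values() if n <= 5))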

For blackjack, a few random numbers for shuffling might be enough. For other games, the lack of randomness made the outcomes trivially predictable. 

What's funny is how far the state of the art has moved since then.

  1. Hardware now has more than 20,000 decimal digits (about 10K bytes) of storage.
  2. Software with algorithms that are really, really clever.

It's hard to overstate these two advances, particularly the second one. I'll return to the algorithm thing a lot in the next few posts.

My focus was on games and randomization. Ideally, simple stuff. But... under the hood, it's not simple. I've spent some time (not much, and not in much depth) looking at the tip of this iceberg. 

It served me as an incentive to dive just a little more deeply into a topic, like math or a programming language or a statistical tool.