Tuesday, April 12, 2022

Pelican and Static Web Content

In Static Site Blues I was wringing my hands over ways to convert a ton of content from two different proprietary tools (the very old iWeb, and the merely old Sandvox) into something I could work with.

After a bit of fiddling around, I'm delighted with Pelican.

First, of course, I had to extract all the iWeb and Sandvox content. This was emphatically not fun. While both used XML, they used it in subtly different ways. Apple's frameworks serialize internal state as XML in a way that preserves a lot of semantic details. It also preserves endless irrelevant details.

I wound up with a Markdown data structure definition, plus a higher-level "content model" with sites, pages, blogs, blog entries and images. Plus the iWeb extractor and the Sandvox extractor. It's a lot of code, much of which lacks solid unit test cases. It worked -- once -- and I was tolerant of the results.

I also wound up writing tools to walk the resulting tree of Markdown files doing some post-extraction cleanup. There's a lot of cleanup that should be done.
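A walker like that can be quite small. Here's a minimal sketch, assuming the extracted content is a directory tree of `.md` files and that each cleanup step is a plain `str -> str` function. `cleanup_tree()` and `strip_trailing_whitespace()` are hypothetical names for illustration, not the actual tools described here.

```python
import re
from collections.abc import Callable
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Example cleanup step: remove spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def cleanup_tree(root: Path, fix: Callable[[str], str]) -> list[Path]:
    """Apply ``fix`` to every Markdown file under ``root``; return the files that changed."""
    changed: list[Path] = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        new_text = fix(text)
        if new_text != text:  # Only rewrite files that actually need it.
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Returning the list of changed files makes it easy to review what a cleanup pass touched before republishing everything.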


I can now add posts to the blog about the state of my voyaging. I've been able to keep Team Red Cruising up to date.

Eventually (i.e., when the boat is laid up for Hurricane Season) I may make an effort to clean up the older content and make it more consistent. In particular, I need to add some annotations around anchorages to make it possible to locate all of the legs of all of the journeys. Since the HTML is what most people can see, that means a class identifier for lat-lon pairs. 
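If the lat-lon pairs do get a class identifier, pulling them back out of the published HTML is straightforward. A minimal sketch using the standard library's `html.parser`, assuming a hypothetical class name `latlon` (nothing in the current site uses it yet):

```python
from html.parser import HTMLParser


class LatLonFinder(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class: str = "latlon") -> None:
        super().__init__()
        self.css_class = css_class
        self.found: list[str] = []
        self._tag = ""      # Tag name of the element currently being captured.
        self._depth = 0     # Nesting depth of that tag name (0 = not capturing).
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth and tag == self._tag:
            self._depth += 1
        elif not self._depth and self.css_class in (dict(attrs).get("class") or "").split():
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if not self._depth:
                self.found.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)
```

Because the annotation lives in the HTML, the same parser works on every published page, whichever legacy tool the content came from.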

As it is, the blog entries are *mostly* Markdown. Getting images and blockquotes even close to readable requires dropping to HTML to make direct use of the Bootstrap CSS. This also requires some comprehensive cleanup to properly use the Bootstrap classes. (I think I may have introduced some misspelled CSS classes into the HTML that aren't doing anything.)

For now, however, it works. I'm still tweaking small things that require republishing *all* the HTML. 

Tuesday, March 1, 2022

Static Site Blues

I have a very large, static site with 10+ years of stuff about my boat. Most of it is pretty boring. http://www.itmaybeahack.com/TeamRedCruising/

I started with iWeb. It had a very -- well -- 2000-ish look and feel. Too many pastels and lines and borders.

In 2012, I switched to Sandvox. I lived on a boat back then. I don't have reliable internet. Using blogger.com, for example, required a sincere commitment to bandwidth. I moved ashore in 2014 and returned to the boat in 2020.

Sandvox's creator seems to be out of business.

What's next?

Give up on these fancy editors and switch to a static site generator. Write Markdown. Run the tool. Upload when in a coffee shop with Wi-Fi.

What site generator?

See https://www.fullstackpython.com/static-site-generator.html for some suggestions.

There are five parts to this effort.

  1. Extract the goodness from iWeb and Sandvox. I knew this would be real work. iWeb's site has too much JavaScript to be easy to parse. I have to navigate the underlying XML database. Sandvox is much easier to deal with: their published site is clean, static HTML with useful classes and ids in their tags.
  2. Reformat the source material into Markdown. I've grudgingly grown to accept Markdown, even though RST is clearly superior. Some tools work with RST and I may pandoc the entire thing over to RST from Markdown. For now, though, the content seems to be captured.
  3. Fix up internal links and cross-references. This is a godawful problem. Media links -- in particular -- seem to be a nightmare. Since iWeb resolves things via JavaScript, the HTML is opaque. Fortunately, the database's internal cross-references aren't horrible. Maybe this was exacerbated by a poor choice of generators.
  4. Convert to HTML for a local server. Validate.
  5. Convert to HTML for the target server. Upload to a staging server and validate again. This requires a coffee shop. Not doing this with my phone's data plan.

Steps 1 and 2 aren't too bad. I've extracted serviceable Markdown from the iWeb database and the published Sandvox site. The material parallels the Site/Blog/Page structure of the originals. The Markdown seems to be mostly error-free. (Some images have the caption in the wrong place; ![caption](link) isn't as memorable as I'd like.)

Step 3, the internal links and cross-references, has been a difficult problem, it turns out. I can, mostly, associate media with postings. I can also find all the cross-references among postings and fix those up. The question that arises is how to reference media from a blog post?


I started with mynt. And had to bail. It's clever and very simple. Too simple for blog posts that have a lot of associated media assets.

The issue is what to write in the Markdown to refer to the images that go with a specific blog post. I resorted to a master _Media directory. Which means each posting has ![caption](../../../../_Media/image.png) in it. This is semi-manageable. But exasperating in bulk.
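One way to take the tedium out of those `../../../../` prefixes is to compute them. A minimal sketch, assuming each post lives at a path like `2013/05/12/some-post/index.md` (a hypothetical layout) under the same content root as `_Media`; `media_link()` and `image_markdown()` are illustrative names:

```python
import posixpath


def media_link(post_path: str, image_name: str, media_dir: str = "_Media") -> str:
    """Relative link from a post's directory to an image in a shared media directory.

    Both ``post_path`` and ``media_dir`` are relative to the site's content root.
    """
    post_dir = posixpath.dirname(post_path) or "."
    return posixpath.relpath(posixpath.join(media_dir, image_name), start=post_dir)


def image_markdown(caption: str, post_path: str, image_name: str) -> str:
    # Markdown image syntax is ![caption](link): brackets, then parentheses.
    return f"![{caption}]({media_link(post_path, image_name)})"
```

This only makes the links consistent; it doesn't get the media into each post's published directory.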

What scrambled my brain is the way a mynt posting becomes a directory, with an index.html. Clearly, the media could be adjacent to the index.html. But. I can't figure out how to get mynt's generator to put the media into each post's published directory. It seems like each post should not be a markdown file. 

Also, I can trivially change the base URL when generating, but I can't change the domain. When I publish, I want to swap domains *only*, leaving the base URL alone. I tried. It's too much fooling around.


Next up: Pelican. We'll see if I can get my media and blog posts neatly organized. This tutorial, http://chdoig.github.io/create-pelican-blog.html, seems encouraging. I think I should have started here. Lektor is another possibility.

Since my legacy sites have RSS feeds, it may be sensible to turn Pelican loose on the RSS and (perhaps) skip steps 1, 2, and 3, entirely.

Tuesday, February 15, 2022

LaTeX Mysteries and an algorithmicx thing I learned.

I've been an on-and-off user of LaTeX since the very, very beginning. Back in the dark days when the one laser printer that could render the images was in a closely-guarded secret location to prevent everyone from using it and exhausting the (expensive) toner cartridges.

A consequence of this is I think the various algorithm environments are a ton of fun. Pseudo-code with math embedded in it. It's marvelous. It's a pain in the neck with this clunky blogging package, so I can't easily show off the coolness. But. You can go to https://www.overleaf.com/learn/latex/Algorithms to see some examples.

None of which have try/except blocks. Not a thing.

Why not? I suspect it's because "algorithmic" meant "Algol-60" for years. The language didn't have exceptions and so, the presentation of algorithms continues to this day without exceptions. 

What can one do?


   [1][Exception]{\textbf{except} \texttt{#1}}


This will extend the notation to add \Try, \Except, and \EndTry commands. I think I've done it all more-or-less correctly. I'm vague on where the \algnotext{EndTry} goes, but it seems to be needed in each \Try block to silence the \EndTry.

As far as I know, I'm the only person who seems to care. There seems to be little about this anywhere online. I'm guessing it's because the basics work perfectly, and no one wants this kind of weird add-on.

Tuesday, February 8, 2022

Desktop Notifications and EPIC DESIGN FAIL

I was asked to review code that -- well -- was evil.

Not like "shabby" or "non-pythonic". Nothing so simple as that.

We'll get to the evil in a moment. First, we have to suffer two horrible indignities.

1. Busy Waiting

2. Undefined Post-Conditions.

We'll beat all three issues to death separately, starting with busy waiting.

Busy Waiting

The Busy Waiting is a sleep-loop. If you're not familiar, it's this:

while something has not happened yet AND we haven't timed out:
    sleep for a while

Which is often a dumb design. Busy waiting is polling. It's a lot of pointless activity while waiting for something else to happen.

There are dozens of message-passing and event-passing frameworks. Any of those is better than this.

Folks complain "Why install ZMQ when I could instead write a busy-waiting loop?"

Why indeed?

For me, the primary reason is to avoid polling at fixed intervals, and instead wait for the notification. 
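In Python's standard library, `threading.Event` provides exactly this: a thread blocks until it's notified, with an optional timeout. A minimal sketch; `run_and_wait()` is a hypothetical helper, not the reviewed code:

```python
import threading
from collections.abc import Callable


def run_and_wait(work: Callable[[], None], timeout: float) -> bool:
    """Run ``work`` in a thread; block until it signals completion or the timeout expires.

    Returns True if the work finished, False if we timed out.
    """
    done = threading.Event()

    def worker() -> None:
        work()
        done.set()  # The notification: wakes the waiting thread immediately.

    threading.Thread(target=worker, daemon=True).start()
    # No sleep loop: wait() blocks until set() is called or the timeout passes,
    # and it returns the event's flag, so the caller knows which one happened.
    return done.wait(timeout)
```

There's no polling interval to tune, and the return value is the post-condition.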

The asyncio module, confusing as it is, is better than polling. Because it dispatches events properly.
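A comparable sketch with asyncio, where the event loop wakes the waiting coroutine when the event is set. `demo()` is a hypothetical stand-in that simulates something happening after a delay:

```python
import asyncio


async def wait_or_timeout(event: asyncio.Event, timeout: float) -> bool:
    """Wait for ``event`` without polling; True if it was set, False on timeout."""
    try:
        await asyncio.wait_for(event.wait(), timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def demo(delay: float, timeout: float) -> bool:
    """Schedule the event to be set after ``delay`` seconds, then wait for it."""
    event = asyncio.Event()
    asyncio.get_running_loop().call_later(delay, event.set)
    return await wait_or_timeout(event, timeout)
```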

This is minor compared with the undefined post-conditions.

Undefined Post-Conditions

With this crap design, there are two events. There's a race between them. One will win. The other will be silently lost forever.

If "something has not happened" is false, the thing has happened. Yay. The while statement ends.

If "something has not happened" is true and the timeout occurs, then Boo. The while statement ends.

Note that there are two unrelated post-conditions: the thing has happened OR the timeout occurred. Is it possible for both to happen? (Hint: yes.)

Ideally, the timeout and the thing happening are well-separated in time.


Otherwise, they're coincident, and it's a coin-toss as to which one will lead to completion of the while statement. 

The code I was asked to review made no provision for this unhappy coincidence. 
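If a polling loop truly can't be avoided, the least it can do is state its post-condition. A minimal sketch (hypothetical names, not the reviewed code) that re-checks the condition after the loop, so a coincident "happened" and "timed out" resolves deterministically:

```python
import time
from collections.abc import Callable
from enum import Enum


class Outcome(Enum):
    HAPPENED = "happened"
    TIMED_OUT = "timed out"


def poll_until(happened: Callable[[], bool], timeout: float, interval: float = 0.1) -> Outcome:
    """A sleep-loop with an explicit post-condition."""
    deadline = time.monotonic() + timeout
    while not happened() and time.monotonic() < deadline:
        time.sleep(interval)
    # The loop ends because the thing happened, the deadline passed, or both.
    # Re-checking settles the tie: if the thing happened, we say so, even when
    # the timeout also expired.
    return Outcome.HAPPENED if happened() else Outcome.TIMED_OUT
```

This is still busy waiting; it only removes the coin-toss.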

Which leads us to the pure evil.

Pure Evil

What's pure evil about this is the very clear statement that there are not enough desktop notification apps, and there's a need for another.

I asked for justification. Got a stony silence.

They might claim "It's only a little script that runs in the Terminal Window," which is garbage. There are already lots and lots of desktop apps looking for asynchronous notification of events.

Email is one of them.

Do we really need another email-like message queue?  

(Hint: "My email is a lot of junk I ignore" is a personal problem, not a software product description. Consider learning how to create filters before writing yet another desktop app.)

Some enterprises use Slack for notifications. 

What makes it even worse (I said it was pure evil) was a hint about the context. They were doing batch data prep for some kind of analytics/Machine Learning thing. 

They were writing this as if Luigi and related workflow managers didn't exist.

Did they not know? If they were going to invent their own, they were off to a really bad start. Really bad.

Tuesday, January 25, 2022

No one wins at Code Golf vs. This is more noise than signal

Looking at code. Came to a 20-line block of code that did exactly this.

sorted(Path.cwd().glob("some_pattern[1-9]*.*"), reverse=True)

Twenty lines. Seriously. 

To be fair, 8 of the 20 lines were comments. 3 were blank. Which leaves 9 lines of code to perform the task of a one-liner.

I often say "no one wins at code golf" as a way to talk people out of trying to minimize Python code into vanishingly small black holes where no information about the code's design escapes.

However. Blowing a line of code into 9 lines seems to be just as bad. 

I'll spare you the 9 lines. I will say this, though: the author was blissfully ignorant that Path objects are comparable. So. There were needless conversions. And, even after I commented on this, they seemed to somehow feel (without evidence of any kind) that Path objects were incomparable.
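A quick demonstration that Path objects compare and sort directly; the file names are made up for the example:

```python
import tempfile
from pathlib import Path


def reverse_sorted_matches(directory: Path, pattern: str = "some_pattern[1-9]*.*") -> list[Path]:
    # Path objects are orderable, so sorted() works on them directly;
    # no conversion to str (and back) is needed.
    return sorted(directory.glob(pattern), reverse=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("some_pattern1.txt", "some_pattern3.txt", "some_pattern2.txt", "other.txt"):
            (root / name).touch()
        print([p.name for p in reverse_sorted_matches(root)])
        # prints ['some_pattern3.txt', 'some_pattern2.txt', 'some_pattern1.txt']
```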

This is not the first time I've seen folks who like assembler-style code. There is at most one state-change or attribute reference on each line of code. The code has a very voluble verticality (VVV™).

This seems as wrong as code golf.  Neither style provides meaningful code. 

How can we measure "meaningful"?

Of the 8 lines of comments, the English summary, "reverse alphabetic order," is only a few words. Therefore, the matching code can be an equally terse few symbols. I think code can parallel natural language.

Tuesday, January 18, 2022

How to Test a Random Number Generator

Nowadays, we don't have the same compelling reasons to test a random number generator. The intervening decades have seen a lot of fruitful research. Good algorithms.

Looking back to my 1968 self, however, I still feel a need to work out the solution to an old problem. See The Old Days -- ca. 1968 for some background on this.

What could I have done on that ancient NCE Fortran -- with four digit integers -- to create random numbers? Step 1 was to stop using the middle-squared generator. It doesn't work.

Step 2 is to find a Linear Congruential Generator that works. LCGs have a (relatively) simple form:

\[X_{n+1} = (X_n \times a + c) \bmod m\]

In this case, the modulo value, m, is 10,000. What's left is step 3: find a and c parameters.

To find suitable parameters, we need a battery of empirical tests. Most of them are extensions to the following class:

from collections import Counter
from collections.abc import Hashable


class Chi2Test:
    """The base class for empirical PRNG tests based on the Chi-2 testing."""
    #: The actual distribution, created by ``test()``.
    actual_fq: dict[Hashable, int]
    #: The expected distribution, created by ``__init__()``.
    expected_fq: dict[Hashable, int]
    #: The lower and upper bound on acceptable chi-squared values.
    expected_chi_2_range: tuple[float, float]

    def __init__(self) -> None:
        """
        A subclass will override this to call ``super().__init__()`` and then
        create the expected distribution.
        """
        self._chi2 = None  # Cached chi-squared value; reset by each test().

    def test(self, sequence: list[int]) -> None:
        """
        A subclass will override this to call ``super().test(sequence)`` and then
        create an actual distribution, usually with a distinct seed value.
        """
        self._chi2 = None

    def chi2(self) -> float:
        """Return chi-squared metric between actual and expected observations."""
        if self._chi2 is None:
            a_e = (
                (self.actual_fq[k], self.expected_fq[k])
                for k in self.expected_fq
                if self.expected_fq[k] > 0
            )
            self._chi2 = sum((a - e)**2 / e for a, e in a_e)
        return self._chi2

    def pass_test(self) -> bool:
        """True if the chi-squared value falls within the expected range."""
        low, high = self.expected_chi_2_range
        return low <= self.chi2() <= high

This defines the essence of a chi-squared test. There's another test that isn't based on chi-squared: the serial correlation test, where a correlation coefficient is computed between adjacent pairs of samples. We'll ignore this special case for now. Instead, we'll focus on the battery of chi-squared tests.

Linear Congruential Pseudo-Random Number Generator

We'll also need an LC PRNG that's constrained to 4 decimal digits.

It looks like this:

class LCM4:
    """Constrained by the NCE Fortran 4-digit integer type."""
    def __init__(self, a: int, c: int) -> None:
        self.a = a
        self.c = c
    def seed(self, v: int) -> None:
        self.v = v
    def random(self) -> int:
        self.v = (self.a*self.v % 10_000 + self.c) % 10_000
        return self.v

This mirrors the old NCE Fortran on the IBM 1620 computer. 4 decimal digits. No more. 

We can use this to generate a pile of samples that can be evaluated. I'm a fan of using generators because they're so efficient. The use of a set to create a list seems weird, but it's very fast.

from collections.abc import Callable, Hashable, Iterator

N_SAMPLES = 6_400  # Assumed default sample count (matches FQTest's size_samples).

def lcg_samples(rng: LCM4, seed: int, n_samples: int = N_SAMPLES) -> list[int]:
    """
    Generate a bunch of sample values. A repeat implies a cycle, and we'll stop early.

    >>> lcg_samples(LCM4(1621, 3), 1234)[:12]
    [317, 3860, 7063, 9126, 3249, 6632, 475, 9978, 4341, 6764, 4447, 8590]
    """
    def until_dup(f: Callable[[], Hashable], n_samples: int) -> Iterator[Hashable]:
        seen: set[Hashable] = set()
        while (v := f()) not in seen and len(seen) < n_samples:
            seen.add(v)  # Remember each value; a repeat means the cycle has closed.
            yield v
    rng.seed(seed)
    return list(until_dup(rng.random, n_samples))

This function builds a list of values for us. We can then subject the set of samples to a battery of tests. We'll look at one test as an example for the others. They're each devilishly clever, and require a little bit of coding smarts to get them to work correctly and quickly.

Frequency Test

Here's one of the tests in the battery of chi-squared tests. This is the frequency test, which examines values to see if they have the right number of occurrences. We pick a domain, d, and parcel numbers out into this domain. We use \(\left\lfloor \frac{d \times X_{n}}{10{,}000} \right\rfloor\) because this tends to leverage the left-most digits, which are somewhat more random than the right-most digits.

class FQTest(Chi2Test):
    expected_chi_2_range = (7.261, 25.00)

    def __init__(self, d: int = 16, size_samples: int = 6_400) -> None:
        super().__init__()
        #: Size of the domain
        self.d = d
        #: Number of samples expected
        self.size_samples = size_samples
        #: Frequency for Chi-squared comparison
        self.expected_fq = {e: int(self.size_samples/self.d) for e in range(self.d)}

    def test(self, sequence: list[int]) -> None:
        super().test(sequence)
        # Scale each 4-digit value into range(d), using its left-most digits.
        self.actual_fq = Counter(int(self.d*s/10_000) for s in sequence)

We can apply this test to some samples, compare with the expectation, and save the chi-squared value. This lets us look at LCM parameters to see if the generator creates suitably random values.

The essential test protocol is this:

samples = lcg_samples(LCM4(1621, 3), seed=1234)
fqt = FQTest()
fqt.test(samples)

The test creates some samples and applies the frequency test. The next step is to examine the chi-squared value to see if it's in the allowable range, \(7.261 \leq \chi^2 \leq 25\).

The search space

Superficially, it seems like there could be 10,000 choices of a and 10,000 choices of c parameter values for this PRNG. That's 100 million combinations. It takes a bit of processing to look at all of those. 

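Looking more deeply, the values of c are often small numbers that are relatively prime to the modulus: 1 or 11 or some such. That really cuts down on the search. The values of a have a number of other constraints with respect to the modulo value: for a full-period generator, \(a - 1\) must be divisible by every prime factor of m, and by 4 if m is. Because 10,000 has prime factors 2 and 5 and is divisible by 4, \(a - 1\) must be a multiple of 20, which suggests values like \(20k + 1\). Sensible combinations are defined by the following domain: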

combinations = [
    (a, c)
    for c in (1, 3, 7, 11,)
    for a in range(21, 10_000, 20)
]

This is 1,996 distinct combinations (499 values of a for each of 4 values of c), something we can compute on our laptop.
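The \(20k + 1\) reasoning comes from the Hull-Dobell theorem. A small sketch (hypothetical helper names) that checks the theorem's conditions for every combination, and confirms empirically that a=1621, c=3 visits all 10,000 values:

```python
from math import gcd


def prime_factors(n: int) -> set[int]:
    """Distinct prime factors of n, by trial division (fine for n = 10,000)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a: int, c: int, m: int = 10_000) -> bool:
    """Hull-Dobell conditions for X[n+1] = (a*X[n] + c) mod m to have period m."""
    return (
        gcd(c, m) == 1
        and all((a - 1) % p == 0 for p in prime_factors(m))
        and (m % 4 != 0 or (a - 1) % 4 == 0)
    )


def cycle_length(a: int, c: int, m: int = 10_000, seed: int = 0) -> int:
    """Empirical check: iterate until a value repeats, then measure the cycle."""
    seen: dict[int, int] = {}
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]


combinations = [(a, c) for c in (1, 3, 7, 11) for a in range(21, 10_000, 20)]
```

A full period is necessary but not sufficient: a=1, c=1 also has a full period and produces 0, 1, 2, 3, ... which is why the empirical tests are still needed.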

The problem we have trying to evaluate these is that each combination's testing is compute-intensive. This means we want to use as many cores of our machine as we have available. We don't want this to process each combination serially on a single core. A thread pool isn't going to help much: in CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound threads don't really run in parallel.

Each process has its own interpreter (and its own GIL), and the OS is happy to scatter processes among all the cores. So we need a process pool.

Here's how to spread the work among the cores:

    from concurrent.futures import ProcessPoolExecutor, as_completed

    from rich.progress import Progress

    combinations = [
        (a, c)
        for c in (1, 3, 7, 11)
        for a in range(21, 10_000, 20)
    ]

    with Progress() as progress:
        setup_task = progress.add_task("setup ...", total=len(combinations))
        finish_task = progress.add_task("finish...", total=len(combinations))

        # evaluate() runs the battery of tests for one (a, c) pair; it must be a
        # module-level function so the worker processes can unpickle it.
        with ProcessPoolExecutor(max_workers=8) as pool:
            futures = [
                pool.submit(evaluate, (a, c))
                for a, c in progress.track(combinations, task_id=setup_task, total=len(combinations))
            ]
            results = [
                f.result()
                for f in progress.track(as_completed(futures), task_id=finish_task, total=len(combinations))
            ]

This will occupy *all* the cores of the computer executing the `evaluate()` function. This function applies the battery of tests to each combination of a and c. We can then check the results for combinations where the chi-squared results for each test are in the acceptable ranges for the test.

It's fun.


Using a=1621 and c=3, we can generate acceptable random numbers with 4 decimal digits.

Here's some output using only a subset of the tests.

(rngtest2) % python lcmfinder.py
setup ... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
finish... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2361  1  11.46  14.22  46.64  63.76   2.30  11.33   2.16   2.16 
 981  3  10.28  15.24  52.56  66.32   2.28  11.08  10.47  10.47 
1221  3  10.19  14.12  48.72  62.08   3.03  10.08   2.59   2.59 
1621  3  11.70  14.91  47.12  69.52   2.23   9.69   0.86   0.86 

The output shows the a and c values followed by the minimum and maximum chi-squared values for each test. The chi-squared values are in pairs for the frequency test, serial pairs test, gap test, and poker test. 

Each test uses two dozen seed values to generate piles of 3,200 samples and subject each pile of samples to a battery of tests. The seed values, BTW, are range(1, 256, 11), which yields 24 seeds; kind of arbitrary. Once I find the short list of candidates, I can test with more seeds. There are only 10,000 seed values, so this can be done in finite time.

For example, a=1621, c=3, had chi-squared values between 11.70 and 14.91 for the frequency test. Well within the 7.261 to 25.0 range required. The remaining numbers show that it passed the other tests, also.
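The `evaluate()` function isn't shown in the process-pool code. Here's a minimal, self-contained sketch of what it might look like for the frequency test alone, assuming the seed values `range(1, 256, 11)` and piles of 3,200 samples described above. `lcg()` and `frequency_chi2()` are hypothetical stand-ins for `LCM4`/`lcg_samples()` and `FQTest`; unlike `lcg_samples()`, this `lcg()` doesn't stop early on a repeat.

```python
from collections import Counter

# Assumptions taken from the description above; not the actual program.
SEEDS = range(1, 256, 11)   # 24 seed values
N_SAMPLES = 3_200           # samples per pile
D = 16                      # frequency-test domain size (FQTest's default)


def lcg(a: int, c: int, seed: int, n: int) -> list[int]:
    """n values of X[n+1] = (a*X[n] + c) mod 10,000, starting after ``seed``."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % 10_000
        out.append(x)
    return out


def frequency_chi2(samples: list[int], d: int = D) -> float:
    """Chi-squared statistic for the frequency test with d equal-width bins."""
    expected = len(samples) / d
    actual = Counter(d * s // 10_000 for s in samples)
    return sum((actual[k] - expected) ** 2 / expected for k in range(d))


def evaluate(params: tuple[int, int]) -> tuple[int, int, float, float]:
    """Return (a, c, min chi-squared, max chi-squared) across all seeds."""
    a, c = params
    values = [frequency_chi2(lcg(a, c, seed, N_SAMPLES)) for seed in SEEDS]
    return a, c, min(values), max(values)
```

A full version would run every test in the battery and return one (min, max) pair per test, matching the columns of the output above. Keeping `evaluate()` at module level matters, since `ProcessPoolExecutor` pickles the function to send it to the worker processes.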

For completeness, I intend to implement the remaining half-dozen or so tests. Then I need to make sure the sphinx-produced documentation looks good. I've done this before. http://slott.itmaybeahack.com/_static/rngtest/rngdoc.html It's kind of an obsession, I think.

Looking back to my 1968 self, this would have been better than the middle-squared nonsense that caused me to struggle with bad games that behaved badly.

Tuesday, January 11, 2022

The Old Days -- ca. 2000 -- Empirical Tests of Random Numbers (Python and Chi-Square Testing)

See The Old Days -- ca. 1974 Random Numbers Before Python for some background.

We'll get to Python after reminiscing about the olden days. I want to provide some back story on why sympy has had a huge impact on ordinary hacks like myself.

What we're talking about is how we struggled with randomness before:

  1. /dev/random
  2. The Mersenne Twister Pseudo-Random Number Generator (PRNG)

Pre-1997, we performed empirical tests of PRNG's to find one that was random enough for our application. Maybe we were doing random samples of data to compare statistical measures. Maybe we were writing a game. What was important was a way to create a sequence of values that passed a battery of statistical tests.

See https://link.springer.com/chapter/10.1007%2F978-1-4612-1690-2_7 for the kind of material we salivated over. 

While there are an infinite number of bad algorithms, some math reveals that the Linear Congruential Generator (LCG) is simple and effective. Each new number is based on the previous number: \(X_{n+1} = (X_n \times a + c) \bmod m\). There's a multiply and an add, modulo some big number. The actual samples are often a subset of the bits in \(X_{n}\). 
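To make the "subset of the bits" idea concrete, here's a minimal sketch using a power-of-two modulus and the multiplier/increment pair popularized by *Numerical Recipes* (a=1664525, c=1013904223, m=2**32). It's an illustration only, not one of the CALGO generators listed below. With a power-of-two modulus, the lowest bit of \(X_n\) simply alternates between 0 and 1, which is why the high-order bits are the ones to keep.

```python
from collections.abc import Iterator


def lcg_high_bits(seed: int, a: int = 1664525, c: int = 1013904223,
                  m: int = 2**32, bits: int = 16) -> Iterator[int]:
    """Yield the top ``bits`` bits of each X[n+1] = (a*X[n] + c) mod m.

    Assumes m is a power of two, so "the top bits" is a simple shift.
    """
    shift = m.bit_length() - 1 - bits  # m = 2**32, bits = 16 -> shift = 16
    x = seed
    while True:
        x = (a * x + c) % m
        # Keep only the high-order bits; the low-order bits of a
        # power-of-two LCG have very short cycles.
        yield x >> shift
```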

After the Mersenne Twister became widely used, we essentially stopped looking at alternative random number algorithms. Before then -- well -- things weren't so good.

Here are some classics that I tested.

  • The ACM Collected Algorithms (CALGO) number 294 is a random-number generator. This is so obsolete, I have trouble finding links to it. It was a 28-bit generator.
  • The ACM Collected Algorithms (CALGO) number 266 has code still available. See toms/266
  • The Cheney-Kincaid generator is available. See random.f plus dependencies.

These formed a kind of benchmark I used when looking at Python's built-in Mersenne Twister.

Nowadays, you can find a great list of LCM PRNGs at https://en.wikipedia.org/wiki/Linear_congruential_generator

Python Empirical Testing

One of the early questions I had was whether or not the random module in Python stacked up against these older RNGs that I was a little more familiar with.

So, I wrote a big, fancy random number testing tool in Python. 

When? Around 2000. I started this in the Python 1.6 and 2.1 era. I have files showing results from Python 2.3 (#2, Jul 30 2003). This is about when I stopped fooling around with this and moved on to trusting that Python really did work and was -- perhaps -- the best approach to working with randomly-sampled data for statistical work. 

The OO design for the test classes was Lavish Over The Top (LOTT™) OO:

  • Too Many Methods
  • Too Many Superclasses
  • No Duck Typing

We won't look at that code. It's regrettable and stems from trying to make Python into C++.

What I do want to look at is the essential Chi-Squared test methodology. This is some cool stuff.

Comparing Expected and Actual

The chi-squared metric is a way to compare actual and expected distributions. You can read about it on your own time. It's a way to establish if data is random or there's something else going on that's not random. i.e., a trend or a bias. 
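For reference, the statistic itself is simple. With \(k\) categories, observed counts \(O_i\), and expected counts \(E_i\):

\[\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}\]

When the expected counts come from a fixed distribution, as in the frequency test, the result is compared against the chi-squared distribution with \(k - 1\) degrees of freedom.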

The empirical tests for PRNG's that Knuth defines all come with chi-squared values that bracket acceptable levels of randomness. For the purposes of writing a working set of tests the magic chi-squared values supplied by Knuth are fine. Magical. But fine. Really. Trust them.

If you make modifications, you'd use your statistics textbook. You'd open to the back, where it had a chi-squared table. That table gave you chi-squared values for a given degree of freedom and a given probability of being random.

Or, you could look for the NIST handbook online. It has a section on chi-squared testing. See https://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm. Same drill. Degrees of freedom and probability map to a chi-squared threshold.


Where do these magical chi-squared values come from? This gets interesting in a useless-sidebar kind of way.

Chi-Squared Values

There's a really, really terse summary of chi-squared numbers here: https://www.danielsoper.com/statcalc/formulas.aspx?id=11. This is all you need to know. It may be too terse to help you learn about it, but it's a handy reference.

We need to evaluate two functions: partial gamma and gamma. These are defined as integrals. And they're nasty levels of complexity. Nasty.

This kind of nasty:

\[\gamma (s,z)=\int _{0}^{z}t^{s-1} e^{-t}\, dt,\]

\[\Gamma (z)=\int _{0}^{\infty} t^{z-1} e^{-t}\, dt.\]

These are not easy things to evaluate. Back to the ACM Collected Algorithms (CALGO) to find ways to evaluate these integrals. There are algorithms in CALGO 435 and 654 that are expressed as Fortran for evaluating these. This ain't all, of course: we need Stirling numbers and Bernoulli numbers. So there's a lot going on here.

A lot of this can be transliterated from Fortran. The resulting code is frankly quite ugly, and requires extensive test cases. Fortran with GOTO's requires some cleverness to unwind the conceptual for/while/if constructs.


Enter Sympy

In the 20+ years since I implemented my empirical PRNG tests "the hard way," sympy has come of age.

Check this out

from sympy import Sum, rf
from sympy.abc import k, s, z
from sympy.functions import exp
from sympy import oo
Sum(z**s * exp(-z) * z**k / rf(s, k+1), (k, 0, oo)).simplify()

I could use this in Jupyter Lab to display a computation for the partial gamma function.

\[z^{s}e^{-z}\sum _{k=0}^{\infty }{\dfrac {z^{k}}{s^{\overline {k+1}}}}\]

This requires a fancy Rising Factorial computation, the \(s^{\overline {k+1}}\) term. This is available in sympy as the rf(s, k+1) expression.
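For comparison, the same series can be evaluated with ordinary floats and `math.lgamma()`, no sympy required. A minimal sketch, not a substitute for a carefully validated library routine; `regularized_lower_gamma()` and `chi2_cdf()` are hypothetical names:

```python
import math


def regularized_lower_gamma(s: float, z: float) -> float:
    """P(s, z) = gamma(s, z) / Gamma(s), summed from the rising-factorial series."""
    if z <= 0:
        return 0.0
    # term_k = z**k / s^(rising k+1), where s^(rising k+1) = s (s+1) ... (s+k)
    term = 1.0 / s
    total = term
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (s + k)
        total += term
    # z**s * e**-z / Gamma(s), computed in logs to avoid overflow.
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_cdf(chi2: float, deg_f: int) -> float:
    """Probability that a chi-squared variable with deg_f degrees of freedom is <= chi2."""
    return regularized_lower_gamma(deg_f / 2, chi2 / 2)
```

For example, `chi2_cdf(6.0, 15)` comes out near 0.02 and `chi2_cdf(25.0, 15)` near 0.95. The series converges quickly when \(z\) isn't much larger than \(s\); numerical libraries typically switch to a continued fraction for large \(z\).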

It turns out that sympy offers lowergamma() and gamma() as first-class functions. I don't even need to work through the closed-form simplifications.

I could do this...

from sympy import Sum, exp, integrate, oo, rf
from sympy.abc import k, t

def gammap(s: float, z: float) -> float:
    return (z**s * exp(-z) * Sum(z**k / rf(s, k+1), (k, 0, oo))).evalf()

def gamma(z: float) -> float:
    return integrate(t**(z-1) * exp(-t), (t, 0, oo)).doit()

It works well. And it provides elegant documentation. But I don't need to. I can write this, instead,

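from sympy import gamma, lowergamma

def chi2P(chi2: float, degF: int) -> float:
    """Probability that a chi-squared value with degF degrees of freedom is <= chi2."""
    return float(lowergamma(degF/2, chi2/2) / gamma(degF/2))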

This is used to compute the probability of seeing a chi-squared value no larger than the one we measured.

For the frequency test, as an example, we partition the random numbers into 16 bins. This gives us 15 degrees of freedom. We want chi-squared values between 7.2578125 and 25.0.


Given a chi-squared value of 6.0, we can say the probability, about 0.02, is suspiciously low: it's below the 0.05 level we've decided is the lower bound for "mostly random." The data is "too random"; that is to say, it's too close to the ideal distribution to be trusted.

The established practice was to look up a chi-squared value because you couldn't easily compute the probability of that value. With sympy, we can compute the probability. It's slow, so we have to optimize this carefully and not compute probabilities more frequently than necessary.

We can, for example, compute chi-squared values for a number of seeds, take the max and min of these and compute the probability of those two boundary values. This will bracket the probability that the pseudo random number generator is producing suitably random numbers.

This also applies to any process we're measuring with results that might vary randomly or might indicate a consistent problem that requires evaluation.

Using sympy eliminates the complexity of understanding these beautifully hand-crafted antique algorithms. It acts as a kind of super-compiler: from math, to an intermediate AST, to a concrete implementation.