Most of the things that got cut were (to me) obviously obsolete. For example, replacing collections.namedtuple with typing.NamedTuple seemed like a clear example of obsolete. A reviewer really thought I should skip all NamedTuple and use frozen data classes.
More important are some things that I learned about in my formative years. I think they're important because they'll little nuggets of cool algorithm. But. Pragmatically? They're too hard to explain and don't really capture interesting features of Python.
Back in '01. Yes. The turn of the millennium.
(Pull up a chair. This is a long yarn.)
Back in '01, I was starting to look at ways to perfect my Python and literate programming skills.
(And yes, I was using Python on '01.)
I had a project that I'd learned about in the 80's. That's in the previous millennium. A thousand years ago. Computers were large, expensive, and rare.
And. Random Number Generators (RNG's) were a bit of a struggle. In the 80's, more sensitive statistical methods were uncovering biases in the RNG's of the day. Back in the 70's, Knuth's The Art of Computer Programming, Volume 2, Seminumerical Algorithms had covered this topic pretty well. But. Not quite well enough for language libraries or OS's to offer really solid RNG's.
(The popular Mersenne Twister algorithm dates from '97.)
One of my co-workers at the time showed me a technical report that I have no real bibliographic information for. I read it, captivated, because it described -- in detail -- Knuth's statistical tests for random number generators.
This lead me to Knuth Volume 2.
This lead me to implement *all* of this in Pascal (in the '80's.)
This lead me to implement *all* of this in Python (in the '00's.)
There were 10 tests. Each is a tidy little algorithm with a tidy little implementation that can run on a big collection of data to ascertain how random it is.
- Frequency Test - develops frequency distribution of individual samples.
- Serial Test - develops frequency distribution of pairs of samples.
- Gap Test - develops frequency distribution of the length of gaps between groups samples in a given range.
- Poker Test - develops frequency distribution for 5-card "hands" of samples over a small (16-value) domain.
- Coupon Collector's Test - develops frequency distribution for lengths of subsets that contain a complete set of values from a small (8-value) domain.
- Permutation Test - develops frequency distribution for the permutations of ordering of 4-sample selections.
- Runs Up Test - develops frequency distribution for lengths of "runs up" where each value is larger than the previous value; one variation covers the case where runs are statistically dependent.
- Runs Up Test with independent runs and a relatively large domain.
- Runs Up Test with a "small domain", that has a slightly different expected distribution.
- Maximum of T - develops frequency distribution for the largest value in a group of T values.
- Serial Correlation - computes the correlation coefficient between adjacent pairs of values.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.