Tuesday, November 10, 2020

Mind the Gap -- mypy's slight lag behind Python 3.9

Working on a new book. Fun stuff. It's going to cover Python 3.9. 

I'm adding the type hints material. And that means PEP 585. Which means the built-in classes can be used directly as generic types. We can use list[int] instead of List[int] and avoid from typing import List.
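Concretely, the before-and-after looks like this under Python 3.9 (the function names are mine, just for illustration):

```python
# Before (Python 3.8 and earlier): import the generic alias from typing.
from typing import List

def totals_old(values: List[int]) -> int:
    return sum(values)

# After (PEP 585, Python 3.9): the built-in type is itself generic.
def totals(values: list[int]) -> int:
    return sum(values)
```

Same runtime behavior; the only change is where the generic type comes from.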

It's all fun.

Until...

Wait... What?

When I run mypy, it doesn't like the PEP 585 changes. 

I make sure I'm running Python 3.9.0 everywhere. I make sure I have mypy 0.790. I'm baffled. 

Then. I find this.

https://github.com/python/mypy/issues/7907

Ah. It appears that mypy is not completely up-to-speed with Python. 

And this makes perfect sense. 

What I think it means is that I'll have to tweak all of the examples when mypy also supports PEP 585. 

For now, I'm sticking with strict checks and the 0.790 release of mypy.

Tuesday, November 3, 2020

"Python doesn’t do tail recursion" -- wait, what?

Yes, that's what the email said.

I was -- well -- shocked. Too shocked to be polite. Content Warning: much snark follows.

BLUF: Tail Recursion is not Tail Recursion Optimization.

Eventually, it became clear they were worried about tail recursion optimization. Maybe I'm too focused on words, but I think words matter. The initial claim was so clearly wrong, I had to challenge it. It took three *more* emails to get to the optimization point.

Hence my overload of snark. Three emails to discover they didn't see the word "optimization."

Tail Recursion

Here's an example. (I wish questions included example code instead of weird assertions that are clearly false.)

def f(x: int) -> int:
    if x == 0: return 1
    return x*f(x-1)

This function demonstrates tail recursion. The "tail" of the function involves a recursive reference to the function. An optimizing compiler can optimize the recursion to make it not recursive.

If Python doesn't do tail recursion, this function would not work.

Since it works, Python does tail recursion.
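(A pedantic aside: in the classic factorial above, the multiply happens *after* the recursive call returns, so it isn't in strict tail-call form. A variant with an accumulator parameter -- my addition, not from the email thread -- puts the call in true tail position. Python treats it exactly the same way: no optimization, same recursion limit.)

```python
def f_tail(x: int, acc: int = 1) -> int:
    # The accumulator carries the partial product, so the recursive
    # call is the very last thing evaluated: a true tail call.
    if x == 0:
        return acc
    return f_tail(x - 1, x * acc)
```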

Python limits recursion so that you crash cleanly before you're out of memory. 

This is important. Older languages would run out of memory for stack frames. Instead of reporting a recursion problem, they'd just crash. Out-of-memory. Sorry. No drop to `pdb`. Nothing. This is bad. 

In the old Mac OS (≤9) the stack and heap memory grew from opposite ends of the available RAM. If they collided, it was a stack overflow, and the running app crashed.

Here's how the stack limitation plays out in practice:

Python 3.9.0 | packaged by conda-forge | (default, Oct 10 2020, 20:36:05) 
[Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def f(x: int) -> int:
...     if x == 0: return 1
...     return x*f(x-1)
... 
>>> f(999)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
  File "<stdin>", line 3, in f
  File "<stdin>", line 3, in f
  [Previous line repeated 995 more times]
  File "<stdin>", line 2, in f
RecursionError: maximum recursion depth exceeded in comparison
>>> f(997)
40359724461645390234292652765290740211090335246139389143030797373519663190106842372637588338535871021370066319846619771939441131812655196168244780819892496805164333079279054597565865236698495341010299472919339792786270786066331221142813965733919921028883982924596589308458677218884794980135443761645075224506666559889800941755779673769516752134324947941363141453420218472642147939261573078117316452639398288026327911892540620618068943868330864469633413395518423554007724245116590381101827719832180031595827989994138156615149091737998105454985248322329275243898119808027088825439919757453646057047343037159587240316948675716615429494125804531124138293083686200505239196747842903536298319905066323058686625761240280494240344233166394434168335073220412356534986944621623211159899567872446218256850113174638385770679040010750726673900263161293112411222790967293574210496853327807479600033585593043206051744719522643618730123119509105891614150000503448656884759964900494067769318523221837865944485464570390882493401514455003570460531797737862031185509535676948889221713020001125049115164153109012008376515922196975531443788020928170857449369384012533872207051402936298580173261871506093429823657909616709585950405331060872571119845720022654435044594115773486340742853243112648568667878846614868197501917401045329763900400681952070446384077352869122445526522998548976435690967538380024596927667987240775792421191848817959853038226664754790722616547980297654789693965688881325682653906791569527887851625739692098351138902956310111232537239546473978314336136287987257855014757116813608339135424273514280398873561691774989806007307554240350953649053940444497266831952141542566791832347367596656633239099325959195904942407038086186468220698646372928155733874746654662785920628757199649160679797906414281946958920081267902656128812408713635983095986703451344143485021286481860150452952019582852804560086942064644286372048541496836531269052383502650854577265971210516113769359526291937135884001947338380202834453
1181679417716563013501242477291139042422814166369601152223293596957527530934652046662174154235850073391729650007182794396630407081318880947107940245036774649857429379220776637356890211596540009349092255988047909417594778375705723841918167663026277009033939654785671715045122185315730249393616044737902170116980736000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> 

This shows how a large number of operations (around 1,000) exceeds an internal limit. The exception is not out-of-memory, it's too much recursion.

A slightly smaller number (997 in this example) worked. 999 didn't work because it exceeded the recursion limit.

Manual Tail Recursion Optimization

New word: Optimization. New concept.

If the tail recursion were optimized into a loop, we'd see code that behaves as if we wrote this:

>>> from math import prod
>>> def f1(x: int) -> int:
...     return prod(range(1, x+1))
... 

This unwinds the tail recursion into a loop. A lazy range object, range(1, x+1), produces the sequence of values, which prod() reduces to a product. This doesn't involve recursion or stack frames. Indeed, because range is lazy, it involves very little memory.

And it works for numbers well over 1,000. Evidence (if any were needed) that tail recursion optimization is *not* being done automatically in Python.
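The prod() version hides the iteration inside the library. Writing the loop out by hand (my sketch, behaviorally equivalent) makes the recursion-becomes-iteration rewrite explicit:

```python
def f_loop(x: int) -> int:
    # The tail recursion unwound by hand into a loop.
    # No recursive calls, no stack frames, no recursion limit.
    result = 1
    for i in range(2, x + 1):
        result *= i
    return result
```

Unlike the recursive version, this handles arguments of 5,000 and beyond without complaint.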

We'll repeat this for those who feel the need to send me crazy-sounding emails.

Automatic Tail Recursion Optimization

There is no automatic transformation from tail recursion to loop in Python.

I'll repeat that for the folks who send me emails claiming tail recursion isn't supported. (Everyone else probably has a pretty good grip on this.)

There is no automatic transformation from tail recursion to loop. 

Tail recursion works. It has a limit.

Manually optimized tail recursion (to create a loop) also works. It doesn't have the same limit.

Stated yet another way: unlike some languages, Python does not optimize for you. You must do the rewrite yourself.

I would have thought the difference between these ideas (tail recursion and tail recursion optimization) was obvious. I was wrong. Hopefully, this blog post will help folks read *all* the words.

I'm also pretty sure it's covered in here: https://www.packtpub.com/product/functional-python-programming-second-edition/9781788627061.

Tuesday, October 27, 2020

Python 3.9 Rearranging typing and collections.abc

This is glorious.

There was this sort of awkward shoving match between typing and collections.abc. Both had generic type definitions and it was -- often -- unclear how to use them.

See PEP 585. Now they are all unified into a much happier family. 

And. We wind up writing things like this.

import collections.abc
import sys
import typing

# Sample is an application-specific class, defined elsewhere.
if sys.version_info >= (3, 9):
    BucketCollection = collections.abc.Collection[Sample]
else:
    BucketCollection = typing.Collection[Sample]

Now we can have code that passes tests for 3.8 and 3.9. And at some point we can cut off the 3.8 support and delete the ancient alternative definition.
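Once defined, the alias reads like any other annotation. Here's a self-contained sketch; the trivial Sample class and count_samples() function are my stand-ins, not from the book:

```python
import collections.abc
import sys
import typing

class Sample:
    """Stand-in for the application's Sample class."""

# Pick the generic definition that matches the running Python.
if sys.version_info >= (3, 9):
    BucketCollection = collections.abc.Collection[Sample]
else:
    BucketCollection = typing.Collection[Sample]

def count_samples(bucket: BucketCollection) -> int:
    # Collection guarantees __len__, __iter__, and __contains__.
    return len(bucket)
```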

I'm delighted to be able to move forward with a much simpler future in which collections are in the collections.abc and other, more foundational stuff is in typing.

Tuesday, October 20, 2020

Type Hints to extend built-in structures

Working on revisions to a book. Fun stuff. See https://www.packtpub.com/product/python-3-object-oriented-programming/9781849511261. I may have the privilege of working with Dusty.

I've been using mypy for all my 2nd edition changes, but not in --strict mode. 

I've decided to ramp things up, and switch to strict type checking for all of the examples and case studies.

This led me to stumble over

class MyThing(dict):
    def some_extra_method(self):
        pass

I absolutely could not get this to work for hours and hours.

I more-or-less gave up on it, until I started a similar example for a later chapter.

class ListWithFeatures(list):
    def feature(self):
        pass

This is almost the same, but, somehow, I understood it better.  As written, it is rejected by mypy. What I meant was this.

from typing import Iterable, List, Optional, overload

class ListWithFeatures(List[MyThing]):
    @overload
    def __init__(self) -> None: ...
    @overload
    def __init__(self, source: Iterable[MyThing]) -> None: ...
    def __init__(self, source: Optional[Iterable[MyThing]] = None) -> None:
        if source:
            super().__init__(source)
        else:
            super().__init__()
    def feature(self) -> float:
        return sum(thing.some_extra_method() for thing in self) / len(self)

I don't know why, but this made it easier for me to visualize the problem. It clarified my understanding profoundly.

We don't simply extend list or dict. We extend List[MyThing] or Dict[Key, Value] because a bare list is an alias for List[Any], and when being strict, we need to avoid Any. Aha.
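With that insight, the dict example from the top of this post can be fixed the same way. (The str keys and int values are my guesses, purely for illustration:)

```python
from typing import Dict

class MyThing(Dict[str, int]):
    # Extending Dict[str, int] instead of bare dict gives mypy --strict
    # concrete key and value types to check.
    def some_extra_method(self) -> int:
        return sum(self.values())
```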

Tuesday, October 13, 2020

Sources of Confusion: a "mea culpa" [Updated]

I am not patient. I have been dismissed as one who does not suffer fools gladly.

This is a bad attitude, and I absolutely suffer from it. No denials here. I'm aware it limits my ability to help the deeply confused.

My personal failing is not being patient enough to discover the root cause of their confusion.

It's not that I don't care -- I would truly like to help them. It's that I can't keep my mouth shut while they explain their ridiculous chain of invalid assumptions as if things they invented have some tangible reality.

I'm old, and I'm not planning on becoming more empathetic. Instead, I've become less curious about wrong ideas. I find silence is helpful because I don't yell at them as much as I could.

Recently someone tried to tell me that a Python tuple wasn't **really** immutable. 

Yes. 

That's what they said.

A tuple of lists has lists that can be modified. (They did not have the courtesy to provide examples; I've had to add examples based on what I assume they're talking about.)

>>> t = ([], [], [])
>>> t[0].append(42)
>>> t
([42], [], [])

"See," they said. "Mutable." 

Implicit follow-up: Python is a web of lies and there's no reason for it to be better than a spreadsheet.

I did not ask how the immutability of a tuple was magically transferred to the lists contained within the tuple. 
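A quick follow-up check (mine, again extrapolated from their complaint) shows what a tuple actually guarantees: its references can't be replaced; the objects those references point at are unaffected:

```python
t = ([], [], [])
inner = t[0]           # keep a handle on the first list
t[0].append(42)        # mutates the list; the tuple is untouched
assert t[0] is inner   # same object: the tuple's reference is unchanged
try:
    t[0] = [99]        # replacing a slot is what immutability forbids
except TypeError:
    replaced = False
```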

I did not ask about their infectious disease theory of protocol transformation of types. Somehow, when associated with a tuple, the list became tuple-like, and lost a bunch of methods.

I did not ask if they thought there was some method-shedding feature where an immutable structure forces other data structures to shed methods.

I did not ask what was supposed to happen to a dictionary, where there's no built-in frozen dictionary.

I did not ask what would happen with a "custom" class (one created in the app, not a built-in collection).

I did not ask what fantasy land they come from where a tuple of mutable objects would lead to immutability of the contained objects.

I did not ask if it worked the other way, too: was a list of tuples also supposed to freeze up? 

I did not ask if it transferred more than one level deep into the lists inside the tuple.

I should have.

It was an epic failing on my part to not dig into the bizarro world where the question could arise.

BTW. They had the same complaint about frozen data classes. (Again. They did not have the courtesy to provide code. I'm guessing this is what they meant. My failure for not asking.)

>>> from typing import List
>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class Kidding_Right:
...     one_thing: List[int]
...     another_thing: List[int]
... 
>>> kr = Kidding_Right([], [])
>>> kr.one_thing.append(42)
>>> kr
Kidding_Right(one_thing=[42], another_thing=[])

Triumphant Sneer: "See, frozen is little more than a suggestion. The lists within the data class are *not* frozen." 

Yes. They appeared to be claiming frozen was supposed to apply transitively to the objects within the dataclass. Appeared. My mistake was failing to ask what the hell they were talking about.

I really couldn't bear to know what caused someone to think this was in any way "confusing" or required "clarification." I didn't want to hear about transitivity and how the data class properties were supposed to infect the underlying objects. 

Their notion was essentially wrong, and wickedly so. I could have asked, but I'm not sure I could have waited patiently through their answer.

Update.

"little more than a suggestion".

Ah.

This is a strange confusion.

A dynamic language (like Python) resolves everything at run-time. It turns out that there are ways to override __getattr__() and __setattr__() to break the frozen setting. Indeed, you can reach into the internal __dict__ object and do real damage to the object.
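Here's a sketch of what that looks like. frozen=True installs a checking __setattr__ on the class; object.__setattr__() is the standard way around it (the Point class is my minimal example):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Point:
    x: int

p = Point(1)
try:
    p.x = 2  # the frozen check raises FrozenInstanceError
except FrozenInstanceError:
    pass

# Going straight to the base class bypasses the dataclass's __setattr__.
object.__setattr__(p, "x", 2)
```

So yes: in a dynamic language, frozen is enforced at run-time, and run-time machinery can be circumvented by anyone determined to do damage.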

I guess the consequences of a dynamic language can be confusing if you aren't expecting a dynamic language to actually be dynamic.

Tuesday, October 6, 2020

The Python Podcast __init__

Check out https://www.pythonpodcast.com/steven-lott-learn-to-code-episode-283/. This was a fun conversation on Python and learning.

We didn't talk about my books in detail. Instead, we talked about learning and what it takes to get closer to mastery.

It's a thing I worry about. I suspect other writers worry about it, also. Will the reader take the next steps? Or will they simply think they've got it because they read about it?