Bio and Publications

Tuesday, January 29, 2019

Eager and Lazy Properties

See this


My answer was -- frankly -- vague. Twitter being what it is, I should have written the blog post first and linked to it.

The use case is this.

class X:
    def __init__(self, x):
        self._value = f(x)
    @property
    def value(self):
        return self._value

We've got a property where the returned value is already an instance variable.

I'm not a fan.

This reflects an eager computation strategy. f(x) was computed eagerly and the value made available via a property. One can justify the use of a property to make the value read-only, but... still nope.

There are a lot of alternatives that make more sense to me.

Option 1. We're All Adults Here.

Here's an approach I think is better.

class X:
    def __init__(self, x):
        self.value = f(x)

It's read-only because -- really -- if you change it, you break the class. So don't change it.

This is my favorite. Read-onlyness is sometimes described has a way protect utter idiots from breaking a library they don't seem to understand. Or. It's described as a way to prevent some Evil Genius Programmer (EGP) from doing something intentionally malicious and breaking things.

Bah.

It's Python. They have access to the source. Why mess around breaking things this way?

Option 2. Lazyiness

Here's an approach that hits at the essential feature.

class X:
    def __init__(self, x):
        self.x = x
    @property
    @lru_cache(None)
    def value(self):
        return f(self.x)

This seems to hit at the original intent without an explicit cached variable. Instead the caching is pushed off into another space. (I'm writing a chapter on decorators, so this may be a bit much.)

The idea, though, is to make properties lazy. If they're expensive, then the result should be cached.

There may be other choices, but I think lazy and eager cover the bases. I don't think eager is wrong, but I don't see the need for a property to hide the attribute created by an eager computation.


Things that start badly

Today's Example of Starting Badly: Building HTML.

The code has a super-simple email message with f"<html><body><p>stuff {data}</p></body></html>". It was jammed into an email object along with the text version. All very nice.

For a moment, I considered suggesting that f-string substitution wasn't a good long-term solution, since it doesn't cover anything more than the most trivial case.

Two things stopped me from complaining:
  • The case really was trivial.
  • It's administrative code: it sends naggy reminder emails periodically. Why over-engineer it?
What an idiot I was.

Today, the {data} has been replaced with a complex table instead of a summary. (Why? The user story evolved. And we needed to replace the summary with details.)

The engineer was pretty sure they could use htmlify(data) or data.htmlify() to transform the data into an HTML structure without seriously breaking the f-string nature of the app.

I should have commented "Don't build HTML that way, it's a bad way to start" on the previous release.

The f-string solution turns rapidly into complexities layered on complexities dusted over the top with sprinkles of NOPE.

This is a job for Jinja2 or Mako or something similar. 

There's a step function change in the app's perceived "complexity". Instead of a simple f-string, we now have to populate a template. It goes from one line of code to more than one (three seems typical.) And. The file-system loader for templates seems more appropriate rather than hard-coding the template in the body of the code. So there are now more files in the app with the HTML templates in them.

However. The Jinja {{variable|round(2)}} was an immediate victory. The use of {%for%} to build the HTML table was the goal, and that simplification was worth the price of entry. Now we're arguing over CSS nuances to get the columns to look "right."

Lessons learned.

Don't let the currently superficial trivial case slide past without at least a warning. Make the suggestion that functions like "get template" and "populate template" will be necessary even for trivial f-string or string.Template processing.

HTML isn't a first-class part of anything. It's external serialization.  Yes, it's for people, but it's only serialization. Serialization has to be separated from the other aspects of the data gathering, map-reduce summarization, and email distribution. There's a pipeline of steps there and the final app should reflect the complete separation of these concerns. Even if it is admin overhead.

Tuesday, January 22, 2019

Read this: 10 Reasons to Learn Python in 2019

See 10 Reasons to Learn Python in 2019.

I want to dwell on #4 for a moment.

The Python community actually has a Code of Conduct. We try to stick by it and conferences will have reporting mechanisms in place so we can educate folks who are being inconsiderate or disrespectful.

The consequence of this is a welcoming and intentionally helpful community. It's hard to emphasize the "intentionally helpful" enough. We don't have much patience for snark. And we're willing to call each other out on being unhelpful.

In my Day Job, we have an in-house Slack channel with well over 1,000 Python folks. The single most common class of questions is a variation on "My Corporate Firewall Setup Doesn't Let Me Use PIP." This is ubiquitous. And confusing. And frustrating for folks who are surprised there is a corporate firewall.

We have a number of pinned answers in Slack for this. And -- perhaps once a week -- someone will patiently repeated the pinned answers for someone who's truly and deeply in over their head trying to get pip to work. (We have an in-house PyPI, also, but it requires doing something in addition to typing `pip install whatever` at the command line, and that can require hand-holding.)

As Python2 winds to a close, and we uncover folks working with Python 2, we have to issue guidance. I've switched tone from "please consider rewriting your app/tool/framework" to "we strongly recommend you start using Python 3." In June, I'm plan to switch to "You have only six months to convert whatever you're working on."

We've had some sidebar conversations on making sure I'm being properly positive, supportive, considerate, and respectful of the folks who think Python2 might be useful.

The point of the Python community is to help each other. We're actively and intentionally trying to be helpful and inclusive.

Technical Sidebar -- Conda and Virtual Environments 

What about the trickle of people trying to make use of the built-in Python2 in Mac OS X or the Python2 that comes with Linux on a cloud-based server?

Some important coaching: Don't Use The OS Default Python.

This is kind of negative. It helps to state this positively. Always Use A Virtual Environment.

Because we have a *large* community of data scientists, this becomes: Always Use A Conda Environment.

And yes, there are some packages that also require a pip install. And yes, XKCD 1987 describes the consequence of the rapid growth of Python and the variety of ways it can be made to work. (While all the strands of spaghetti look like a negative, they reflect the variety of clever solutions, all of which work without any problems.)

Therefore. This.

1. conda create --name=working python=3.7 --file conda_install.txt
2. conda activate working
3. python3 -m pip install pip_install.txt

The absolute worst case is a project with two lists of requirements, one in a conda conda_install.txt and some extra stuff in a pip conda_install.txt. We're able to use `python3 -m pip install requirements.txt` for almost everything.

If you're just starting out, you can use miniconda to bootstrap everything else you might need.

Tuesday, January 15, 2019

Super-picky Writing Advice

There are patterns to bad writing. I'll give some examples based on a blog post I was sent. It's also based -- indirectly -- on some of proposals I saw for PyCon and PyDataDC.

For the conference calls for papers, I can ask a few questions of the author, but that's about it.

For the blog post, I suggested a bunch of changes.

They balked.

Why ask for advice and then refuse to do anything?  (We can conjecture they wanted a "good job" pat on the head. They didn't want to actually have me give them a list of errors to fix.)

One of the points of contention was "Not everyone has your depth of expertise."

Sigh.

The blog post was on Ubuntu admin: something I know approximately nothing about. Let me step away from being an expert, while sticking closely with being able to write. I work with editors who -- similarly -- can write without being deep technology experts. I'm trying to learn what and how they do it.

In this case, my editing was based on general patterns of weak writing.

  1. Contradictions
  2. Redundancy.
  3. Waffling.
  4. Special-Casing.

Any blog post on Ubuntu admin that starts with "This is not just another blog post..." has started off with a flat-out contradiction. It emphatically is another blog post. You can't rise above the background noise of blog posts by writing a blog post that claims it's not a blog post. Sheesh. Find a better hook.

Any blog post on Ubuntu admin that includes "This blog post assumes the reader is familiar with linux sys administration." As if -- somehow -- a reader interested in Ubuntu admin could be confused by the required skills. It's clearly redundant. Cut it.

(This led to an immense back-and-forth with repeated insistence that *somehow* someone once got confused and *something* bad happened to someone. Once. My response was adamant. "It's redundant. The title says Ubuntu. That covered it. More repetition is an insult to the reader.")

Anything that has "this may nor may not work depending on your filesystem" is flat-out confusing: it covers both bases. Does it work or does it not work? Which is it? Clearly, there's some kind of precondition -- "must be this file system" or "most not be this file system" -- buried under "may or may not work." It's not that I know anything about Ubuntu file systems. But I can spot waffling.

Indeed, when you look at it, this is a "hook" to make the blog post useful and interesting. Some advice doesn't work. This advice always works. Simple statements of fact are better than contradictions and waffling.

Finally, there was a cautionary note that replacing "/swapfile" with "/ swapfile" would brick your OS. Which. Was. Crazy. It's really difficult to arbitrarily introduce spaces into shell commands and still have proper syntax. Sometimes a shell command with rando spaces may have proper syntax and may work. Most commands won't work at all with a rando space added. Try some and see.

What's more important is only one random punctuation error was listed. The special-case nature of this was a tip-off that something was not right with this advice.

What about a random ">" in the command? Or a random "|"? Not covered. A single space was considered worthy of mention. The rest. Meh.

(If you want details, it's was `dd -of=/swapfile ...`. If asked, I'd guess `dd -of=/ swapfile` is a syntax error because `swapfile` isn't a valid operand to `dd`. Not an expert. I didn't check. AFAIK, the author didn't actually check, either.)

None of this editing was based on any vast expertise. It was simple editorial work looking for some common problems with hasty writing.

  1. Avoid clear contractions.
  2. Avoid redundancy.
  3. Don't waffle.
  4. Special cases are instances of a more general pattern. The pattern is more important than the case.

One of the contentious back-and-forth issues was "I'm not writing a book, it's only a blog post."

Wait. What? Blog posts are often more widely-read than books. Look at Stack Overflow. If one of my books had the kind of readership my Stack Overflow answers have, I'd be living on royalty payments alone.

A blog post requires the same depth of care as a book.

This applies to proposed talks at conferences, also. The proposal needs to be carefully edited to reflect the final presentation's care and quality.

Thank goodness, most of the 100's of proposals I've looked at rarely have the four obvious problems listed above.

The problems I see in conference proposals are minor.

  1. Incompleteness. A 45 minute talk boiled down to 4 bullet points doesn't give us any confidence. It's hard to imagine filling the whole time with useful content when looking at a four-sentence outline. Will it be rambling digressions? Or will we have disgruntled attendees who had hoped for more?
  2. Weirdly cute style. Things like "This is where I jokingly outline something something and the real fun begins." We assume everyone is witty and charming, you don't need to tell us. We assume all talks will be fun. Can we move on to the Python (or PyData) topics you'll cover?
  3. Sales Pitches. "[Speaker's name] is a respected industry expert who delivers exciting and transformative keynote addresses and will dynamically cover the state-of-the-art blah blah blah..." Please. What Python topic is this? We like to review the outlines without speaker information; we need to focus on the content. Subverting this by including the speaker's name in the description or outline is irritating.
I'm really pleased we see very few PyCon Code-of-Conduct problems in the calls for proposals. That is a delight.

Editing is hard work. I'm sorry to report that editing means making changes. If you ask for editorial advice, it helps to listen.

Tuesday, January 8, 2019

NumFocus and PyData


Videos are up from PyDataDC 2018.  See all the PyData talks at PyDataTV on YouTube.

Tuesday, January 1, 2019

PyCon Call for Proposals

PyCon CFP closes in a day or so. See https://us.pycon.org/2019/speaking/ for details.

After looking at 100's of proposals there are several broad categories.

  • Some cool stuff I wrote (or helped write, or contributed to)
  • Some cool aspect of Python the language
  • Some cool thing happening in the community

These are -- of course -- refined into separate tracks by the program committee. But as a reader of outlines, there are a few BIG buckets I tend to throw things into.

There are some tangentially Python things.

  • Using k8s (or some other tool or framework) with Python. The focus is often the tool, not Python.
  • Doing some machine learning with Python. The focus is often on the ML application domain or the cool model that got produced. Again, Python sometimes feels secondary.
  • Optimizing a personal (or enterprise) workflow or CI/CD pipeline. This can be helpful. But it can sometimes skip past Python and focus on IDE or something not solidly Python.
One of the most challenging are the personal journeys. These are often interesting stories. We like to see how others have succeeded. But. (And this is important.) Some of these cool personal journeys are only tangentially Pythonic, which can be disappointing. Some people can make good things happen in spite of big obstacles.

A personal journey can happen with a lot of technologies. When the proposal doesn't seem to show how Python was a unique enabler of the journey, then it's a cool talk for SXSW or OSCON, but might not be ideal for PyCon.

It's an honor to see everyone's ideas streaming through my in-box.