Tuesday, January 15, 2019

Super-picky Writing Advice

There are patterns to bad writing. I'll give some examples based on a blog post I was sent. It's also based -- indirectly -- on some of proposals I saw for PyCon and PyDataDC.

For the conference calls for papers, I can ask a few questions of the author, but that's about it.

For the blog post, I suggested a bunch of changes.

They balked.

Why ask for advice and then refuse to do anything?  (We can conjecture they wanted a "good job" pat on the head. They didn't want to actually have me give them a list of errors to fix.)

One of the points of contention was "Not everyone has your depth of expertise."


The blog post was on Ubuntu admin: something I know approximately nothing about. Let me step away from being an expert, while sticking closely with being able to write. I work with editors who -- similarly -- can write without being deep technology experts. I'm trying to learn what and how they do it.

In this case, my editing was based on general patterns of weak writing.

  1. Contradictions
  2. Redundancy.
  3. Waffling.
  4. Special-Casing.

Any blog post on Ubuntu admin that starts with "This is not just another blog post..." has started off with a flat-out contradiction. It emphatically is another blog post. You can't rise above the background noise of blog posts by writing a blog post that claims it's not a blog post. Sheesh. Find a better hook.

Any blog post on Ubuntu admin that includes "This blog post assumes the reader is familiar with linux sys administration." As if -- somehow -- a reader interested in Ubuntu admin could be confused by the required skills. It's clearly redundant. Cut it.

(This led to an immense back-and-forth with repeated insistence that *somehow* someone once got confused and *something* bad happened to someone. Once. My response was adamant. "It's redundant. The title says Ubuntu. That covered it. More repetition is an insult to the reader.")

Anything that has "this may nor may not work depending on your filesystem" is flat-out confusing: it covers both bases. Does it work or does it not work? Which is it? Clearly, there's some kind of precondition -- "must be this file system" or "most not be this file system" -- buried under "may or may not work." It's not that I know anything about Ubuntu file systems. But I can spot waffling.

Indeed, when you look at it, this is a "hook" to make the blog post useful and interesting. Some advice doesn't work. This advice always works. Simple statements of fact are better than contradictions and waffling.

Finally, there was a cautionary note that replacing "/swapfile" with "/ swapfile" would brick your OS. Which. Was. Crazy. It's really difficult to arbitrarily introduce spaces into shell commands and still have proper syntax. Sometimes a shell command with rando spaces may have proper syntax and may work. Most commands won't work at all with a rando space added. Try some and see.

What's more important is only one random punctuation error was listed. The special-case nature of this was a tip-off that something was not right with this advice.

What about a random ">" in the command? Or a random "|"? Not covered. A single space was considered worthy of mention. The rest. Meh.

(If you want details, it's was `dd -of=/swapfile ...`. If asked, I'd guess `dd -of=/ swapfile` is a syntax error because `swapfile` isn't a valid operand to `dd`. Not an expert. I didn't check. AFAIK, the author didn't actually check, either.)

None of this editing was based on any vast expertise. It was simple editorial work looking for some common problems with hasty writing.

  1. Avoid clear contractions.
  2. Avoid redundancy.
  3. Don't waffle.
  4. Special cases are instances of a more general pattern. The pattern is more important than the case.

One of the contentious back-and-forth issues was "I'm not writing a book, it's only a blog post."

Wait. What? Blog posts are often more widely-read than books. Look at Stack Overflow. If one of my books had the kind of readership my Stack Overflow answers have, I'd be living on royalty payments alone.

A blog post requires the same depth of care as a book.

This applies to proposed talks at conferences, also. The proposal needs to be carefully edited to reflect the final presentation's care and quality.

Thank goodness, most of the 100's of proposals I've looked at rarely have the four obvious problems listed above.

The problems I see in conference proposals are minor.

  1. Incompleteness. A 45 minute talk boiled down to 4 bullet points doesn't give us any confidence. It's hard to imagine filling the whole time with useful content when looking at a four-sentence outline. Will it be rambling digressions? Or will we have disgruntled attendees who had hoped for more?
  2. Weirdly cute style. Things like "This is where I jokingly outline something something and the real fun begins." We assume everyone is witty and charming, you don't need to tell us. We assume all talks will be fun. Can we move on to the Python (or PyData) topics you'll cover?
  3. Sales Pitches. "[Speaker's name] is a respected industry expert who delivers exciting and transformative keynote addresses and will dynamically cover the state-of-the-art blah blah blah..." Please. What Python topic is this? We like to review the outlines without speaker information; we need to focus on the content. Subverting this by including the speaker's name in the description or outline is irritating.
I'm really pleased we see very few PyCon Code-of-Conduct problems in the calls for proposals. That is a delight.

Editing is hard work. I'm sorry to report that editing means making changes. If you ask for editorial advice, it helps to listen.

Tuesday, January 8, 2019

Tuesday, January 1, 2019

PyCon Call for Proposals

PyCon CFP closes in a day or so. See https://us.pycon.org/2019/speaking/ for details.

After looking at 100's of proposals there are several broad categories.

  • Some cool stuff I wrote (or helped write, or contributed to)
  • Some cool aspect of Python the language
  • Some cool thing happening in the community

These are -- of course -- refined into separate tracks by the program committee. But as a reader of outlines, there are a few BIG buckets I tend to throw things into.

There are some tangentially Python things.

  • Using k8s (or some other tool or framework) with Python. The focus is often the tool, not Python.
  • Doing some machine learning with Python. The focus is often on the ML application domain or the cool model that got produced. Again, Python sometimes feels secondary.
  • Optimizing a personal (or enterprise) workflow or CI/CD pipeline. This can be helpful. But it can sometimes skip past Python and focus on IDE or something not solidly Python.
One of the most challenging are the personal journeys. These are often interesting stories. We like to see how others have succeeded. But. (And this is important.) Some of these cool personal journeys are only tangentially Pythonic, which can be disappointing. Some people can make good things happen in spite of big obstacles.

A personal journey can happen with a lot of technologies. When the proposal doesn't seem to show how Python was a unique enabler of the journey, then it's a cool talk for SXSW or OSCON, but might not be ideal for PyCon.

It's an honor to see everyone's ideas streaming through my in-box.

Tuesday, November 13, 2018

Using Python instead of bash

See Bashing the Bash — Replacing Shell Scripts with Python for some concrete examples of stuff you can do in Python or the shell.

And yes, it's a good, workable idea. 

1. It's unit testable.
2. It's easier to read.
3. It may be faster. Not that you'd notice unless you've really made a terrible mistake and written some gigantic application as a shell script.

While you're at it, check out the overall blog: https://medium.com/capital-one-tech. There's a lot going on.

Tuesday, November 6, 2018

PyData 2018 Washington, DC

See https://pydata.org/dc2018/

You do need to get your tickets ASAP. The schedule is fabulous.

Hotel rooms are still available, so don't waste any time getting connected.

Tuesday, October 30, 2018

The SourceForge vs. GitHub Conundrum

Or "When is it time to move?"

I've got https://sourceforge.net/projects/stingrayreader/ which has been on SourceForge since forever. 

Really since about 2014. Not that long. But. Maybe long enough?

The velocity of change is relatively slow.


(And this is a big however.) SourceForge seems kind of complicated when compared with Github. 

It's not a completely fair comparison. SourceForge has a *lot* of features. I don't use very many of those features. 

The troubling issues are these.

1. Documentation. SourceForge -- while it has a Git interface -- doesn't handle my documentation very well. Instead of a docs directory, I do a separate upload of the HTML. It's inelegant. SourceForge may handle this more smoothly nowadays. Or maybe I should switch to readthedocs? 

2. The Literate Programming Workflow. There's an extra step (or two) in LP workflows. The PyLit3 synchronization to create the working Python from the RST source. This is followed by the ubiquitous steps creation of a release, creation of a distribution, and the upload to PyPI. I don't have an elegant handle on this because my velocity of change is so low. SourceForge imposed a "make your own ZIP file" mentality that could be replaced by a nicer "use PyPI" approach.

3. Clunky Design Issue. I've uncovered a clunky, stateful design problem in the StingrayReader. I really really really need to fix it. And while fixing it, why not move to Github?

4. Compatibility Testing. The StingrayReader seems to work with Python 3.5 and up. I don't have a formal Tox suite. I think it works with a number of versions of XLRD. And it *should* be amenable to other tools for Excel processing. Not sure. And (until I start using tox) can't tell. 

5. Type Hints. See #3. The stateful design problem can be finessed into a much more elegant use of NamedTuples. And then mypy can be used.

6. Unit Tests. Currently, the testing is all unittest.TestCase. I really want to convert to pytest and simplify all of it.

7. Lack of a proper workflow in the first place. See #2. It's a more-or-less sitting in the master branch of a git repo that's part of SourceForge. That's kind of shabby. 

8. Version Numbering Vagueness. When I was building my own Zip archives from the code manually (because that's the way SourceForge worked.) I wasn't super careful about semantic versioning, and I've been release patch-number versions for a while. Which is wrong. A few of those versions included new features. Minor, but features. 

But. One tiny new feature. So. It will be release 4.5.

See https://sourceforge.net/p/stingrayreader/blog/2018/10/moving-to-github/ for status, also

Tuesday, October 16, 2018

The Edge of the Envelope

I don't -- generally -- think of myself as an edge-of-the-envelope developer. I'm a tried-and-proven kind of engineer. I want stuff that's been around for years, with a long history of changes.



Currently, I'm revising Mastering Object-Oriented Python. Second Edition.

That means upgrading everything to Python 3.7 with full type hints throughout almost all of the 18 chapters. (SQLAlchemy presents some problems, so we're not going deep there.)

The chapter on foundational WSGI applications is *totally* broken. I can't get anything to work with mypy. (The unit tests run, but mypy complains. Loudly.) Of course, I tried every wrong thing for three solid days. Then I pulled the stub file from typeshed and realized how dumb I was.

Okay. I finally got the correct type hints. Yay!


Something in mypy is balking at the start_response() function calls. Too many arguments.

Read the issues. Hm. Stack Overflow. Hm.

Just to be sure, I updated to the new 0.630 release in September, 2018.

Problem solved. So. I've arrived at the edge of the envelope. I now require the absolutely latest and greatest mypy release. By the time I'm done with the rewrites, this release will be ancient history. But today, it was wonderful to get past the examples.