Tuesday, June 23, 2015

Literate Programming and GitHub

I remain captivated by the ideals of Literate Programming. My fork of PyLit (https://github.com/slott56/PyLit-3) coupled with Sphinx seems to handle LP programming in a very elegant way.

It works like this.
  1. Write RST files describing the problem and the solution. This includes the actual implementation code. And everything else that's relevant. 
  2. Run PyLit3 to build final Python code from the RST documentation. This should include the setup.py so that it can be installed properly. 
  3. Run Sphinx to build pretty HTML pages (and LaTeX) from the RST documentation.
I often include the unit tests along with the sphinx build so that I'm sure that things are working.

The challenge is final presentation of the whole package.

The HTML can be easy to publish, but it can't (trivially) be used to recover the code. We have to upload two separate and distinct things. (We could use BeautifulSoup to recover RST from HTML and then PyLit to rebuild the code. But that sounds crazy.)

The RST is easy to publish, but hard to read and it requires a pass with PyLit to emit the code and then another pass with Sphinx to produce the HTML. A single upload doesn't work well.

If we publish only the Python code we've defeated the point of literate programming. Even if we focus on the Python, we need to do a separate upload of HTML to providing the supporting documentation.

After working with this for a while, I've found that it's simplest to have one source and several targets. I use RST ⇒ (.py, .html, .tex). This encourages me to write documentation first. I often fail, and have blocks of code with tiny summaries and non-existent explanations.

PyLit allows one to use .py ⇒ .rst ⇒ .html, .tex. I've messed with this a bit and don't like it as much. Code first leaves the documentation as a kind of afterthought.

How can we publish simply and cleanly: without separate uploads?

Enter GitHub and gh-pages.

See the "sphinxdoc-test" project for an example. Also this https://github.com/daler/sphinxdoc-test. The bulk of this is useful advice on a way to create the gh-pages branch from your RST source via Sphinx and some GitHub commands.

Following this line of thinking, we almost have the case for three branches in a LP project.
  1. The "master" branch with the RST source. And nothing more.
  2. The "code" branch with the generated Python code created by PyLit.
  3. The "gh-pages" branch with the generated HTML created by Sphinx.
I think I like this.

We need three top-level directories. One has RST source. A build script would run PyLit to populate the (separate) directory for the code branch. The build script would also run Sphinx to populate a third top-level directory for the gh-pages branch.

The downside of this shows up when you need to create a branch for a separate effort. You have a "some-major-change" branch to master. Where's the code? Where's the doco? You don't want to commit either of those derived work products until you merge the "some-major-change" back into master.

GitHub Literate Programming

There are many LP projects on GitHub. There are perhaps a dozen which focus on publishing with the Github-flavored Markdown as the source language. Because Markdown is about as easy to parse as RST, the tooling is simple. Because Markdown lacks semantic richness, I'm not switching.

I've found that semantically rich markup is essential. This is a key feature of RST. It's carried forward by Sphinx to create very sophisticated markup. Think :code:`sample` vs. :py:func:`sample` vs. :py:mod:`sample` vs. :py:exc:`sample`. The final typesetting may be similar, but they are clearly semantically distinct and create separate index entries.

A focus on Markdown seems to be a limitation. It's encouraging to see folks experiment with literate programming using Markdown and GitHub. Perhaps other folks will look at more sophisticated markup languages like RST.

Previous Exercises

See https://sourceforge.net/projects/stingrayreader/ for a seriously large literate programming effort. The HTML is also hosted at SourceForge: http://stingrayreader.sourceforge.net/index.html.

This project is awkward because -- well -- I have to do a separate FTP upload of the finished pages after a change. It's done with a script, not a simple "git push." SourceForge has a GitHub repository. https://sourceforge.net/p/stingrayreader/code/ci/master/tree/. But. SourceForge doesn't use  GitHub.com's UI, so it's not clear if it supports the gh-pages feature. I assume it doesn't, but, maybe it does. (I can't even login to SourceForge with Safari... I should really stop using SourceForge and switch to GitHub.)

See https://github.com/slott56/HamCalc-2.1 for another complex, LP effort. This predates my dim understanding of the gh-pages branch, so it's got HTML (in doc/build/html), but it doesn't show it elegantly.

I'm still not sure this three-branch Literate Programming approach is sensible. My first step should probably be to rearrange the PyLit3 project into this three-branch structure.

Tuesday, June 16, 2015

A plea to avoid sys.exit() [Updated]

Let me gripe about this for a moment.


The use case for this function is limited. Very, very limited.

Every place that this appears (except for one) is going to lead to reusability issues.

Consider some obscure little function, deep within the app.

def deep_within_the_app(x, y, zed):
        something -- doesn't matter what
    except SomeException:
        logging.exception( "deep_within_the_app")

What's so bad about that?

The function seizes control of every app that uses it by raising an unexpected exception.

We can (partially) undo this mischief by wrapping the function in a try/except which catches SystemExit.

def reusing_a_feature():
    for i in range(a_bunch):
        except SystemExit as e:
            print("error on {0}".format(i))

This will defeat the sys.exit(). But the cost is one of clarity. Why SystemExit? Why not some meaningful exception?

This is important: raise the meaningful exception instead of exit.

Bottom Line.

The right place for sys.exit() is inside the if __name__ == "__main__": section.
It might look something like this:

if __name__ == "__main__":
    except (KnownException, AnotherException) as ex:

Use meaningful exceptions instead of sys.exit().

This permits reuse of everything without a mysterious SystemExit causing confusion.

On "Taste" in Software Design

Read this: http://www.paulgraham.com/taste.html.

I was originally focused on "beauty". Clearly, good design is beautiful. Isn't that obvious? Why so many words to explain the obvious?

The post seemed useless. Why write it in the first place? Why share it? Why share it now, 12 years after it was written?

Because beauty can be elusive to some people. A more complete definition of some attributes of beauty are helpful.

This is not a throw-away concept. These are fourteen essential elements that need to be used as part of every software architectural design review. Indeed, it should be part of every code review. Although code perhaps shouldn't be "daring."

When we adopt an architecture, it should fit these criteria.

This doesn't replace more pragmatic software quality assurance considerations.  See http://www.sei.cmu.edu/reports/95tr021.pdf.

I'm currently delighted with "Good design is redesign."

Tuesday, June 9, 2015

On Waiting to Write "Serious Code"

Someone told me they weren't yet ready to write "serious code." They needed to spend more time doing something that's not coding.

I'm unclear on what they were doing. It appears they have some barriers that I can't see.

They had sample data. They had a problem statement. They had an existing solution that was not very good. I couldn't see any reason for waiting. Indeed, I can't figure out what "serious" code is. Does that mean there's frivolous code?

Because there was a previous solution, they had a minimum viable product already defined: it has to do what the previous version did, only be better in some way. One could trivially transform the previous product into unit test cases and an acceptance test case. Few things could be more amenable to coding than having test cases.

Since everything necessary seemed to be in place, I had a complete brain cramp when they mentioned they weren't yet ready to write "serious" code. "Serious?" Seriously?

It appears that this developer suffers from a bad case of Fear of Code™. I know some common sources of this fear.
  1. Waterfall Project Experience (WPE™.) Old people (like me,) who started in Waterfall World, were told that we had to produce mountains of design before we produced any code. No one knew why in any precise way. Indeed, there's ample evidence that too much design is simply a way to introduce noise into the process. In spite of real questions, some folks think that you can write a design so detailed that a coder can just type in the code from the design. (This level of design is isomorphic to code; to avoid ambiguity it must be written as code.)
  2. Relational Database Hegemony (RDH™.) Folks (like me,) who were DBA's, know that databases require a lot of design and a lot of review before they can be created. Writing stored procedures requires even more design and review time. You don't just slap an SP out there. It might be "bad" or "create problems." Also, when you insist on DBA's writing application code, it takes super-detailed, code-level design details. In effect, you must write the code for the DBA to write your code back to you. 
  3. One and Done (OAD™.) Some people like to feel that they can write code once and it can be a thing of beauty and a joy forever. The idea of a rewrite is anathema to these people. While this is obviously silly, people still like the conceit that they can produce some prototype code that will be a proper part of every future release forever and always. It's not possible to make all of the decisions the first time regarding adoption and scaling and user preferences. Your prototype code will get replaced eventually: get over it. Write the prototype, get funding, move forward. Don't dither trying to make a bunch of future-oriented decisions based on a future you cannot actually foresee. You can't "future-proof" your code.
  4. Learnings are Expensive (LAE™.) You can find people that think that the sequence of (spike, POC, version 0, version 1) is too expensive. They are sure that learning is a project drag, since no "tangible" results are created by learning. This means that they don't value intellectual property or knowledge work, either; an attitude is actually destructive to the organization. Knowledge is everything: software captures knowledge: a spike followed by a POC followed by version zero will arrive on the scene more quickly than any alternative strategy. Don't waste time trying to write version 1 from a position of ignorance.
  5. Tools are Expensive (TAE™.) Some people feel that -- since tools are expensive -- they should be used rarely. Back in the olden days, when a compiler took many minutes to produce an error report, you had to be sure the code was good. (I'm old enough that I remember when compiles took hours. Really.) Those days are gone. Most compilers today work at the "speed of light" -- if they were any faster, you couldn't tell, because you can't click any faster. For dynamic languages, like Python, the speed with which code can be emitted makes all tool considerations quaint and silly.
  6. Diagram it to Death (DTD™.) Rather than write code, some folks would rather talk about writing code. To them, email, powerpoint, and whiteboard are cheaper than coding. This is a false economy. Nothing is saved by avoiding code. Time is wasted drawing diagrams of things at a level of detail that mirrors the code. Pictures aren't bad in general. Detailed pictures are simply a stalling tactic.
I find it frustrating when people search for excuses to avoid simply creating code. While I see a number of sources, there are many counter-arguments available. 
  1. Waterfall is dead. Make something minimal that works for this sprint. Call it a "spike" if that makes you happier. Clean it up in the next sprint. Create value early. Expand on the features later.
  2. Databases are free now. SQLite and similar products mean that we can prototype a database without waiting around for DBA's to give us permission to make progress. Build the database now, get something that works. Rework the database as your understanding of the problem matures. Rework the database as the problem itself matures and morphs. Nothing is static; the universe is expanding; do something now.
  3. No code lasts forever. Waiting around to create some kind of perfect value one time only is perfect silliness. Create value early and often. Discarding code means you're making progress. If you think it's important, write "draft" on every electronic document which might get changed. (Hint: version numbers are smarter than putting "draft" everywhere.)
  4. A spike and code happens more quickly than code. It's a matter of technical risk: unfinished work is an "exposure" -- an unrealized investment. Failing soon is better than researching extensively in an effort prevent a failure that could have been found quickly.
  5. Use a dynamic language and avoid all overheads.
  6. Keep the diagrams high-level. Code is the only way to meaningfully capture details. Code endures better than some out-of-date Visio file that's in Sharepoint completely disconnected from GitHub.
It's imperative to break down the roadblocks. All "pre-coding" activities are little more than emotional props: knock them down and start coding.

Tuesday, June 2, 2015

On Pre-built Binaries for Python Packages


Why I Hate Windows.

For Mac OS X, you download XCode (for free) and you can build anything. For Linux, you use some kind of yum or rpm installer for the developer tools, and you can build anything.

For Windows...

Pre-built binaries. 😂

And a hope that the version numbers all match up properly. 😖

In many cases, you can use http://www.mingw.org or http://www.cygwin.com. Many projects can work well with one or both of these compilers.

In some cases, however, you have for fork over $$$ for Microsoft Visual Studio to download and build a Python module with a C extension.

The problem is a show-stopper for many n00bz. They are lead to believe that pip does everything. And it does -- for Mac OS X and Linux; for Windows, however, it does almost everything. And it's not obvious to the n00bz what the problem is when pip barfs because there's no suitable C compiler.

"Replace that junk Windows PC" is not an appropriate response. Although I often suggest it as the first solution when things won't install. 😡

Often Anaconda is the solution. It includes MinGW and you can (for a fee) buy their bundle of database drivers. The install for Anaconda is breathtakingly simple, removing a great deal of the potential complexity of assembling a tech stack for Python.

In other cases, we have to do some hand-holding to show how to find a pre-built binary for Windows.