Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post in Jan 2023.

Showing posts with label restructuredtext. Show all posts

Tuesday, August 9, 2016

That Feeling When... You're reading your own documentation because it's useful and (mostly) correct

I'm looking at code (as a man does) and I can't remember if there's a class that does X. There's a lot of code. I wrote almost all of it. And -- maybe it's the gin -- but I just can't recall if there's an X. It seems like there should be.

Scan. Scan. Scroll. Scroll.

Read. Read.

Wait!

I have a pretty good gh-pages branch for this. Sphinx-based. Mostly up-to-date. Let's look there.

Ahhh. So much nicer than scrolling through code. Indexes work.

This whole "documentation" thing is pretty cool. Now I'm actually happy that other people guilted me into doing it.

Monday, April 5, 2010

Getting Started Creating Web Pages

Got this question recently.
I’m looking for an HTML editor that fits into my price range (free of course). I don’t need to do anything fancy, just vanilla HTML to run on an Apache server ..., and maybe some PHP down the line. Can you recommend any open source or shareware software that would run on Windows?
What to do?

First, civilized folks don’t edit HTML any more. That’s so 1999.

You have a spectrum of choices if you want to try and edit HTML.
  • General-purpose text editors. Good ones do HTML syntax coloring. This is the hardy, forge-through-the-forest way to go. Raw text editing. Like when we were kids. http://en.wikipedia.org/wiki/List_of_text_editors. In Windows world, I use Notepad++.
  • HTML-specific editors. http://en.wikipedia.org/wiki/List_of_HTML_editors. Note that WYSIWYG HTML Editing is more trouble than you’d believe possible. It’s always fun for the first few months, but then you try to do something that confuses the GUI and you wind up with an entire paragraph in italics and can’t figure out why. Or you want to move a punctuation mark outside a link and discover that the editor just can’t figure out where the tag is supposed to fall and puts everything inside it. Most of us do not try to use WYSIWYG HTML editors because they slowly become annoying once you get beyond the trivial basics.
  • IDEs. To produce HTML sensibly, you have to also write .CSS style sheets, and you often have a number of related pages. Essentially, a “project”. An IDE is usually a better choice than an editor. All the good IDEs are free: Eclipse, NetBeans and Komodo Edit. I use ActiveState Komodo Edit heavily.
While NetBeans or Komodo Edit may seem like overkill, it will (eventually) pay off as you move into developing more than static HTML pages.

Better Than HTML

Instead of creating HTML, many of us use “Lightweight Markup” which is much, much easier to cope with and simple tools to produce HTML from the markup. http://en.wikipedia.org/wiki/Lightweight_markup_language

I use reStructuredText instead of HTML. I use the DocUtils project, which has an rst2html.py tool that converts my RST into HTML for me. I also use rst2s5.py to create PowerPoint-like presentations from my reStructuredText. If you want to see the power of RST, you can look at my personal site and my books. 100% RST. No manual HTML anywhere. I use Sphinx to create really complex documents like the books.

For some tasks, I use HTML templates and simple scripts to process data and create static HTML from the data. You’d be surprised how effective this is. Few things require up-to-the-second web applications. Many things can be done as nightly batch programs that emit static HTML and FTP the HTML up to the web page. No PHP.
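A minimal sketch of the "template plus simple script" idea, assuming nothing beyond the Python standard library. The template text, the placeholder names, and the sample rows are all invented for illustration; a real nightly job would pull its rows from wherever the data lives and then upload the result.

```python
import html
from string import Template

# Hypothetical page template; the $title and $items placeholders are
# invented for this sketch.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$items
</ul>
</body>
</html>
""")


def render_page(title, rows):
    """Return a complete static HTML page for a list of (name, value) rows."""
    items = "\n".join(
        f"<li>{html.escape(name)}: {html.escape(str(value))}</li>"
        for name, value in rows
    )
    return PAGE.substitute(title=html.escape(title), items=items)


if __name__ == "__main__":
    # Made-up data standing in for whatever the batch job collects.
    sample_rows = [("Boats", 3), ("Books & Notes", 2)]
    print(render_page("Nightly Report", sample_rows))
```

The output is plain text, so the script can write it to a file and push it to the server with whatever transfer tool is already in place.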

Application Development

For web development, PHP is fine. It will – before long – create holes in your head because it’s so badly thought out. But for getting started, it’s fun. Real companies (like Google) don’t waste their time with it because of the numerous problems PHP causes.

“Problems?” you say. “What problems?”

PHP’s world view (HTML + code in a single package) is a terrible architecture. It’s horribly slow and leads to very muddled, inflexible designs. Everyone who tries to make a global change to their site's “look and feel” finds that PHP is inflexible and a regrettable platform. Even folks who simply want consistency among several different pages within their site find that the PHP world view is more headache than solution.

But it’s fun when you first build a site that works.

Frameworks

Generally, most folks find that a “framework” is absolutely essential for debugging, consistency and separating Content, Processing and Presentation. Even a simple Blog or Forum or Visitor Registration has separate Content, Processing and Presentation; PHP muddles these. A framework can help unmuddle them.

I use Django as framework and Python as programming language. Your hosting site may not support this, in which case you may be in trouble.

The Web Frameworks list on Wikipedia is good. Zend and CodeIgniter are highly recommended in places like StackOverflow. However, here's a good Django vs. PHP comparison: The Onion Uses Django, And Why It Matters To Us.

"Because
Cleaner. Much cleaner. Proper unit testing. Real reusable components across applications. An ORM rather than a just a series of functional query helpers...."

Summary
  1. Get an IDE to edit your pages. Komodo Edit.
  2. Consider using RST and tools instead of raw HTML. Installing Python + DocUtils and using rst2html.py is easier than learning HTML.
  3. Try to avoid PHP’s numerous pitfalls; ideally by avoiding PHP. Use Django + Python and create a real application that clearly separates the content (data model) from processing (view functions) from presentation (HTML templates)

Wednesday, June 24, 2009

Semantic Markup -- RST vs. XML

I have very mixed feelings about XML's usability.

An avowed goal of the inventors of XML was "XML documents should be human-legible and reasonably clear." While I like to think that "legible" means usable, I'm feeling that legibility is really a minimal standard; I think it's a polite way of saying "viewable with any text editor."

I've got some content (my Building Skills books) that I've edited with a number of tools. As I've changed tools, I've come to really understand what semantic markup means.

Once Upon A Time

When I started -- back in '00 or '01 -- I was taking notes on Python using BBEdit and other text-editor tools. That doesn't really count.

The first drafts of the Python book were written using AppleWorks, the predecessor to Apple's iWork Pages product. Any Mac text editor is a joy to use. Except, of course, that AppleWorks' semantic markup wasn't the easiest thing to use. It was little more than the visual styles with meaningful names.

Then I converted the whole thing to XML.

DocBook Semantic Markup

The DocBook XML-based markup seemed to be the best choice for what I was doing. It was reasonably technically focused, and provided a degree of structure and formality.

To convert from AppleWorks, I exported the entire thing as text and then used the LEO Outlining Editor to painstakingly -- manually -- rework it into XML.

At this point, the XML tags were a visible part of the document, and editing the document meant touching the tags. Not the easiest thing to do.

I switched to XMLmind's XXE. This was nice -- in a way. I didn't have to see the XML tags, but I was heavily constrained by the clunky way they handle the XML document structure. Double-clicking a word can lead to ambiguity on which level of tag you wanted to talk about.

The XML was "invisible" but the many-layered hierarchical structure was very much in my face.

RST Semantic Markup

After becoming a heavy user of Sphinx, I realized that I might be able to simplify my life by switching from XML to RST.

There are a number of gains when moving to RST.
  1. The document is simpler. It's approximately plain text, with a number of simple constraints.
  2. Editing is easier because the markup is both explicit and simple.
  3. The tooling is simpler. Sphinx pretty much does what I want with respect to publication.
There is just one big loss: semantic markup. DocBook documents are full of <acronym>TLA</acronym> to provide some meaningful classification behind the various words. It's relatively easy to replace these with RST's Interpreted Text Roles. The revised markup is :acronym:`TLA`.

The smaller, less relevant loss is the inability to nest inline markup. I used nested markup to provide detailed <function><parameter>a</parameter></function> kind of descriptions. I think :code:`function(x)` is just as meaningful when it comes to analyzing and manipulating the XML with automated tools.
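The tag-to-role replacement described above can be scripted. Here's a rough sketch, assuming the DocBook inline elements are simple (no attributes, no same-name nesting). The function name and the short tag list are made up for this example, and a regex is a shortcut here: a real conversion would more likely walk the parsed XML tree.

```python
import re

# Inline DocBook elements to rewrite; a small, illustrative subset.
INLINE_TAGS = ("acronym", "function", "parameter", "classname", "filename")

# Matches <tag>body</tag> for any tag in the list above.
TAG_PATTERN = re.compile(
    r"<(?P<tag>" + "|".join(INLINE_TAGS) + r")>(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)
ANY_TAG = re.compile(r"<[^>]+>")


def docbook_inline_to_rst(text):
    """Replace <tag>body</tag> with :tag:`body`, flattening nested tags."""

    def replace(match):
        # RST can't nest inline markup, so inner tags are dropped and only
        # their text survives inside the outermost role.
        body = ANY_TAG.sub("", match.group("body"))
        return f":{match.group('tag')}:`{body}`"

    return TAG_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(docbook_inline_to_rst("A <acronym>TLA</acronym> here."))
    print(docbook_inline_to_rst("<function>f(<parameter>a</parameter>)</function>"))
```

The nested case shows the loss: the inner parameter tag disappears and only the outer role remains.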

The Complete Set of Roles

I haven't finished the XML -> Sphinx transformation. However, I do have a list of roles that I'm working with.

Here's the list of literal conversions. Some of these have obvious Sphinx/RST replacements. Some don't. I haven't defined CSS markup styles for all of these -- but I could. Instead, I used the existing roles for presentation.

.. role:: parameter(literal)
.. role:: replaceable(literal)
.. role:: function(literal)
.. role:: exceptionname(literal)
.. role:: classname(literal)
.. role:: methodname(literal)
.. role:: varname(literal)
.. role:: envar(literal)
.. role:: filename(literal)
.. role:: code(literal)

.. role:: prompt(literal)
.. role:: userinput(literal)
.. role:: computeroutput(literal)

.. role:: guimenu(strong)
.. role:: guisubmenu(strong)
.. role:: guimenuitem(strong)
.. role:: guibutton(strong)
.. role:: guilabel(strong)
.. role:: keycap(strong)

.. role:: application(strong)
.. role:: command(strong)
.. role:: productname(strong)

.. role:: firstterm(emphasis)
.. role:: foreignphrase(emphasis)
.. role:: attribution
.. role:: abbrev

The next big step is to handle roles that are more than a simple style difference. My benchmark is the :trademark: role.

Adding A Role

Here's what you do to add a semantic markup role to your document processing tool stack.

First, write a small module to define the role.

Second, update Sphinx's conf.py to name your module. It goes in the extensions list.

Here's my module to define the trademark role.

import docutils.nodes
from docutils.parsers.rst import roles

def trademark_role(role, rawtext, text, lineno, inliner,
                   options={}, content=[]):
    """Build text followed by inline substitution '|trade|'
    """
    roles.set_classes(options)
    word= docutils.nodes.Text( text, rawtext )
    symbol= docutils.nodes.substitution_reference( '|trade|', 'trade', refname='trade' )
    return [word,symbol], []

def setup( app ):
    app.add_role( "trademark", trademark_role )

Here's the tweak I made to my conf.py

import sys, os
project=os.path.join( "")
sys.path.append("/Users/slott/Documents/Writing/NonProg2.5/source")
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.ifconfig', 'docbook_roles' ]

That's it. Now I have semantic markup that produces additional text (in this case the TM symbol). I don't think there are too many more examples like this. I'm still weeks away from finishing the conversion (and validating all the code samples again.)

But I think I've preserved the semantic content of my document in a simpler, easier to use set of tools.

Monday, May 25, 2009

ReStructured Text markup and Content Management

I can't say enough good things about ReStructuredText (RST).  I've used all of the available markup languages (SGML, HTML and XML).  They have their place, but they all fall short of being truly usable.

In "This sounds complicated, because it is" I reviewed some of my history of cheap content management.

In looking at content of all kinds, I'm finding that RST is much, much easier to work with than SGML, HTML or XML.  In short, I think that RST makes the file system into a really good content management system (CMS).  Unstructured content is a big win.  Structured content is a "don't care".  But there's a middle ground of semi-structured content that requires sophisticated semantic markup.

SGML At The Dawn Of Time

When the web started its ascent (back in the 90's), I was lucky. I had already been working with folks that did military contracting, and folks there had introduced me to SGML. When I moved from SGML to HTML, I saw it as a pleasant simplification because it had a more-or-less fixed DTD.

My first personal web pages were lovingly hand-crafted HTML masterpieces.  (Okay, they were lovingly hand-crafted.)   There was  a lot of work involved in markup, cross-references, and presentation. 

HTML via a Class Hierarchy

My first templating was via proper Python classes.  I created class hierarchies that embodied the page template and filled in required data.  The heart of each class was an emit method that wrote the final HTML.

Variant page layouts and special cases were easily handled by Python simple inheritance.  
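Here's a small sketch of what that class-based approach might look like. The class names, attributes, and page content are invented for illustration; they are not the original code. This emit method also returns the HTML as a string rather than writing it out, just to keep the example easy to try.

```python
import html


class Page:
    """Base page layout; subclasses override the pieces that vary."""

    title = "Untitled"

    def body(self):
        """Return the HTML for the main content area."""
        return "<p>No content.</p>"

    def emit(self):
        """Return the final HTML for the whole page."""
        return (
            "<html>\n"
            f"<head><title>{html.escape(self.title)}</title></head>\n"
            f"<body>\n{self.body()}\n</body>\n"
            "</html>"
        )


class BoatPage(Page):
    """Hypothetical variant layout: same frame, different body."""

    title = "Boats"

    def __init__(self, names):
        self.names = names

    def body(self):
        items = "".join(f"<li>{html.escape(name)}</li>" for name in self.names)
        return f"<ul>{items}</ul>"


if __name__ == "__main__":
    print(BoatPage(["Red Ranger"]).emit())
```

The special case lives entirely in the subclass; the frame in Page.emit() never changes.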

Of course, the big problem is that HTML is just representation.  There's often some bleed-through between the problem domain model and the HTML representation of that underlying model.  You don't want your problem domain objects to encode any HTML.  You can have a generic Tag class, but the Page class is specific to your problem domain.

The Python class structure is nice, but it's only suitable for structured content management.  When you have semi-structured and unstructured data -- the strong suit of HTML -- you find the class hierarchy to be too rigid.

Some time in the early 00's, I discovered Cheetah.

HTML via Templates

Cheetah (and template engines like Mako, Jinja, and numerous others) did what I wanted.  A base template was -- effectively -- a superclass.  Each block in that template could be overridden by a subclass.

The content, then, becomes a relatively simple template file that extends a page layout.  You can handle unstructured and semi-structured content very nicely.  I changed my ways of working with HTML to leverage this elegant, extensible view of the world.  I redid my personal web site: the content became a collection of Cheetah templates that contained all the content.

Note that I've *added* a markup language.  In addition to HTML, I also have some Cheetah markup on each page.  While this got me consistency and flexibility (and a reduction in the volume of stuff on each page) it did make things slightly more complex.

Look at http://cadesignquilts.com/ for another example of an all-Cheetah static site.  I did several sites like this.  The workflow involved (1) designing the overall page, (2) getting the data into a usable form, (3) generating the page-level template files, and (4) running Cheetah to emit HTML from the templates.  All static content.  Runs like lightning.

The JSP Distraction

Eventually, I started doing development with Struts, which depends heavily on JSP.  You have HTML commingled with Java code.  Plus, you've got custom actions via a tag library to extend JSP processing.  You can create page-level templates with a reasonably smart JSP tag library.

This template solution doesn't work well for unstructured or semi-structured data.  It's a pure programming solution.

DocBook XML and Semantic Markup

I wrote Building Skills in Python entirely in AppleWorks.  That was pretty well unmaintainable and unpublishable in that form.

I converted the text to DocBook XML.  I used the Leo outliner to manage the document as a whole.  I wrote my own publishing workflow to transform the XML to HTML and PDF.   It worked reasonably well.

More important, using DocBook reinforced the importance of semantic markup.  It took me back to my SGML days.  It also showed why presentation-oriented HTML markup has to be moved out of the document and into the stylesheet.

This was a very nice way to handle the semi-structured and unstructured content in a book.  Direct use of XML is a pain in the neck.  XML has a lot of syntax.  It's much nicer to do your thinking with something lighter weight.  

ReStructured Text (RST) for Unstructured Content

Somewhere in the late 00's, I found Python's docutils and RST.  I can't figure out when I started -- precisely -- but using RST as part of content management didn't fully click at first.

After reworking my personal site, which includes a lot of really unstructured ("random" might be a better word) content, I'm seeing the value in RST + Filesystem as a CMS.  I think the Sphinx folks are right.  If you have a simple markup system and all the filesystem tools that have evolved over the past few decades, you're covered.

Further, on larger projects, I've found that I can pop out a nice template documentation tree with a simple .. toctree:: directive on the index.rst page and generate a tidy, complete documentation package without much pain.

Structured Content

For structured data, you have ordinary classes and programs.  You have SQL databases, ORM to map to classes; all of that technology.  It's easy to write applications that emit RST which you can then publish.  

Most structured content can be boiled down to tables and charts.  The .. csv-table:: directive makes it easy to have an application emit data that you fold into a more elegant-looking report.
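As a sketch of an application emitting a table for RST, the function below builds a csv-table directive from rows of data. The function name, the table title, and the sample rows are invented for the example. It assumes values contain no embedded newlines, since those would break the directive's indentation.

```python
import csv
import io


def emit_csv_table(title, header, rows, indent="   "):
    """Return an RST csv-table directive for the given header and rows."""

    def csv_line(values):
        # Let the csv module handle quoting of commas and quotes in the data.
        buffer = io.StringIO()
        csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(values)
        return buffer.getvalue().rstrip("\r\n")

    lines = [
        f".. csv-table:: {title}",
        f"{indent}:header: {csv_line(header)}",
        "",  # a blank line separates the options from the table content
    ]
    lines.extend(f"{indent}{csv_line(row)}" for row in rows)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up report data.
    print(emit_csv_table("Fleet", ["Boat", "Length"], [["Red Ranger", 42]]))
```

The emitted text can be written to its own file and pulled into a larger document, or pasted straight into a report page.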

The Nuance -- Semi-Structured Data

My worst-case scenarios are my résumés: sailing, programming and writing.  The data has deep semantic meaning:  it isn't just words.  On the other hand, the data has lots of special-cases and exceptions: it isn't totally amenable to a database.

The absolute best part of docutils is that the parser's output is available for processing.  You can -- easily -- add directives and text roles to create semantic meaning.

I experimented with XML and YAML for my résumés.  The XML is cumbersome.  The YAML requires a fairly sophisticated class model to make use of the information.  

RST with a few text roles, however, rocks.  The .. role:: directive makes it easy to throw roles into a document for later use by applications.
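To show what "later use by applications" might mean, here's a rough sketch that pulls role-tagged phrases out of RST source. The skill and employer roles and the sample text are hypothetical. The regex is also a stand-in: walking the docutils parse tree would be the sturdier route, and this version exists only to stay within the standard library.

```python
import re
from collections import defaultdict

# Matches interpreted text such as :skill:`Python`. Role names here are
# letters, digits, and hyphens; a simplification of the full RST grammar.
ROLE_PATTERN = re.compile(r":(?P<role>[A-Za-z][A-Za-z0-9-]*):`(?P<text>[^`]+)`")


def collect_roles(rst_source):
    """Group role-tagged phrases by role name, in document order."""
    found = defaultdict(list)
    for match in ROLE_PATTERN.finditer(rst_source):
        found[match.group("role")].append(match.group("text"))
    return dict(found)


# Hypothetical résumé fragment using two made-up roles.
SAMPLE = """\
.. role:: skill
.. role:: employer

Built reporting tools in :skill:`Python` and :skill:`SQL`
for :employer:`Example Corp`.
"""

if __name__ == "__main__":
    for role, phrases in collect_roles(SAMPLE).items():
        print(role, phrases)
```

The document stays readable as prose, and the application still gets something close to structured data out of it.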