Tuesday, August 23, 2022

Books! Books! More Channels!

I started with the Apple Books platform because it's an easy default for me. 

Pivot to Python

A Guide for professionals and skilled beginners


I've recently updated this to fix some cosmetic problems with title pages, the table of contents and stuff like that. The content hasn't changed. Yet. It's still an introduction to Python for folks who already know how to program, they want to pivot to programming in Python. Quickly.

But wait, there's more. 

Unlearning SQL

When your only tool is a hammer, every problem looks like a nail


Many folks know some Python, but struggle with the architectural balance between writing bulk processing in SQL or writing it in Python. For too many developers, SQL is effectively the only tool they can use. With a variety of tools, it becomes easier to solve a wider variety of problems effectively.

Google Play

Now, I'm duplicating the books on Google Play. Here's Unlearning SQL:


I've made a clone of Pivot to Python, also.


Both books are (intentionally) short to help experts make rapid progress.

Tuesday, August 16, 2022

Enterprise Python -- Some initial thoughts

In the long run, I think there's a small book here. See 8 reasons Python will rule the enterprise — and 8 reasons it won’t | InfoWorld. The conclusion, "Teams need to migrate slowly into the future, and adopting more Python is a way to do that," seems to be sensible. Some of the cautionary tales along the way, however, don't make as much sense.

TL;DR. There are no reasons to avoid Python. Indeed, the 8 points suggest that Python is perhaps a smart decision. 

I want to focus on the negatives part of this because some of them are wrong. I think there's a "technology hegemony" viewpoint where everything in an enterprise must be exactly the same. This tends to prevent creative solutions to problems and mires an enterprise into fighting problems that are inherent in bad technology choices. Also, I think there's an enterprises are run by idiots subtext.

1. Popularity. Really this is about having polyglot software portfolio. The reasoning appears to be that a polyglot software portfolio is impossible to maintain because (1) no one can learn an old language, and (2) software will never be rewritten from an obsolete language to a modern language. If these are both true, it appears the  organization is full of idiots. The notion that a polyglot tech stack must devolve into chaos seems to ignore the endless chain of management decisions that are required to create chaos. Leaving obsolete tech in place isn't a consequence of the tech, or the tech's lack of compatibility, it's a management decision to enshrine bad ideas, frozen in amber, forever.

2. Scripting Languages. Specifically, the spreadsheet is already the de facto scripting language of choice, and nothing can be done about it. Nothing. No one can learn to use Jupyter Lab to do business analytics. If this is true, it appears that the organization is full of idiots. Python will not replace all spreadsheets. A pandas data frame will replace an opaque macro-filled nightmare with code that can be unit tested. Imagine unit testing a spreadsheet. Consider the possibilities of expanding business analysis work to include a few test cases; not 100% code coverage, but a few test cases to confirm the analytical process was implemented consistently. 

3. Dynamic Languages. Specifically, dynamic languages are useless for reliable software because there's no comprehensive type checking across some interfaces. Which begs the question of why there are software failures in statically typed languages. More importantly, complaining about dynamic languages raises important questions about integration and acceptance testing procedures in an organization in general. All languages require extensive test suites for all developed code. All languages benefit from static analysis. Sometimes the compiler does this, sometimes external tools do the linting. Sometimes folks use both the compiler and linters to check types. If we are sure dynamic languages will break, are we equally sure statically typed languages cannot break? Or, do we take steps to prevent problems? I think we tend to take a lot of steps to make sure software works.

4. Tooling. I can't figure this point out. But somehow C++ or Java have better tools for managing large source code bases. There are no details behind this claim, so I'm left to guess. I would suggest that the "incremental recompilation" problem of large C++ (and Java) code bases is its own nightmare. Folks go to great lengths to architect C++ so that an implementation change does not require recompilation of everything. While this could be seen as "evolved to handle the jobs that enterprise coders need done", I submit that there's a deeper problem here, and stepping away from the compiler is a better solution than complex architectures. See Lakos Large-Scale C++ Software Design for some architectural features that don't solve any enterprise problem, but solve the scaling problem of big C++ applications. This bumps into the micro-services/monolith discussion, and the question of carefully testing each interface. None of which has anything to do with Python specifically.

5. Machine Learning and Data Science. These are fads, apparently. I'm not sure I can respond to this, since it has little to do with Python. Of course, Python has one of the most complete data science toolsets, so perhaps avoiding data science makes it easier to avoid Python.

6. Rapid Growth. The growth of Python is rapid, and there's no promise of endless backwards compatibility. This is a consequence of active development and learning. I think it's better than the endless backwards compatibility that leads to JavaScript's list of WATs. Or the endless confusion between java.util.date and Joda-Time. The idea that no one will ever look at the Enterprise code base for Common Vulnerabilities and Exposures seems to indicate a lack of concern for reliability or security. Since the entire compiled code base has to be checked for vulnerabilities, why not also check the Python code base for ongoing upgrades and changes and enhancements? Is code really written once and never looked at again? If so, it sounds like an organization run by idiots.

7. Python Shipped With Some OS's. There's a long story of woe that stems from relying on the OS-supplied Python. The lesson learned here is Never Rely on the OS Python; Always Install Your Own. This doesn't seem like a reason to avoid Python in the enterprise. It seems like an important lesson learned for all software that's not part of the OS: always install your own. I've been using Miniconda to spin up Python environments and absolutely love it.

8. Open Source Software. Agreed. Nothing to do with Python specifically. Everything to do with tech stack and architecture. The question of using Open Source in the first place doesn't seem difficult. It's a well-established way to reduce start-up costs for software development.

Of the eight points, two seem to be completely generic issues. Yes, Machine Learning is new, and yes, choices must be made. The two questions around scripting and dynamic languages seem specious; all programming requires careful design and testing. The Python shipped with the OS is a non-concern; the lesson learned is clear.

We have have three remaining points:

  1. Polyglot Portfolio (From pursuing popularity.) This is already the case in most Enterprises, and needs to be managed through aggressive retiring of old software. I may have taken decades to build that old app, but it often takes months to rewrite it in a new language. The legacy app provides acceptance test cases; it's often filled with cruft and detritus of old decisions. 
  2. Tooling. Agreed. Tooling is important. Not sure that Java or C++ have a real edge here, but, tooling is important.
  3. Growth and Change. Python's rapid evolution requires active management. An enterprise must adopt a YBYO (You Build it You Own it) attitude so that every level of management is aware of the components they're responsible for. CVE's are checked, Python PEP's are checked. Tools like tox or nox are used to build (and rebuild) virtual environments.
If these seem like a high bar, perhaps there are deeper issues in the enterprise. If adding Yet Another Language is a problem, then it's time to start retiring some languages. If Adding Another Tool is a problem, it's worth examining the existing tool chain to see why it's such a burden. If the idea of change is terrifying, perhaps the ongoing change is not being watched carefully enough.

Tuesday, August 9, 2022

Tragedy Averted

I almost made a terrible blunder.

See https://github.com/slott56/py-web-tool for some background. This is a "Literate Programming" tool. I started fooling around with this kind of thing back in '05 (maybe even earlier.) This is not the blunder. The whole idea of literate programming is not very popular. I'm a fan of Jupyter{Book} as the state of the art in sophisticated literate programming, if you're interested in it.

In my case, I started this project so long ago, I used docutils. This was long before Sphinx arrived on the scene. I never updated my little project to use Sphinx. The point was to have a kind of pure literate programming tool that could work with a variety of markup languages, including (but not limited to) RST.

Recently, I learned about PlantUML. The idea of a text description of a diagram is appealing. I don't really need to draw it; I just need to specify what's in it and let graphviz do the rest. This tool is very, very cool. You can capture ideas quickly. You can refine and expand on ideas until you reach a point where code makes more sense than a picture of code. 

For some things, you can gather data and draw a picture of things *as they are*. This is particularly valuable for cloud-based infrastructure where a few queries leads to PlantUML source that is depicted very nicely.

Which leads to the idea of Literate Programming including UML diagrams. 

Doesn't sound too difficult. I can create an extension to docutils to introduce a UML directive. The resulting RST would look like this:

..  uml::

    left to right direction
    skinparam actorStyle awesome

    actor "Developer" as Dev
    rectangle PyWeb {
        usecase "Tangle Source" as UC_Tangle
        usecase "Weave Document" as UC_Weave
    rectangle IDE {
        usecase "Create WEB" as UC_Create
        usecase "Run Tests" as UC_Test
    Dev --> UC_Tangle
    Dev --> UC_Weave
    Dev --> UC_Create
    Dev --> UC_Test

    UC_Test --> UC_Tangle

This could be handy to have the diagrams as part of the documentation that tangles the working the code. One source for all of it. 

I started down the path of researching docutils extensions. Got pretty far. Far enough that I had an empty repository and everything. I was about ready to start creating spike solutions.


[music cue] *duh duh duuuuuuh*

I found that Sphinx already has an extension for PlantUML. I almost started reading the code to see how it worked.

Then I realized how dumb that was. It already works. Why read the code? Why not install it?

I had a choice to make.

  1. Continue building my own docutils plug-in.
  2. Switch to Sphinx.

Some complications:

  • My Literate Programming tool produces RST that *may* not be compatible with Sphinx.
  • It's yet another dependency in a tool that started out with zero dependencies. I've added pytest and tox. What next? 

What to do?

I have to say that Git is amazing. I can make a branch for the spike. If it works, pull request. If it doesn't work, delete the branch. This continues to be game-changing to me. I'm old. I remember when we had to back up the whole project directory tree before making this kind of change.

It worked. My tool's RST (with one exception) worked perfectly with Sphinx. The one exception was an obscure directive, .. class:: name, used to provide an HTML class name for the following block. This always should have been the docutils .. container:: name directive. With this fix, we're good to go.

I'm happy I avoided the trap of reimplementing something. Instead of that, I upgraded from "bare" docutils with my own CSS to Sphinx with it's sophisticated templates and HTML Themes.

Tuesday, August 2, 2022

Books! Books! Books!

First, there's 

Pivot to Python

A Guide for professionals and skilled beginners


I've recently updated this to fix some cosmetic problems with title pages, the table of contents and stuff like that. The content hasn't changed. Yet. It's still an introduction to Python for folks who already know how to program, they want to pivot to programming in Python. Quickly.

But wait, there's more. 

Unlearning SQL

When your only tool is a hammer, every problem looks like a nail


This is all new. It's written for folks who know Python, and are struggling with the architectural balance between writing bulk processing in SQL or writing it in Python. For too many developers, SQL is effectively the only tool they can use. With a variety of tools, it becomes easier to solve a wider variety of problems.

Tuesday, July 26, 2022

Bashing the Bash -- The shell is awful and what you can do about it

A presentation I did recently.


Folks were polite and didn't have too many questions. I guess they fundamentally agreed: the shell is awful, we can use it for a few things.

Safe Shell Scripts Stay Simple: Set the environment, Start the application.

The Seven S's of shell scripting.

Many many thanks to Code & Supply for hosting me.

Tuesday, July 19, 2022

I've got a great Proof-of-Concept. How do I go forward with it?

This is the best part about Python -- you can build something quickly. And it really works.


What are the next steps?

While there are a *lot* of possibilities, I'm focused on an "enterprise work group" application that involves a clever web service/RESTful API built in Flask. Maybe with NLP.

Let me catalog a bunch of things you might want to think about to "productionize" your great idea. Here's a short list to get started.

  • File System Organization
  • Virtual Environments
  • Unit Testing
  • Integration Testing
  • Acceptance Testing
  • Static Analysis
  • Tool Chain
  • Documentation
Let's dive into each one of these. Then we'll look at Flask deployments.

File System Organization

When you're gotten something to work, the directory in which it works is sometimes not organized ideally. There are a lot of ways to do this, but what seems to work well is a structure like the following.

- Some parent directory. Often in Git
  - src -- your code is here
  - tests -- your tests are here
  - docs -- your documentation will be here
  requirements.txt -- the list of packages to install. Exact, pinned version numbers
  requirements-dev.txt -- the list of packages used for maintenance and development
  environment.yml -- another list of packages in conda format
  pyproject.toml -- this has your tox setup in it
  Makefile -- sometimes helpful

Note that a lot of packages you see have a setup.py.  This is **only** needed if you're going open source your code. For enterprise projects, this is not the first thing you will focus on. Ignore it, for now.

Virtual Environments

When you're developing in Python you may not even worry about virtual environments. You have Python. It works. You downloaded NLP and Flask. You put things together and they work.

The trick here is the Python ecosystem is vast, and you have (without really observing it closely) likely downloaded a lot of projects. Projects that depend on projects. 

You can't trust your current environment to be reliable or repeatable. You'll need to use a virtual environment manager of some kind.

Python's built-in virtual environment manager venv is readily available and works nicely. See https://docs.python.org/3/library/venv.html  It's my second choice. 

My first choice is conda. Start with minicondahttps://docs.conda.io/en/latest/miniconda.html. Use this to assemble your environment and retest your application to be sure you've got everything.

You'll be creating (and destroying) virtual environments until you get it right. They're cheap. They don't impact your code in any way. Feel free to make mistakes.

When it works, build conda's environment.yml file and the requirements.txt files. This will rebuild the environment.  You'll use them with tox for testing.

If you don't use conda, you'll omit the environment.yml.  Nothing else will change.

Unit Testing

Of course, you'll need automated unit tests. You'll want 100% code coverage. You *really* want 100% logic path coverage, but that's aspirational. 100% code coverage is a lot of work and uncovers enough problems that the extra testing for all logic paths seems unhelpful.

You have two built-in unit testing toolsets: doctest and unittest. I like doctest. https://docs.python.org/3/library/doctest.html

You'll want to get pytest and the pytest-cov add-on package. https://docs.pytest.org/en/6.2.x/contents.html  https://pytest-cov.readthedocs.io/en/latest/.  

Your test modules go in the tests directory. You know you've done it right when you can use the pytest command at the command line and it finds (and runs) all your tests. 

This is part of your requirements-dev.txt file.

Integration Testing

This is unit testing without so many mocks. I recommend using pytest for this, also. The difference is that your "fixtures" will be much more complex. Files. Databases. Flask Clients. Certificates. Maybe starting multiple services. All kinds of things that have a complex setup and perhaps a complex teardown, also.

See https://docs.pytest.org/en/6.2.x/fixture.html#yield-fixtures-recommended for good ways to handle this more complex setup and teardown.

Acceptance Testing

Depending on the community of users, it may be necessary to provide automated acceptance tests. For this, I recommend behave. https://behave.readthedocs.io/en/stable/ You're can write the test cases in the Gherkin language. This language is open-ended, and many stakeholders can contribute to the test cases. It's not easy to get consensus sometimes, and a more formal Gherkin test case lets people debate, come to an agreement, and prioritize the features and scenarios they need to see.

This is part of your requirements-dev.txt file.

Static Analysis

This is an extra layer of checking to be sure best practices are being followed. There are a variety of tools for this. You *always* want to process your code through blackhttps://black.readthedocs.io/en/stable/ 

Some folks love isort for putting the imports into a canonical order.  https://pycqa.github.io/isort/

Flake8 should be used to be sure there's no obviously bad programming practices. https://flake8.pycqa.org/en/latest/

I'm a huge fan of type hints. I consider mypy to be essential. https://mypy.readthedocs.io/en/stable/  I prefer "--strict" mode, but that can be a high bar. 

Tool Chain

You can try to manage this with make. But don't.

Download tox, instead.  https://tox.wiki/en/latest/index.html  

The point of tox is to combine virtual environment setup with testing in that virtual environment. You can -- without too much pain -- define multiple virtual environments. You can then test the various releases of the various packages your project depends on in various combinations. This is how to manage a clean upgrade. 

1. Figure out the new versions.

2. Setup tox to test existing and new.

3. Run tox.

I often set the tox commands to run black first, then unit testing, then static analysis, ending with mypy --strict.

When the code is reformatted by black, it's technically a build failure. (You should have run black manually before running tox.) When tox works cleanly, you're ready to commit and push and pull request and merge.


Not an after-thought.

For human documents, use Sphinx. https://www.sphinx-doc.org/en/master/ 

Put docstrings in every package, every module, every class, every method, and every function. Summarize *what* and *why*. (Don't explain *how*: people can read your code.) 

Use the autodoc feature to create the API reference documentation from the code. Start with this.

Later, you can write a README, and some explanations, and installation instructions, and all the things other people expect to see.

For a RESTful API, be sure to write an OpenAPI specification and be sure to test against that spec. https://www.openapis.org. While a lot of the examples are complicated, you can easily use a small subset to describe your documents, the validation rules, and the transactions. You can add the security details later. They're part of your web server, but they don't need an extensive OpenAPI documentation at the beginning.

Flask Deployments

Some folks like to define a flask application that can be installed in the Python virtual environment. This means the components are on the default sys.path without any "extra" effort. (It's a fair amount of effort to begin with. I'm not sure it's worth it.)

When you run a flask app, you'll be using some kind of engine. NGINX, uWSGI, GUnicorn, etc. (GUnicorn is very nice. https://gunicorn.org). 

See https://flask.palletsprojects.com/en/2.0.x/deploying/wsgi-standalone/.

In all cases, these engines will "wrap" your Flask application. You'll want to make your application visible by setting the PYTHONPATH environment variable, naming your src directory. Do not run from your project's directory.

You will have the engine running in some distinct /opt/the_app or /Users/the_app or /usr/home/the_app or some such directory, unrelated to where the code lives. You'll use GUnicorns command-line options to locate your app, wherever it lives on the filesystem. GUnicorn will use PYTHONPATH to find your app. Since web servers often run as nobody, you'll need to make sure your code base is readable. But. Not. Writable.

Tuesday, July 12, 2022

The Enterprise COBOL Conundrum

Enterprise COBOL is both a liability and an asset. There's tangible value hidden in the code.

See https://github.com/slott56/looking-at-cobol  

I've tweaked the presentation a little. 

The essential ingredients in coping with COBOL are these:

  • Use something like Stingray Reader to parse COBOL DDE's and process the data in the native format.
  • Analyze the Job Control Language (JCL) to work out the directed acyclic graph (DAG) that leads to file and database updates. These "master" files and databases are the data artifacts that matter most. This is the value-creating processing. There aren't many of these files. 
  • Create a process to clone those files, and write Python data access modules to process the data. This is a two-way process. You'll be shipping files from your Z/OS world to another server running Python. In some cases, files will need to come back to Z/OS to permit legacy processing to continue. 
  • Work backwards through the DAG to understand the COBOL apps that update the master files. These can be rewritten as Python apps that consume transactions and update master files/databases. Transfer transaction files out of Z/OS to a server doing the Python processing. Either update a shared database or send updated master files back to Z/OS if there's further processing that needs an updated master. 
  • Continue working backwards through the DAG, replacing COBOL with Python until you've found source files for the transactions. Expect to find transaction validation programs as well as transaction analytics or reporting. The validations are useful; the analytics and reporting can be replaced with simpler, more modern tools.
  • When there's no more legacy processing that depends on a given master file or database, then the Z/OS can be formally decommissioned. Have a party.

This is relatively low risk work. It's high value. The COBOL code encodes enterprise knowledge. Preserving this knowledge in a more modern language is a value-maintaining exercise. Indeed, the improved clarity may be a value-creating exercise.