Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.

Tuesday, July 24, 2018

Mastering Object-Oriented Python -- 2nd Edition

It's time to revise Mastering Object-Oriented Python. While the previous edition is solidly focused on Python3, it lacks some important features:
  • F-Strings
  • Type Hints
  • types.NamedTuple
  • Data Classes
So. There's some stuff to add. I don't think there's too much to take away. I plan to make some things a little more tidy. I will remove all references to Python2 and all references to how things used to be and why they're better now.

It will be several months before this is available. Stand by for updates.

The earliest drafts of this book date back to 2002. Seriously. I've been over this material a lot in the past 1.5 decades.

The nascent form of this book took me years (maybe 10 years) to accumulate. It covered everything: data structures, statements, built-in functions, classes, and a bunch of libraries. It was beyond merely ambitious and off into some void of "cover all the things." 

I was motivated by my undergrad CS text books on the foundations of computer science. The idea of putting the language features into a parallel structure with boolean algebra, set theory, and number theory was too cool for words. And -- lacking the necessary formal background -- it was something I'm not able to present very well.

While I wanted to cover all of Computer Science, acquisition editors were pointed out how crazy that idea was. A focus on the object-oriented features of Python was sufficient to sell a distinctive book. And they were absolutely right.

As I rework the outline for the 2nd edition, there are some other topics that crop up. These are not going to wind up in the book, but they're an implicit feature of the topics being covered.

CS Foundations and Python

One of the best of the introductory books (which came out after I graduated) was Structured Concurrent Programming With Operating Systems Applications. They presented a nested collection of sub-languages: SP/k. The organization of the nested subsets can be helpful for exposing programming incrementally. There are issues, and we'll look at them in detail below. Here's the collection of subsets from the original book (and related articles.)

  • SP/1 expressions and output. The print() function.
  • SP/2 variables, assignment, and the input() function.
  • SP/3 selection and repetition. The Python if and while constructs are the logical minimum, but the for statement makes more sense because it's so widely used.
  • SP/4 character strings. 
  • SP/5 arrays. Python lists, really.
  • SP/6 procedures. Python function definition.
  • SP/7 formatted input-output. f-strings for output, and regular expressions for parsing.
  • SP/8 records and files.
There are a lot of gaps between this list of subsets and modern programming languages. SP/k was explicitly based on subset of PL/I, saving the complexity of implementing special compilers. It also reflects the mid-70's state of the art.

What didn't age well is the implicit understanding that numbers are the only built-in data types. Strings are so magical they're isolated into two separate subsets: SP/4 and SP/7. Arrays are called out, but sets and dictionaries didn't exist in PL/I and aren't part of this nested sequence.

Also. And even more fundamental.

There's a bias toward "procedural" programming. The SP/k subsets expose the statements of the language. There are few data structures, and it seems the data structures require some statements before they're useful.

This leads to my restructuring of this. It doesn't apply to the Mastering OO Python book. It's something I use for Python bootcamp training.

  • py/1 expressions and output: int, float, numeric built-in functions, and the print() function.
  • py/2 variables, assignment, and the input() function.
  • py/3 strings, formatting, and various built-in string parsing methods.
  • py/4 tuples and multiple assignment. (Since tuples are immutable, they're more like strings than they are like lists.) And yes, this is kind of short.
  • py/5 if statements and try/except statements. These are the two fundamental "selection" statements. The raise statement is deferred until the functions section.
  • py/6 sets and the for statement.
  • py/7 lists.
  • py/8 dictionaries.
  • py/9 functions (avoiding higher-order functions, decorators, and generator functions.)
  • py/10 contexts, with, and file I/O.
  • py/11 classes and objects.
  • py/12 modules and packages.
The point here is to expose the data structures as the central theme of Python. Statements follow as needed to work with the data structures. 

Note that some topics -- like break, continue, and while -- are advanced parts of working with data structures.

The standard library? Not included. Perhaps should be. But. It's technically separate from the language and all of this can be done without any imports. We would then cover a bunch of standard library modules. The order includes math, random, re, collections, typing, and pathlib

Tuesday, July 17, 2018

Patient Crawling and Possible Phishing

Once every few months I get an email like this. What is it? Phishing?

I've finally looked into it, and learned two important lessons.

Here's the body of the email.
Hello there,
Your page http://www.itmaybeahack.com/homepage/iblog/C364310209/E20080407095503.html has some good references to cyber security so I wanted to get in touch with you. I've recently written an article The 6 Types Of Cyber Attacks To Protect Against In 2018 and was wondering if you thought my article could be a good addition to your page.
You can read my article right here: https://pagely.com/blog/cyber-attacks-in-2018/
I would like to hear your opinion on this article. Also, if you find it useful, please consider linking to it from your page I mentioned earlier. If you prefer you may republish the article. Let me know what you think.
Thank you very much,
Really?

The page they cited has three (3) external links. One is to actual cyber security content. Another now gets redirected to generic advertising, and the third (like the original blog post) is a decade old.

What does this mean?

Clearly, it means some bot found my page. One of the links was to something they're trying to SEO boost. (How do I know it's SEO? I don't. The email address is similar to an SEO boosting company, so it seems like that's what's going on here.)

I've been haphazard about responding to these because I'm a fundamentally charitable person.

Or I'm a total pushover to certain kinds of social engineering. You choose.

You see the appeal to my vanity in the email? They read my ancient content! Swoon!

The email looks personal. There's a name. Spelled consistently. With no digits in it. Someone read my content and reached out to me! I'm in love! Ah! Sweet Mystery of Life at last, I've found you!

The email makes me think -- somehow -- it's not a bot and there's a person involved. A person trying to make a buck selling content and advertising. I should help them, right? Amplify their signal and all?

What a chump I am! I should simply ignore these.

In the past, I have responded with a "Nope. That content is too old to do anything with. I should delete it but I'm too lazy." Once a bot found a link on live content, and I dutifully updated it. I now know any response is a mistake.

I checked out the page.ly site. It's a nice summary of cyber attacks. It seems to be a not-to-dangerous link to not-bad content. Except for the Unicode errors throughout the document. Like someone copied and pasted the original bytes -- intended for CP-1252 -- to a site explicitly using UTF-8.

That's not all.

The name on the email, and the author of the article don't match.  The email says "my article" but the article has a different author.

Red Flag.

After (finally) spending five minutes on this, I learned two things.

  • First: this is nonsense. It's some kind of phishing attack. Or some kind of SEO-boosting bot that doesn't check dates very well.
  • Second: I'm an easy mark when people appeal to my vanity. I need to stop responding, no matter how effusive the (inferred) praise I think I'm hearing.

Tuesday, July 10, 2018

10 common security gotchas in Python and how to avoid them

First, read this: 10 common security gotchas in Python and how to avoid them by Anthony Shaw

Of these, most are important, but not specific to Python at all. Only items 3, 4, 7, and 8 are pretty specific to Python. They talk about the assert statement, some timing vulnerabilities, and the bad idea of transmitting pickle files.

Item 5 is also specific to Python, but I quibble about it's relevance. It is at the very edge of "security." The PYTHONPATH environment variable is most definitely not "...one of the biggest security holes in Python." If the path is a security hole, then any code is a security hole. If we view code as a security hole, then the only truly secure system has no software.

(As someone who lived on a sailboat. I happen to subscribe the position that the only truly secure system has no software. Use line, shackles, and well-known knots if you want to stake your life on it. Use fancy electronics with software to make it simple and fun.)

Bad programming is the biggest security hole. Failure to prevent SQL injection. Failure to use CSRF tokens. Failure to properly handle credentials. These are security holes of epic proportions.

The PYTHONPATH cannot be changed through any kind of request handling. Even colossally dumb software that blindly uploads XML or JPEG files without vetting them won't change the PYTHONPATH.  You'd have to write code that changed sys.path. Or you'd have to write code that reset the os.environ and then started applications in the new environment. This is seriously bad code, and has nothing to do with Python.

Otherwise, the only way to change PYTHONPATH requires an Evil Super Genius who has your compromised credentials. Once your credentials are compromised anything is possible, including the setting the PATH environment variable, or deleting all the accounts, or rm -rf /. None of which is specific to Python.

Item 9 -- patching the system Python -- may be important, All OS's should have patches applied early and often. However. We strongly discourage our developers from using the system Python for anything. We always build environments. We always install our own Python 3 with our own packages. We generally ignore the system Python to the extent possible.

Item 7, though, is a huge deal. We use OAS (formerly known as swagger.) The old swagger.json end-point was -- clearly -- json. The new OAS 3, however, suggests the specifications be provided at  openapi.yaml. This week we're rolling out a cluster of microservices using our shiny new OAS 3 specifications. And we're using default yaml.load() instead of yaml.safe_load() as part of the contract hand-shake among the services. All internally-facing handshakes, but still unsafe with respect to a man-in-the-middle hacking our specifications.

While I can quibble about two of the ten items, the other eight are rock solid, and should be part of periodic in-house code reviews.

And number 7 is killer.