Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post in Jan 2023.

Tuesday, August 28, 2018

Cool success story of Cython

Real Python (@realpython)
A multi-core Python HTTP server (much) faster than Go (spoiler: Cython)

nexedi.com/NXD-Blog.Multi…

https://www.nexedi.com/NXD-Blog.Multicore.Python.HTTP.Server

This is handy. It makes perfect sense that Python -- with a little help -- can be compiled down to super-fast code. Hopefully, the Cython world will continue to evolve toward using native Python type hints.

When Cython uses fully-native type hints, it becomes a super-convenient and transparent performance booster.

Without fully-native type hints it becomes a place where bugs are injected as part of trying to improve performance.

Tuesday, August 21, 2018

Python Dependency Management

Freezing Python’s Dependency Hell in 2018

Excellent advice.

Except for the "Don't use Anaconda" part. Yes. It's a big download. Odds are good you'll need most of it. So. Just do it now.

The (miniconda + environment.yml) as an entry point is really good. The "rely on people to actually know and consistently use their best practices" doesn't seem like a problem, it seems like a consequence of an evolving software ecosystem.

Tuesday, August 14, 2018

Why is Python so slow?

This is brilliant. 

Why is Python so slow? by Anthony Shaw

It covers three aspects of the implementation in a respectable level of detail. Helpful information. Bookmark it to help stop pointless bickering with people who don't understand the value of getting something to run right now vs. getting something that will eventually run and be fast.

Tuesday, July 24, 2018

Mastering Object-Oriented Python -- 2nd Edition

It's time to revise Mastering Object-Oriented Python. While the previous edition is solidly focused on Python3, it lacks some important features:
  • F-Strings
  • Type Hints
  • typing.NamedTuple
  • Data Classes
So. There's some stuff to add. I don't think there's too much to take away. I plan to make some things a little more tidy. I will remove all references to Python2 and all references to how things used to be and why they're better now.
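As a rough illustration of the four features in that list, here's a minimal sketch. The Point and Circle classes are made up for this example; they are not taken from the book.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Point(NamedTuple):
    """An immutable pair of coordinates, built with typing.NamedTuple."""
    x: float
    y: float


@dataclass
class Circle:
    """A mutable data class; the annotations double as type hints."""
    center: Point
    radius: float

    def describe(self) -> str:
        # An f-string, including a format specification for the radius.
        return (
            f"Circle at ({self.center.x}, {self.center.y}) "
            f"with radius {self.radius:.1f}"
        )


print(Circle(Point(1.0, 2.0), 3).describe())
```

Data classes need Python 3.7 or later; the other three features work in 3.6.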

It will be several months before this is available. Stand by for updates.

The earliest drafts of this book date back to 2002. Seriously. I've been over this material a lot in the past 1.5 decades.

The nascent form of this book took me years (maybe 10 years) to accumulate. It covered everything: data structures, statements, built-in functions, classes, and a bunch of libraries. It was beyond merely ambitious and off into some void of "cover all the things." 

I was motivated by my undergrad CS text books on the foundations of computer science. The idea of putting the language features into a parallel structure with boolean algebra, set theory, and number theory was too cool for words. And -- lacking the necessary formal background -- it was something I wasn't able to present very well.

While I wanted to cover all of Computer Science, acquisition editors pointed out how crazy that idea was. A focus on the object-oriented features of Python was sufficient to sell a distinctive book. And they were absolutely right.

As I rework the outline for the 2nd edition, there are some other topics that crop up. These are not going to wind up in the book, but they're an implicit feature of the topics being covered.

CS Foundations and Python

One of the best of the introductory books (which came out after I graduated) was Structured Concurrent Programming With Operating Systems Applications. They presented a nested collection of sub-languages: SP/k. The organization of the nested subsets can be helpful for exposing programming incrementally. There are issues, and we'll look at them in detail below. Here's the collection of subsets from the original book (and related articles.)

  • SP/1 expressions and output. The print() function.
  • SP/2 variables, assignment, and the input() function.
  • SP/3 selection and repetition. The Python if and while constructs are the logical minimum, but the for statement makes more sense because it's so widely used.
  • SP/4 character strings. 
  • SP/5 arrays. Python lists, really.
  • SP/6 procedures. Python function definition.
  • SP/7 formatted input-output. f-strings for output, and regular expressions for parsing.
  • SP/8 records and files.
There are a lot of gaps between this list of subsets and modern programming languages. SP/k was explicitly based on a subset of PL/I, saving the complexity of implementing special compilers. It also reflects the mid-70's state of the art.

What didn't age well is the implicit understanding that numbers are the only built-in data types. Strings are so magical they're isolated into two separate subsets: SP/4 and SP/7. Arrays are called out, but sets and dictionaries didn't exist in PL/I and aren't part of this nested sequence.

Also. And even more fundamental.

There's a bias toward "procedural" programming. The SP/k subsets expose the statements of the language. There are few data structures, and it seems the data structures require some statements before they're useful.

This leads to my restructuring of this. It doesn't apply to the Mastering OO Python book. It's something I use for Python bootcamp training.

  • py/1 expressions and output: int, float, numeric built-in functions, and the print() function.
  • py/2 variables, assignment, and the input() function.
  • py/3 strings, formatting, and various built-in string parsing methods.
  • py/4 tuples and multiple assignment. (Since tuples are immutable, they're more like strings than they are like lists.) And yes, this is kind of short.
  • py/5 if statements and try/except statements. These are the two fundamental "selection" statements. The raise statement is deferred until the functions section.
  • py/6 sets and the for statement.
  • py/7 lists.
  • py/8 dictionaries.
  • py/9 functions (avoiding higher-order functions, decorators, and generator functions.)
  • py/10 contexts, with, and file I/O.
  • py/11 classes and objects.
  • py/12 modules and packages.
The point here is to expose the data structures as the central theme of Python. Statements follow as needed to work with the data structures. 

Note that some topics -- like break, continue, and while -- are advanced parts of working with data structures.

The standard library? Not included. Perhaps it should be. But. It's technically separate from the language and all of this can be done without any imports. We would then cover a bunch of standard library modules. The order includes math, random, re, collections, typing, and pathlib.

Tuesday, July 17, 2018

Patient Crawling and Possible Phishing

Once every few months I get an email like this. What is it? Phishing?

I've finally looked into it, and learned two important lessons.

Here's the body of the email.
Hello there,
Your page http://www.itmaybeahack.com/homepage/iblog/C364310209/E20080407095503.html has some good references to cyber security so I wanted to get in touch with you. I've recently written an article The 6 Types Of Cyber Attacks To Protect Against In 2018 and was wondering if you thought my article could be a good addition to your page.
You can read my article right here: https://pagely.com/blog/cyber-attacks-in-2018/
I would like to hear your opinion on this article. Also, if you find it useful, please consider linking to it from your page I mentioned earlier. If you prefer you may republish the article. Let me know what you think.
Thank you very much,
Really?

The page they cited has three (3) external links. One is to actual cyber security content. Another now gets redirected to generic advertising, and the third (like the original blog post) is a decade old.

What does this mean?

Clearly, it means some bot found my page. One of the links was to something they're trying to SEO boost. (How do I know it's SEO? I don't. The email address is similar to an SEO boosting company, so it seems like that's what's going on here.)

I've been haphazard about responding to these because I'm a fundamentally charitable person.

Or I'm a total pushover to certain kinds of social engineering. You choose.

You see the appeal to my vanity in the email? They read my ancient content! Swoon!

The email looks personal. There's a name. Spelled consistently. With no digits in it. Someone read my content and reached out to me! I'm in love! Ah! Sweet Mystery of Life at last, I've found you!

The email makes me think -- somehow -- it's not a bot and there's a person involved. A person trying to make a buck selling content and advertising. I should help them, right? Amplify their signal and all?

What a chump I am! I should simply ignore these.

In the past, I have responded with a "Nope. That content is too old to do anything with. I should delete it but I'm too lazy." Once a bot found a link on live content, and I dutifully updated it. I now know any response is a mistake.

I checked out the page.ly site. It's a nice summary of cyber attacks. It seems to be a not-too-dangerous link to not-bad content. Except for the Unicode errors throughout the document. Like someone copied and pasted the original bytes -- intended for CP-1252 -- to a site explicitly using UTF-8.
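Here's a minimal sketch of how that kind of damage can arise. It's a guess at the mechanism, not a diagnosis of that particular page, and misread_as_utf8() is a name invented for the example.

```python
def misread_as_utf8(text: str) -> str:
    """Encode text as CP-1252, then decode those bytes as if they were UTF-8."""
    raw = text.encode("cp1252")
    # Bytes that are not valid UTF-8 are replaced with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Curly quotes are single bytes in CP-1252, and those bytes are not valid UTF-8.
print(misread_as_utf8("It\u2019s a \u201cquote\u201d"))
```

Plain ASCII survives the round trip. The "smart" punctuation that word processors insert is what gets mangled.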

That's not all.

The name on the email and the author of the article don't match. The email says "my article" but the article has a different author.

Red Flag.

After (finally) spending five minutes on this, I learned two things.

  • First: this is nonsense. It's some kind of phishing attack. Or some kind of SEO-boosting bot that doesn't check dates very well.
  • Second: I'm an easy mark when people appeal to my vanity. I need to stop responding, no matter how effusive the (inferred) praise I think I'm hearing.

Tuesday, July 10, 2018

10 common security gotchas in Python and how to avoid them

First, read this: 10 common security gotchas in Python and how to avoid them by Anthony Shaw

Of these, most are important, but not specific to Python at all. Only items 3, 4, 7, and 8 are pretty specific to Python. They talk about the assert statement, some timing vulnerabilities, and the bad idea of transmitting pickle files.
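A minimal sketch of the first two points, using only the standard library. The function names and the "admin" role are assumptions made up for illustration; they're not from the article.

```python
import hmac


def require_admin(user_role: str) -> None:
    """Guard access with an explicit check.

    An assert statement here would be removed when Python runs with -O,
    so the guard is written as an ordinary if/raise.
    """
    if user_role != "admin":
        raise PermissionError(f"role {user_role!r} is not allowed")


def tokens_match(supplied: str, expected: str) -> bool:
    """Compare secrets without short-circuiting on the first mismatch.

    hmac.compare_digest() is designed to reduce timing side channels;
    a plain == comparison can return early.
    """
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    )
```

The pickle point doesn't need code: don't unpickle bytes that arrived from somewhere you don't control.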

Item 5 is also specific to Python, but I quibble about its relevance. It is at the very edge of "security." The PYTHONPATH environment variable is most definitely not "...one of the biggest security holes in Python." If the path is a security hole, then any code is a security hole. If we view code as a security hole, then the only truly secure system has no software.

(As someone who lived on a sailboat, I happen to subscribe to the position that the only truly secure system has no software. Use line, shackles, and well-known knots if you want to stake your life on it. Use fancy electronics with software to make it simple and fun.)

Bad programming is the biggest security hole. Failure to prevent SQL injection. Failure to use CSRF tokens. Failure to properly handle credentials. These are security holes of epic proportions.

The PYTHONPATH cannot be changed through any kind of request handling. Even colossally dumb software that blindly uploads XML or JPEG files without vetting them won't change the PYTHONPATH.  You'd have to write code that changed sys.path. Or you'd have to write code that reset the os.environ and then started applications in the new environment. This is seriously bad code, and has nothing to do with Python.
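To make that concrete, here's a sketch of what the second option involves: building a new environment and starting a fresh interpreter in it. The child_sys_path() helper is hypothetical, written only for this illustration.

```python
import json
import os
import subprocess
import sys


def child_sys_path(extra_dir: str) -> list:
    """Start a new interpreter with PYTHONPATH set and return its sys.path."""
    # Copy the current environment; the parent process is left untouched.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

The directory shows up in the child's sys.path, not the parent's. Someone has to deliberately write and deploy code like this. A request handler doesn't do it by accident.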

Otherwise, the only way to change PYTHONPATH requires an Evil Super Genius who has your compromised credentials. Once your credentials are compromised anything is possible, including setting the PATH environment variable, or deleting all the accounts, or rm -rf /. None of which is specific to Python.

Item 9 -- patching the system Python -- may be important. All OS's should have patches applied early and often. However. We strongly discourage our developers from using the system Python for anything. We always build environments. We always install our own Python 3 with our own packages. We generally ignore the system Python to the extent possible.

Item 7, though, is a huge deal. We use OAS (formerly known as Swagger). The old swagger.json end-point was -- clearly -- JSON. The new OAS 3, however, suggests the specifications be provided at openapi.yaml. This week we're rolling out a cluster of microservices using our shiny new OAS 3 specifications. And we're using the default yaml.load() instead of yaml.safe_load() as part of the contract handshake among the services. All internally-facing handshakes, but still unsafe with respect to a man-in-the-middle hacking our specifications.
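A minimal sketch of the safer call, assuming PyYAML is installed. The load_spec() function and both sample documents are invented for illustration; they're not our actual service code.

```python
import yaml


def load_spec(text: str) -> dict:
    """Parse a specification document using PyYAML's safe loader."""
    spec = yaml.safe_load(text)
    if not isinstance(spec, dict):
        raise ValueError("expected a mapping at the top level")
    return spec


SPEC = """\
openapi: "3.0.0"
info:
  title: Example
  version: "1.0"
paths: {}
"""

# A document that asks the loader to call a Python function.
HOSTILE = "!!python/object/apply:os.system ['echo pwned']"

print(load_spec(SPEC)["openapi"])
try:
    load_spec(HOSTILE)
except yaml.YAMLError as error:
    print("rejected:", type(error).__name__)
```

The safe loader only builds plain Python objects, so it refuses the python/object tag instead of acting on it. As far as I know, PyYAML 6 made the Loader argument to yaml.load() mandatory, which makes the unsafe default harder to hit by accident.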

While I can quibble about two of the ten items, the other eight are rock solid, and should be part of periodic in-house code reviews.

And number 7 is killer. 

Friday, June 22, 2018

Type Hinting Edge Case

Warning. I'm new to this. Yes, my book Functional Python Programming -- 2nd ed -- is full of type hints. But my examples are all (intentionally) relatively simple. There are edge cases that I do not pretend to understand.

Here's a fun one. Start here

This is a cool question.

Here's an essential clarification on what this structure is.


This is tricky and I think there are two reasons why it's hard.
1. We want to specify some details internal to instances of the np.array class.
2. We want to provide a size constraint, something that I don't think typing can do.

The size constraint may be handled by using Tuple, but it doesn't really fit in a general way. This three-tuple is Tuple[float, float, float]. You can see how that rapidly gets hideous for higher-dimension objects. You'd want Tuple[float*3], right?
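Here's a sketch of the Tuple approach using plain tuples, without numpy. The Vector3 and VectorN aliases and the dot() function are illustrative names only.

```python
from typing import Tuple

# Exactly three floats. The length has to be spelled out element by element.
Vector3 = Tuple[float, float, float]

# Any number of floats. The ellipsis form gives up the size constraint.
VectorN = Tuple[float, ...]


def vec3(x: float, y: float, z: float) -> Vector3:
    return (x, y, z)


def dot(a: Vector3, b: Vector3) -> float:
    """Dot product of two three-element vectors."""
    return sum(p * q for p, q in zip(a, b))


print(dot(vec3(1.0, 2.0, 3.0), vec3(4.0, 5.0, 6.0)))
```

There's no Tuple[float*3] spelling in typing. The choice is between listing every element and accepting any length.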

The internal constraint, similarly, is challenging. However. An np.array() -- for the most part -- is a Sequence with extra features.

I have a suggestion.

1. A stubs/numpy.py file with this. I think this characterizes the array structure.

from typing import TypeVar, Sequence

_Base = TypeVar("_Base")

def array(*args: Sequence[_Base]) -> Sequence[_Base]: ...


2. Here's the target function.

import numpy as np
from typing import Sequence

Vector3 = Sequence[float]

def vec3(x: float, y: float, z: float) -> Vector3:
    return np.array((x, y, z))


This seems to capture part of the type definition. It doesn't capture the 3-ness of the vector.
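One possible fallback, offered as an assumption and not as a recommendation from the original question, is to check the length at run time. The as_vector3() name is hypothetical.

```python
from typing import Sequence

Vector3 = Sequence[float]


def as_vector3(values: Sequence[float]) -> Vector3:
    """Return values unchanged after confirming there are exactly three."""
    # The type hint can't express the length, so it's checked explicitly.
    if len(values) != 3:
        raise ValueError(f"expected 3 components, got {len(values)}")
    return values
```

This works for lists, tuples, and one-dimensional numpy arrays, since all of them support len(). It won't help mypy, but it fails early with a clear message.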