Thursday, January 23, 2014

Manual Testing -- Bad Idea

The question of testing came up recently. The description of the process sounded like manually "testing" some complex web application.

When trying to work out manual "testing", I find it necessary to use scare quotes. I'm not sure there's a place for "manual testing" of any software.

I know that some folks use Selenium to created automated test scripts for interactive applications. That may be a helpful technology. I prefer automated test scripts over manual testing. Consequently, I'm not too interested in helping out with testing -- other than perhaps coaching developers to write automated test scripts.


To continue this rant.

I've seen the suggestion that having a person do some manual "testing" will permit them to notice things that are "broken" but not a formal requirement in a test script. This seems to require some willing misuse of words. A person who's supposed to be noticing stuff isn't testing: they're exploring or demonstrating or thinking. They're not testing. Tests are -- by definition -- pass-fail. This is a very narrow definition: if there's no failure mode, it's not a test, it's something else. Lots of words are available, use any word except "test." Please.

Reading about exploratory "testing" leads to profound questions about the nature of failing an exploratory "test." When did the failure mode become a requirement? During the exploration? Not prior to the actual development? 

When an explorer finds a use case that was never previously part of a user story, then it's really an update to a user story. It's a new requirement; a requirement defined by a test which fails. It's a really high-quality requirement. More importantly, exploratory "testing" is clearly design. It's product ownership. 

This kind of exploration/thinking/playing/experiencing is valuable stuff. It needs to be done. But it's not testing.

Developers create the test scripts: unit tests of various kinds. Back-end tests. Front-end tests. Lots of testing. All automated. All.

Other experienced people -- e.g., a product owner -- can also play with the released software and create informed, insightful user stories and user story modifications that may lead to revisions to test cases. They're not testing. They're exploring. They're writing new requirements, updating user stories, and putting work into the backlog.

Putting work into the backlog

An exploratory "test" should not be allowed to gum up a release. To do that breaks the essential work cycle of picking a story with fixed boundaries and getting it to work.  Or picking a story with nebulous boundaries and grooming it to have fixed boundaries. Once you think you're doing exploratory "testing" on a release that's in progress, then the user stories no longer have a fixed boundary, and the idea of a fixed release cycle is damaged. It becomes impossible to make predictions, since the stories are no longer fixed.

For a startup development effort, the automated test scripts will grow in complexity very quickly. In many cases, the test scripts will grow considerably faster than the product code. This is good.

It's perfectly normal for a product owner to find behaviors that aren't being tested properly by the initial set of automated test scripts. This is good, too. As the product matures, the test scripts expand. The product owner should have increasing difficulty locating features which are untested.

Management Support

What I've found is that some developers object to writing test scripts. One possible reason is because the test scripts don't seem to be as much "fun" as playing with GUI development tools. 

I think the more important reason is that developers in larger organizations are not rewarded for software which is complete, but are rewarded for new features no matter what level of quality they achieve. This seems to happens when software development is mismanaged using a faulty schedules and a faulty idea of the rate of delivery of working software.

If the schedule -- not working features -- dominates management thinking, then time spent writing tests to show precisely how well a feature works is treated as waste. Managers will ask if a developer is just "gold plating" or "polishing a BB" or some other way of discrediting automated test case development.

If the features dominate the discussion, then test development should be the management focus. A new feature without a sufficiently robust suite of automated tests is just a technology spike, not something which can be released.

Manual "testing" and exploratory "testing" seem to allow managers to claim that they're testing without actually automating the tests. It appears that some managers feel that reproducible, automated test take longer and cost more than having someone play with the release to see if it appears to work.

But What About...

The most common complaint about automated GUI testing isn't a proper pass-fail test issue at all.

Folks will insist that somehow font choice, color, position or other net effects of CSS properties must be "tested." Generally, they seem to be conflating two related (but different) things.

1. Design. This is the position/color/font issue. These are design features of a GUI page or JavaScript window or HTML document. Design. The design may need to be reviewed by a person. But no testing makes sense here. The design isn't a "pass-fail" item. Someone may say it's ugly, but that's a far cry from not working. CSS design (especially for people like me who don't really understand it) sure feels like hacking out code. That doesn't mean the design gets tested.

2. Implementation. This is the "does every element use the correct CSS class or id?" question. This is automated testing. And it has nothing to do with looking at a page. It has everything to do with an automated test to be sure HTML tags are generated properly. It has nothing to do with the choice of packing algorithm in a widget, and everything to do with elements simply making the correct API calls to assure that they're properly packed.

For people like me who don't fully get CSS, lots of pages need to be reviewed to make sure the CSS "worked". But that's a design review. It's not a part of automated testing.

Here's the rule: Ugly and Not Working have nothing to do with each other. You can automate tests for "works" -- that's objective. You can't automated the test for "ugly" -- that's subjective.

Here's how some developers get confused.  A bug report that amounts to "ugly" is fixed by making a  change to a GUI element. This is a valid kind of bug-to-change. But how can the change have an automated test? You must have a person confirm that the GUI is no longer ugly. Right?

Wrong.

The confusion stems from conflating design (change to reduce the ugliness) and implementation (some API change to the offending element.) The design change isn't subject to automated testing. Indeed, it passed the unit tests before and after because it worked.

No design can have automated testing. We don't test algorithm design, either. We test algorithm implementations.

Compare it with class design vs. implementation. We don't check every possible aspect of a class to be sure it follows a design. We check some external-facing features. We don't retest the entire library, compiler, OS and toolset, do we? We presume that design is implemented more-or-less properly, and seek to confirm that the edges and corners work.

Compare it with database design vs. implementation. We don't check every bit on the database. We check that -- across the API -- the application uses the database properly.

There's no reason to test every pixel of an implementation if the design was reviewed and approved and the GUI elements use the design properly.

Tuesday, January 21, 2014

Not in HamCalc -- But perhaps should be

This is the kind of little program that would be in HamCalc. But doesn't appear to be.

Looking at the Airfoil web page, specifically, this one: http://airfoiltools.com/airfoil/details?airfoil=ls013-il.

The measurements are all given in fractions of the depth of the airfoil. So you have to scale them. I was working with what may turn out to be a 48" rudder for a boat based on this design. I'm waiting for some details from the engineer who really knows this stuff.

How do we turn these fractions into measurements for folks that work in feet and inches?

We can use a spreadsheet -- and I suspect many folks would be successful spreadsheeting this data. For some reason, that's not my first choice. I worry about accidental copy and paste errors or some other fat-finger blunder in a spreadsheet. With code, it's easy to reproduce the results from the source as needed.

Here's the raw data.

http://airfoiltools.com/airfoil/seligdatfile?airfoil=ls013-il

Part 1. Fractions.

from fractions import Fraction

class Improper(Fraction):
    def __str__( self ):
        whole= int(self)
        fract= self-whole
        if fract == 0: return '{0}'.format(whole)
        if whole == 0: return '{0}'.format(fract)
        return '{0} {1}'.format(whole,fract)

The idea is to be able to produce improper fractions like 47 ½" or 24" or ¾".  My Macintosh magically rewrites fractions into a better-looking Unicode sequence. I didn't include that feature in the above version of Improper. Mostly because in Courier, the generic fractions look kind of icky.

The raw data is readable as a kind of CSV file.

import csv

def get_data( source ):
    rdr= csv.reader( source, delimiter=' ', skipinitialspace=True )
    heading= next(rdr)
    print( heading )
    for row in rdr:
        yield float(row[0]), float(row[1])

That saves fooling around with parsing -- we get the profile numbers from the raw data as a pair of floats. 

Finally, the report.

def report( seligdatfile, depth, unit ):
    scale=16 #th of an inch
    for d, t in get_data( source ):
        d_in, t_in = d*depth, t*depth
        d_scale = Improper( int(d_in*scale), scale )
        t_scale = Improper( int(t_in*scale), scale )
        print( '{0:6.2f} {unit} {1:6.2f} {unit} | {2:>8s} {unit} {3:>8s} {unit}'.format(
            d_in, t_in, d_scale, t_scale, unit=unit) )

This gives us a pleasant-enough table of the measurements in decimal places and fractions.

We can use this for any of the variant airfoils available.  Here's the top-level script.

import urllib.request
with urllib.request.urlopen( "http://airfoiltools.com/airfoil/seligdatfile?airfoil=ls013-il" ) as source:
    seligdatfile= source.read().decode("ASCII")

import io
with io.StringIO( seligdatfile ) as source:
    report( source, depth=48, unit="in." )

I'm guessing the data files are ASCII encoded, not UTF-8. It doesn't appear to matter too much, and it's an easy change to make if they track down an airfoil data file what has a character that's not ASCII, but UTF-8.

Tuesday, January 14, 2014

Explaining an Application

Some years ago--never mind how long precisely--having little or no money in my purse...  I had a great chance to do some Test-Driven Reverse Engineering on a rather complex C program. I extracted test cases. I worked with the users to gather test cases. And I rewrote their legacy app using Test-Driven Development. The legacy C code was more a hint than anything else.

I thought it went well. We uncovered some issues in the test cases. Uncovered a known issue in the legacy program. And added new features. All very nice. A solid success.

Years later, a developer from the organization had to make some more changes.

The client calls.

"No problem," I assure them, "I'm happy to answer any questions. With one provision. Questions have to be about specific code. I can't do 'overview' questions. Email me the code snippet and the question."

I never heard another word.  No question of any kind. Not a general question (that I find difficult to answer,) nor a specific question.

Why the provision?

I find it very hard to talk with someone who hasn't actually read the code yet. I have done far too many presentations to people who are sitting around a conference room table, nodding and looking at power-point slides.

I know the initial phone call focused on "an overview." But what counts as an overview? Use cases? Data model? Architectural layers? Test cases? Rather than waste time explaining something irrelevant, I figured if they asked anything -- anything at all -- I could focus on what they really wanted to know.

I know that I have never been able to understand people hand-waving at a picture of code. I have to actually read the code to see what the modules, classes and functions are and how they seem to work. I'm suspicious of graphics and diagrams.  I know that I can't read the code while someone is talking. If they insist on talking, I need to read the code in advance.

Perhaps I'm imposing too much on this customer. But. They're going to maintain the code -- that seems to mean they need to understand it. And they need to understand it their own way, without my babbling randomly about the bits that interested me. Maybe the part I found confusing is obvious to them, and doesn't bear repeating.

Perhaps raising the bar to "specific questions about specific code" forced them to read enough. Perhaps after some reading, they realized they didn't need to pay me to explain things. I certainly can't brag that the code explained itself.

Or. Perhaps they realized how the unit tests worked and realized that TestCases provide a roadmap of the API.

Thursday, January 9, 2014

Computers, Power and Space Heaters

Just a safety note for folks who use a lot of electricity. In the winter, you might have computers and space heaters. Or you might (like me) live "off the grid". We use an inverter with a transfer relay to switch between battery power and grid power.

Recently, we had a bad smell. Here's the root cause analysis of the smell.

Do Not Overload AC Wiring. 

Figure out how many amps your computers, monitors and winter-time space heater actually use. The numbers are displayed in various places around plugs and plates on the backs of things. A few devices will list power in watts -- divide by 120 to get amps. A 1500W heater is 12.5A. 

Next figure out what your circuit breakers can handle. Many household circuits are 15A. A power strip may not even handle 10A. Two space heaters in one room is likely to exceed the wiring's current carrying capacity.

The root cause of the melted connector block might be a design flaw -- the block may have been too small for the rated load of 30A. While the device overall seems good, this kind of shabby engineering is alarming.

Tuesday, January 7, 2014

Wrestling with the Python shelve module

While wrestling with Python's shelve module, I ran across ACID. Interesting thoughts.

Plus what appears to be the related Tumblr blog: python sweetness. Also interesting.

Not sure I can make heavy use of it right now, but it's helpful to see the thought process.

I find the subject of shelve (or pickle) and schema change endlessly fascinating.  I have no good ideas to contribute, but it helps to read about ways to track schema evolution against data that's as highly class-specific as shelve data is.

Versioning class definitions and doing data migration to upgrade a database is -- right now -- a fascinating problem.