I was amazed.
Step 1. I turned on cProfile by adding two functions to the slowest unit test module.
    def profile():
        import cProfile
        cProfile.run( 'main()', 'the_slow_module.prof' )
        report()

    def report():
        import pstats
        p = pstats.Stats( 'the_slow_module.prof' )
        p.sort_stats('time').print_callees(24)
Now I can call profile() to collect fresh numbers, or call report() on its own to review the saved results. Looking at the "callees" listing provided some hints as to why a particular method was so slow.
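For anyone who would rather not leave a .prof file on disk, here is a minimal sketch of the same idea done in-process. The main() body and the profile_to_text() helper are hypothetical stand-ins, not the code from the test module; only the cProfile and pstats calls come from the standard library.

```python
import cProfile
import io
import pstats


def main():
    # Hypothetical stand-in for the slow test module's entry point.
    return sum(len(str(n)) for n in range(20000))


def profile_to_text(func, limit=10):
    """Profile func() in-process and return the callees report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        # Always stop collecting, even if func() raises.
        profiler.disable()
    buffer = io.StringIO()
    # pstats.Stats accepts a Profile object directly and writes to `stream`.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('time').print_callees(limit)
    return buffer.getvalue()


report_text = profile_to_text(main)
print(report_text)
```

Capturing the report as text makes it easy to save alongside test output or compare between runs.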
Step 2. I replaced ElementTree with cElementTree (duh). Everyone should know this, but I didn't realize how much it mattered. The trick is to note how much time was spent doing XML parsing. In the case of this unit test suite, it was a LOT of time. In the case of the overall application that uses this library, that won't be true.
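A hedged sketch of how the swap can be written so it survives across Python versions: since Python 3.3 the standard xml.etree.ElementTree module uses the C accelerator automatically when it is available, and xml.etree.cElementTree was removed in Python 3.9, so the import needs a fallback. The SAMPLE document below is made up for illustration.

```python
try:
    # Python 2 and older Python 3: the C implementation is a separate module.
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Python 3.9+: cElementTree is gone; the plain module is already accelerated.
    import xml.etree.ElementTree as ElementTree

# Hypothetical sample document, just to show the API is unchanged by the swap.
SAMPLE = "<rows><row id='1'>a</row><row id='2'>b</row></rows>"

root = ElementTree.fromstring(SAMPLE)
ids = [row.get("id") for row in root.iter("row")]
print(ids)
```

Because both modules expose the same API, nothing else in the calling code has to change.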
Step 3. The slowest method was assembling a list. It did a lot of list.append(), and list.__len__(). It looked approximately like the following.
    def something( self ):
        result = []
        for index, value in some_source:
            # Pad with None until the list reaches the (1-based) index.
            while len(result)+1 != index:
                result.append( None )
            result.append( SomeClass( value ) )
        return result
This is easily replaced by a generator. The API changes, so every use of this method function may need to be modified to use the generator instead of the list object.
    def something_iter( self ):
        counter = 0
        for index, value in some_source:
            # Yield None for each gap before the (1-based) index.
            while counter+1 != index:
                yield None
                counter += 1
            yield SomeClass( value )
            counter += 1
The generator was significantly faster than list assembly.
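To check that a rewrite like this preserves behavior, here is a self-contained sketch that runs both versions over the same input. SomeClass, SOME_SOURCE, and the free-function signatures are hypothetical stand-ins; the real method reads its source from elsewhere.

```python
class SomeClass:
    """Hypothetical value wrapper, comparable so results can be checked."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, SomeClass) and self.value == other.value


# Hypothetical sparse source: (1-based index, value) pairs with gaps.
SOME_SOURCE = [(1, "a"), (3, "c"), (6, "f")]


def something(source):
    # List-building version: pads gaps with None using len() and append().
    result = []
    for index, value in source:
        while len(result) + 1 != index:
            result.append(None)
        result.append(SomeClass(value))
    return result


def something_iter(source):
    # Generator version: tracks position in a counter instead of a list.
    counter = 0
    for index, value in source:
        while counter + 1 != index:
            yield None
            counter += 1
        yield SomeClass(value)
        counter += 1


as_list = something(SOME_SOURCE)
as_gen = list(something_iter(SOME_SOURCE))
print(as_list == as_gen)
```

Callers that only loop over the result can use the generator directly. Callers that need len() or indexing have to wrap it in list(), which gives back some of the savings.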
Two minor code changes and a significant speed-up.